Step-by-step:
Select five use cases (we used a logo, website graphic, Instagram post, marketing brochure, and photorealistic image) and define the testing guidelines: each model receives the identical prompt, generates four images, and is scored from 1 to 5 on consistency, creativity, utility, and quality.
Feed the use cases into Claude, ChatGPT, or Gemini with the prompt: "Here are my use cases: [X]. For each, create a JSON prompt with four versions in a 2x2 grid."
Make a rating matrix with the formula Overall Rating = (Consistency + Creativity + Utility + Quality)/4 (copy our Notion tutorial here).
Create the images by entering each prompt into both image models, using a fresh conversation for each use case, then score the results against your criteria.
To save time and tokens, tell the LLM to write a prompt that produces all four variations in a single 2x2 grid.
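
If you would rather compute the rating matrix in code than in Notion, the Overall Rating formula from the steps above can be sketched in a few lines of Python. This is only a minimal illustration: the function name, the "Model A" and "Model B" labels, and the scores are placeholders we made up for the example, not results from our test.

```python
# The four criteria from the testing guidelines, each scored from 1 to 5.
CRITERIA = ("consistency", "creativity", "utility", "quality")


def overall_rating(scores: dict[str, int]) -> float:
    """Overall Rating = (Consistency + Creativity + Utility + Quality) / 4."""
    for name in CRITERIA:
        value = scores[name]
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return sum(scores[name] for name in CRITERIA) / len(CRITERIA)


# Hypothetical scores for one use case (placeholder numbers, not real results).
ratings = {
    "Model A": {"consistency": 4, "creativity": 3, "utility": 5, "quality": 4},
    "Model B": {"consistency": 3, "creativity": 5, "utility": 3, "quality": 4},
}

for model, scores in ratings.items():
    print(f"{model}: {overall_rating(scores):.2f}")
```

Run it once per use case and you get one row of the matrix for each model. The range check is there to catch typos when you enter scores by hand.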