AI Design Battle: Gemini 3 vs. Claude Opus 4.5 vs. Codex 5.1—Which Model Is the Best Designer?

By: Claire Vo Source: How I AI (ChatPRD / Lenny’s Podcast Network) Type: podcast

Summary

Controlled one-shot redesign experiment using the ChatPRD blog as test subject, same prompt across all three models in Cursor. Gemini 3 Pro: Fast execution, produced usable hero + three-column card layout with hover effects, strong SEO (JSON-LD schema, breadcrumbs, semantic HTML, metadata). Weaknesses: cramped top element, no fallback for missing images. Opus 4.5 (winner): Created a detailed 4-step plan before coding. Scanned repo for existing design assets and incorporated on-brand elements. Produced polished UI with hover CTAs, graceful image fallbacks (book icon), reading time estimates, and newsletter redesign. Context-aware and production-ready. Codex 5.1: Generic AI purple gradient, white-background logo on colored gradient, broken image links, non-functional category links, blog list showed no posts. SEO metadata added but UX was unusable.

Key Ideas Extracted

Planning before coding differentiates quality: Opus 4.5’s to-do list approach led to measurably better output than models that jumped straight to code — process matters more than raw capability
Context-awareness as competitive advantage: Opus scanned the existing repo for design assets and brand elements; Gemini and Codex treated the task generically
Model switching by task type: Different models have different strengths — Opus for front-end design, Gemini for fast general tasks, Codex for backend work. Building a mental model of each model’s strengths is an essential skill.
Graceful degradation as quality signal: How a model handles edge cases (missing images, broken data) reveals design maturity — Opus created fallback icons while Gemini left gaps
Generic “AI purple” as quality indicator: If a model defaults to purple gradients and generic styling, it likely isn’t reading your existing design context
One-shot redesign as model evaluation technique: Same prompt + same codebase + different models = apples-to-apples comparison you can run in under 20 minutes
Result was actually shipped: Claire liked the Opus 4.5 output enough to deploy it to the live ChatPRD blog

Notes

Published Dec 3, 2025 on ChatPRD blog. 8-min read. Solo mini-episode.
All testing done in Cursor for consistent environment
The universal prompt was intentionally non-prescriptive — phrased like a normal request to a colleague
Total time for all three designs: under 20 minutes (vs. days/weeks in traditional workflow)
Sponsor: Lovable

Raw Content

Re-scraped from ChatPRD 2026-02-15. Full article content captured in Summary and Key Ideas above.