Improving Claude's creative-tool surface: I led discovery, prioritization, and the PRD for
image-generation integration.
$ read overview 01 / 06
Executive Summary
A product case study on improving Claude's creative-tool surface. I led discovery,
prioritization, and the PRD for image-generation integration — taking the work from
user-research signals to a shippable plan.
Anthropic PBC is an American AI company founded in 2021 by former OpenAI employees, including
siblings Daniela and Dario Amodei. Claude is its flagship assistant — a family of LLMs
designed to be helpful, honest, and harmless, used today by businesses, developers, educators,
marketers, and individual users across free and paid tiers.
Available models
Claude Opus
The highest-performing model for complex analysis and advanced tasks
Exceptional reasoning capabilities
Highest accuracy for complex tasks
Advanced problem-solving
Nuanced understanding of context
Claude Sonnet
Balances capability and performance for efficient, high-throughput tasks
Excellent balance of speed and capability
Ideal for most business applications
Cost-effective for daily use
Strong multilingual support
Claude Haiku
Optimized for speed and lightweight actions
Ultra-fast response times
Efficient for simple tasks
Low computational requirements
Ideal for mobile applications
Claude 3.7 Sonnet
Latest model featuring hybrid reasoning capabilities (released February 2025)
Hybrid reasoning architecture
Enhanced problem-solving
Improved contextual understanding
Advanced tool usage capabilities
Core functionalities
Vision capabilities
Complex task management
In-depth language understanding
Creative writing
Seamless interaction
Multi-step code generation
Document analysis
High file-upload limit
Milestones
2021
Founded by seven former OpenAI employees
2022
Received $580M in funding, including $500M from FTX
2023
Officially introduced Claude to the public
Secured a $4B partnership with Amazon
Received a $2B commitment from Google
2024
Released Claude 3 with three models: Opus, Sonnet, Haiku
Launched Claude Team plan and iOS app
Released Claude 3.5 Sonnet with improved performance
Added "Computer use" feature to Claude
Partnered with Palantir and AWS to make Claude available to U.S. intelligence and defense agencies
Made Claude 3.5 Haiku available to all users
2025
Introduced Claude 3.7 Sonnet with hybrid reasoning capabilities
$ read discovery 02 / 06
Problem Discovery
I set out to identify the most important pain points for Claude users and the opportunities to
address them. This research formed the foundation for the prioritized problem statements that follow.
Reddit data analysis
Sampled 150 Reddit posts mentioning Claude (r/ChatGPT, r/Claude, r/ArtificialIntelligence)
plus 120 complaint and feedback threads.
Top use cases
Writing Assistant: 42
Research: 35
Code Helper: 28
Learning: 22
Creative Writing: 18
Top pain points
Hallucinations: 38
Cost: 30
Limited Knowledge: 25
Slowness: 20
Privacy: 15
Key insights
Writing assistant is the most common use case — drafting, editing, and refining content.
Research assistance is highly valued for synthesizing information across multiple sources.
Hallucinations remain the top concern, particularly for factual or technical content.
Users compare Claude to ChatGPT favorably for conversational depth, less so for technical knowledge.
Many users switched to Claude specifically for its larger context window and more nuanced responses.
User interviews
Jordan Garcia
24 · Fresno, California
Bio
Senior CIS major at Fresno State. Tech-savvy student using AI assistants daily for academic work and personal projects. Deep interest in machine learning; relies on AI to understand complex concepts and complete coding assignments.
Quote
"Claude is better at helping me with the machine learning stuff than ChatGPT. The way it explains things makes more sense to me."
Core needs
Help understanding complex algorithms and coding concepts
Assistance with academic writing and research
Summarization of technical material
Scheduling and organization support
Code solutions for ML projects
Frustrations
Python version conflicts and dependency management
Switching between AI tools for different capabilities
Context loss when asking for summarization or paraphrasing
Manual rework to adapt suggestions to specific needs
No visual output for certain projects
Ideal solution
An assistant that combines Claude's strengths in explaining ML concepts with image generation, better contextual summarization, and a UX that makes it the go-to for all tasks rather than tool-switching.
$ read prioritization 03 / 06
Problem Prioritization
I used a weighted scoring model to rank candidate problems by user impact, technical
feasibility, and business value, surfacing the highest-leverage work to address first.
Prioritized problem statements
Problem 1 · Highest priority
Response Complexity Problem
How might we provide users with appropriately detailed responses that match their specific needs without requiring additional prompting?
Impact: Improve core UX, compete with Grok
Metrics: Reduced follow-up prompts
Problem 2 · High priority
Python Dependency Management
How might we enhance Claude's assistance to account for Python environment constraints?
Impact: Strengthen position as coding assistant
Metrics: Increased usage for Python projects
Problem 3 · Medium priority
Creative Capabilities Gap
How might we expand Claude's capabilities to include image generation and editing?
Impact: Meet user needs, open new use cases
Metrics: Feature adoption, reduced switching
Problem 4 · Medium priority
Research Depth Limitations
How might we enhance Claude's research capabilities across multiple sources?
Impact: Position as complete research assistant
Metrics: Increased research-related prompts
Prioritization framework
Weighted scoring model
I scored each problem on a 1–5 scale across five weighted criteria, then summed weighted
scores to determine final priority.
User Impact — weight 2.0
Reach — weight 1.5
Business Value — weight 1.8
Competitive Differentiation — weight 1.2
Technical Feasibility — weight 1.0
Results
Creative Capabilities Gap: 33.0 pts
Voice Input Feature: 29.0 pts
Research Depth Limitations: 27.3 pts
Response Complexity: 25.0 pts
Python Dependency Management: 17.8 pts
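The weighted sum described above can be sketched in a few lines. The weights come from the criteria list; the per-criterion scores in the example are hypothetical placeholders, not the scores behind the results shown here, and the function name `weighted_score` is illustrative only.

```python
# Weights are taken from the criteria list above.
WEIGHTS = {
    "user_impact": 2.0,
    "reach": 1.5,
    "business_value": 1.8,
    "competitive_differentiation": 1.2,
    "technical_feasibility": 1.0,
}


def weighted_score(scores: dict[str, int]) -> float:
    """Sum each 1-5 criterion score multiplied by its weight."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five criteria")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    total = sum(WEIGHTS[name] * value for name, value in scores.items())
    return round(total, 1)


# Hypothetical per-criterion scores, NOT the ones used in the case study.
example_scores = {
    "user_impact": 4,
    "reach": 3,
    "business_value": 4,
    "competitive_differentiation": 3,
    "technical_feasibility": 5,
}

print(weighted_score(example_scores))                 # 28.3
print(weighted_score({name: 5 for name in WEIGHTS}))  # 37.5, the maximum possible
```

With these weights the highest attainable total is 37.5, which gives a sense of scale for the point values listed in the results.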
Key insights & recommendations
#1 Creative Capabilities (33.0 pts)
Market growth: 17.4% CAGR through 2030
Competitive gap: Major competitors offer this
New revenue: Opens up new use cases
#2 Voice Input (29.0 pts)
Accessibility: Expands to voice users
Industry trend: Toward multimodal interfaces
Effort: Moderate; uses existing tech
#3 Research Depth (27.3 pts)
In progress: "Compass" feature in testing
Lower priority: Competitors developing similar
#4–5 Other Priorities
Response Complexity: Addressed by extended thinking mode (25.0 pts)
Creative Capabilities Gap (33.0 pts) was the highest-priority problem. I generated diverse
solution paths, then narrowed down to a third-party API integration — and selected Midjourney as
the provider after a structured comparison.
Brainstorming diverse solutions
Third-party API integration
High impact
Idea: Leverage existing models via APIs
Feasibility: High; many robust APIs exist
Impact: Rapidly enhances platform capabilities
In-house development
Moderate impact
Idea: Develop proprietary image generation
Feasibility: Low; requires significant resources
Impact: Long-term strategic differentiation
Hybrid model with refinement
High impact
Idea: Combine generation with refinement tools
Feasibility: Moderate; requires integration work
Impact: Boosts satisfaction with personalization
Creative platform integration
High impact
Idea: Partner with platforms like Adobe
Feasibility: Depends on partnership agreements
Impact: Leverages tools users already trust
Provider selection
After evaluating the solution paths, third-party API integration offered the best balance of
impact, feasibility, and time-to-market. I compared the three leading providers:
Provider | Image quality | UI components | Integration | Score
Midjourney (selected) | 9/10 (27) | 10/10 (30) | 8/10 (16) | 112
DALL-E | 8/10 (24) | 6/10 (18) | 9/10 (18) | 98
Stable Diffusion | 8/10 (24) | 7/10 (21) | 8/10 (16) | 100
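As a quick arithmetic check on the comparison, the parenthesised values look like each rating multiplied by a weight of 3, 3, and 2 respectively. That reading is an inference from the numbers, not something the comparison states. Under that assumption, the sketch below recomputes the subtotal of the three visible columns and compares it with the reported score.

```python
# Ratings and totals as listed in the comparison.
# The weights (3, 3, 2) are INFERRED from the parenthesised values (e.g. 9/10 -> 27).
INFERRED_WEIGHTS = {"image_quality": 3, "ui_components": 3, "integration": 2}

RATINGS = {
    "Midjourney": {"image_quality": 9, "ui_components": 10, "integration": 8},
    "DALL-E": {"image_quality": 8, "ui_components": 6, "integration": 9},
    "Stable Diffusion": {"image_quality": 8, "ui_components": 7, "integration": 8},
}

REPORTED_TOTALS = {"Midjourney": 112, "DALL-E": 98, "Stable Diffusion": 100}


def visible_subtotal(ratings: dict[str, int]) -> int:
    """Weighted sum over the three criteria shown in the comparison."""
    return sum(INFERRED_WEIGHTS[name] * value for name, value in ratings.items())


for provider, ratings in RATINGS.items():
    subtotal = visible_subtotal(ratings)
    remainder = REPORTED_TOTALS[provider] - subtotal
    print(f"{provider}: visible {subtotal}, reported {REPORTED_TOTALS[provider]}, "
          f"remainder {remainder}")
```

The visible subtotals come to 73, 60, and 61, leaving 39, 38, and 39 points unaccounted for. The reported scores therefore presumably include criteria not shown in this comparison; the ranking is the same either way.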
Why Midjourney
Superior UI & customization: Robust components that appeal to Claude users
High image quality: Artistic outputs meeting creative standards
Competitive integration: Strong developer support and documentation
Cost & scalability: Proven pricing models and reliable performance
Implementation plan
Phase 1 — Initial integration
Connect Claude API with Midjourney via custom wrapper
Implement basic prompt-to-image conversion
Timeline: 4–6 weeks for MVP
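One possible shape for the Phase 1 wrapper is a thin layer that cleans the prompt and delegates to a swappable provider. Everything here is hypothetical: `ImageProvider`, `StubProvider`, and `ImageWrapper` are illustrative names, the stub returns placeholder URLs without any network call, and no real Midjourney or Claude endpoint is represented. The default of four images mirrors the four-variation grid described in the PRD.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class GeneratedImage:
    prompt: str
    url: str


class ImageProvider(Protocol):
    """Hypothetical provider interface; not a real Midjourney or Claude API."""

    def generate(self, prompt: str, count: int) -> list[GeneratedImage]: ...


class StubProvider:
    """Stand-in provider that returns placeholder URLs without network calls."""

    def generate(self, prompt: str, count: int) -> list[GeneratedImage]:
        return [
            GeneratedImage(prompt=prompt, url=f"https://example.invalid/image-{i}.png")
            for i in range(1, count + 1)
        ]


class ImageWrapper:
    """Validates and normalizes a prompt, then delegates to the provider."""

    def __init__(self, provider: ImageProvider) -> None:
        self._provider = provider

    def prompt_to_images(self, prompt: str, count: int = 4) -> list[GeneratedImage]:
        cleaned = " ".join(prompt.split())  # collapse stray whitespace
        if not cleaned:
            raise ValueError("prompt must not be empty")
        return self._provider.generate(cleaned, count)


wrapper = ImageWrapper(StubProvider())
images = wrapper.prompt_to_images("  a diagram of a   convolutional neural network ")
print(len(images), images[0].prompt)
```

Keeping the provider behind an interface would let the MVP be tested against a stub and would leave room to change vendors later without touching the conversational layer.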
Phase 2 — Enhanced features
Add image editing and refinement capabilities
Implement context-aware image suggestions
Timeline: 2–3 months after initial release
Success metrics
User adoption rate: >40% in first 3 months
Satisfaction score: >4.2/5 for image generation
Reduction in platform switching: 30%+
Expected outcomes
Increased user satisfaction and retention
New revenue opportunities through premium tiers
Competitive advantage over single-modal AI assistants
$ read design-prototypes 05 / 06
Design Implementation
The proposed design implements a seamless image-generation workflow inside Claude. Four stages
take a request from natural-language input to assets integrated into the user's working files.
Stage 1: Entry point
The initial interface maintains Claude's minimalist aesthetic with a clean, focused design.
Key features
Familiar environment with a clearly defined input area
Simple prompt bar for natural-language interaction
No specialized commands required to initiate image generation
Design philosophy
The entry point keeps Claude's minimalist aesthetic while subtly introducing image generation. The interface prioritizes familiarity for existing users while making the new capability discoverable without overwhelming the chat experience.
Key design considerations
Accessibility first: The conversational interface makes advanced image generation accessible to non-technical users.
Contextual continuity: The design maintains connection between generated images and their intended purpose throughout the workflow.
Progressive disclosure: Complex options surface only when relevant, preventing cognitive overload.
Visual feedback: Clear presentation of results with multiple options encourages experimentation and refinement.
Seamless integration: Generated assets become immediately available for use in other creative contexts.
$ read prd 06 / 06
Product Requirements Document
The PRD that synthesizes the work above into a shippable specification: integrating Midjourney's
image-generation API into Claude.
TL;DR
This project integrates Midjourney's image-generation API into the Claude platform, enabling users to create and manage AI-generated images directly within conversations. It addresses a key user need for creative visual capabilities and supports richer collaboration for content creators, developers, and businesses. The main wins are a streamlined UX, high-quality outputs, and seamless workflow integration.
Business goals
Increase user engagement on Claude by 25% within six months of launch.
Reduce platform-switching to other AI tools by 30%.
Increase paid-plan conversions by 15%.
Strengthen Claude's competitive position against other AI assistants.
Enable new monetization opportunities around premium image features.
User goals
Create high-quality images directly within Claude conversations.
Easily refine and iterate on generated images.
Seamlessly integrate generated images into their workflows.
Experience consistent image quality across devices and platforms.
Share and collaborate around visual content.
Non-goals
Building an in-house image generation model from scratch.
Competing with dedicated graphic-design tools.
Creating video generation capabilities at this stage.
Complex image editing or manipulation tools.
Integration with stock photography libraries.
User stories
Content Creator
I want to generate images based on my descriptions, so I can visualize ideas without switching platforms.
I want to refine generated images through conversational feedback, so I can iteratively improve outputs.
I want to save and organize my generated images, so I can access them across projects.
Developer
I want to generate UI mockups and concept visuals, so I can prototype ideas quickly.
I want to incorporate generated images into my codebase, so I can streamline development.
I want consistent image outputs that match my specifications, so I can rely on them for professional projects.
Business User
As a marketing manager, I want to create on-brand imagery, so I can maintain consistent visual communications.
I want to generate multiple image variations quickly, so I can pick the best options for presentations.
I want to control who can generate images on my team, so I can manage resource usage.
Functional requirements
Image generation core
Priority: High
Generate images based on natural-language prompts.
Provide multiple style options (photorealistic, artistic, concept art, etc.).
Support various aspect ratios (square, portrait, landscape).
Enable image refinement through follow-up prompts.
Support batch generation of multiple images.
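The core requirements above could map onto a small request model. This is a sketch under stated assumptions: the concrete ratio values, the default style, and the batch cap `MAX_BATCH` are placeholders chosen for illustration, and only the style names, the three ratio categories, and refinement through follow-up prompts come from the requirements.

```python
from dataclasses import dataclass, replace
from enum import Enum

MAX_BATCH = 8  # assumed cap; the requirements do not specify one


class Style(Enum):
    PHOTOREALISTIC = "photorealistic"
    ARTISTIC = "artistic"
    CONCEPT_ART = "concept art"


class AspectRatio(Enum):
    # Ratio values are assumptions; the requirements only name the categories.
    SQUARE = "1:1"
    PORTRAIT = "2:3"
    LANDSCAPE = "3:2"


@dataclass(frozen=True)
class ImageRequest:
    prompt: str
    style: Style = Style.PHOTOREALISTIC  # assumed default
    aspect_ratio: AspectRatio = AspectRatio.SQUARE
    batch_size: int = 1

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 1 <= self.batch_size <= MAX_BATCH:
            raise ValueError(f"batch_size must be between 1 and {MAX_BATCH}")

    def refine(self, follow_up: str) -> "ImageRequest":
        """Return a new request with the follow-up appended to the prompt."""
        return replace(self, prompt=f"{self.prompt}. {follow_up}")


first = ImageRequest("a lighthouse at dusk", style=Style.CONCEPT_ART, batch_size=4)
second = first.refine("add fog")
print(second.prompt)
print(second.style.value, second.aspect_ratio.value, second.batch_size)
```

Making the request immutable means each refinement produces a new object, which fits naturally with the version tracking mentioned later in the user experience flow.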
Integration & user experience
Priority: High
Seamless access via icon in the Claude chat interface.
Preview generated images before finalizing.
Clear indication of image generation in progress.
Natural-language control of image parameters.
Mobile-responsive image viewer.
Image management
Priority: Medium
Save generated images to user gallery.
Export images in multiple formats (PNG, JPG).
Organize images by conversation or project.
Share images via link or download.
Delete or archive unwanted images.
User experience flow
Step 1 — Initiate image creation
User clicks the camera icon or types a natural-language request.
Modal appears with text field for image description.
Style options are presented with visual examples.
Size/ratio selector is available; defaults to square.
Step 2 — Refine request
User enters detailed description or selects from suggestions.
AI offers clarifying questions if the prompt is vague.
Preview of similar-style images appears when available.
User submits with clear feedback on processing time.
Step 3 — Review results
Four image variations appear in a grid.
User can hover to enlarge each option.
Options to regenerate, refine, or select are clearly presented.
Selected images appear directly in the conversation.
Step 4 — Iterate or finalize
User can request adjustments through conversation.
Changes are applied incrementally with version tracking.
Final images can be saved to gallery or exported.
Unobtrusive feedback prompt appears after completion.
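The four steps above can be read as a small state machine. The sketch below is one interpretation: the action names (`describe`, `submit`, `regenerate`, `refine`, `select`, `adjust`, `save`, `export`) and the `DONE` state are labels invented for illustration, and the transitions encode my reading of the flow rather than a specified design.

```python
from enum import Enum, auto


class Step(Enum):
    INITIATE = auto()
    REFINE_REQUEST = auto()
    REVIEW_RESULTS = auto()
    ITERATE_OR_FINALIZE = auto()
    DONE = auto()


# Allowed actions per step and the step each one leads to (assumed mapping).
TRANSITIONS = {
    Step.INITIATE: {"describe": Step.REFINE_REQUEST},
    Step.REFINE_REQUEST: {"submit": Step.REVIEW_RESULTS},
    Step.REVIEW_RESULTS: {
        "regenerate": Step.REVIEW_RESULTS,
        "refine": Step.REFINE_REQUEST,
        "select": Step.ITERATE_OR_FINALIZE,
    },
    Step.ITERATE_OR_FINALIZE: {
        "adjust": Step.REVIEW_RESULTS,
        "save": Step.DONE,
        "export": Step.DONE,
    },
}


def advance(step: Step, action: str) -> Step:
    """Return the next step, or raise if the action is not allowed here."""
    try:
        return TRANSITIONS[step][action]
    except KeyError:
        raise ValueError(f"action {action!r} is not allowed in {step.name}") from None


def run(actions: list[str]) -> Step:
    """Apply a sequence of actions starting from the entry step."""
    step = Step.INITIATE
    for action in actions:
        step = advance(step, action)
    return step


print(run(["describe", "submit", "select", "save"]).name)            # DONE
print(run(["describe", "submit", "select", "adjust", "select"]).name)  # ITERATE_OR_FINALIZE
```

Writing the flow out this way makes the loops explicit: regenerating stays in review, while refining or adjusting sends the user back through an earlier step.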
Narrative
Jordan, a CIS student at Fresno State, is working on a machine-learning project and needs conceptual diagrams to explain complex algorithms. Previously he had to switch between Claude for explanations and another tool for visuals. With the new image-generation feature, Jordan asks Claude to "create a diagram showing how convolutional neural networks process image data."
Within seconds, Claude presents four visual options. Jordan selects one but asks Claude to "make the layers more distinct and add labels." Claude refines the image based on this feedback and incorporates it directly into their conversation about neural networks. He saves the image for his presentation without ever having to break flow.
When explaining the concept to classmates, Jordan shares both Claude's text and the visuals together — a more comprehensive learning experience. The time saved and the output quality strengthen his preference for Claude over competitors and lead him to upgrade to a paid plan.
Success metrics
Metric | Objective | Method
Adoption rate | 50% of active users try the feature within 3 months | Feature usage tracking
Retention impact | 15% increase in retention for users who use image features | Cohort analysis
Conversion rate | 15% increase in free-to-paid conversions | Plan upgrade tracking
Image generation success | 98% successful completions | Error-rate monitoring
User satisfaction | CSAT score > 4.5 / 5 for image generation | Post-usage surveys
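To make the targets concrete, the sketch below checks a set of observed numbers against them. The thresholds come from the success metrics; the observed figures are invented sample data, the field names are illustrative, and reading each "15% increase" as a relative uplift over a baseline (rather than percentage points) is an assumption.

```python
def relative_increase(baseline: float, observed: float) -> float:
    """Relative uplift over a baseline, e.g. 0.40 -> 0.48 is 0.20."""
    return (observed - baseline) / baseline


def evaluate(data: dict[str, float]) -> dict[str, bool]:
    """Compare observed figures with the PRD targets; True means target met."""
    return {
        "adoption": data["users_tried"] / data["active_users"] >= 0.50,
        "retention": relative_increase(
            data["retention_baseline"], data["retention_image_users"]
        ) >= 0.15,
        "conversion": relative_increase(
            data["conversion_baseline"], data["conversion_observed"]
        ) >= 0.15,
        "generation_success": data["completed"] / data["attempted"] >= 0.98,
        "satisfaction": data["csat"] > 4.5,
    }


# Invented sample data for illustration only.
sample = {
    "users_tried": 5200,
    "active_users": 10000,
    "retention_baseline": 0.40,
    "retention_image_users": 0.48,
    "conversion_baseline": 0.050,
    "conversion_observed": 0.055,
    "completed": 9750,
    "attempted": 10000,
    "csat": 4.6,
}

for metric, met in evaluate(sample).items():
    print(f"{metric}: {'met' if met else 'missed'}")
```

In this sample, adoption, retention, and satisfaction meet their targets, while conversion (a 10% uplift) and generation success (97.5%) fall short.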
Project timeline
Medium-large: 8–10 weeks end-to-end, including testing and staged rollout.