Nano Banana Pro for brand imagery: the complete workflow
Six months using nano banana pro brand imagery across four brands. The model's quirks, the prompts that work, the QA loop that keeps visuals on-brand.

After six months running image generation for four brands — a children's accessory store, a B2B agency, a trading-education SaaS, and a studio blog — the shape of the work has settled. The question has shifted from "is the output good enough?" to "what's the workflow that makes the output reliable?" The nano banana pro brand imagery workflow we've converged on is a three-stage pipeline: a brief specification stage where the human operator defines the brand constraints, a generation stage with careful prompt anchoring to a reference image, and a QA stage where roughly 40% of initial outputs are rejected and regenerated with sharper guidance — the raw model produces brand-usable images about 60% of the time, and the pipeline lifts that to ~95% with about ten minutes of operator time per final image. This is the workflow, the prompts, the common failure modes, and the economics.
Why this model, why now
Image generation crossed a threshold in late 2025 that earlier generations didn't clear. The specific properties that matter for brand work:
Typographic legibility. Previous models could produce aesthetically pleasing images, but in-image text was a pile of glyphs that looked like letters from a distance. Nano Banana Pro renders real, readable text about 80% of the time — enough that a brand headline can actually appear inside a generated visual.
Style consistency across an image set. Generate ten images for a single campaign and the previous generation's output would have ten subtly different visual languages. Nano Banana Pro holds a style across a set, especially when prompted against a reference image.
Reference-anchored generation. Upload a single brand reference image, and subsequent generations match its color palette, lighting, and compositional feel. This alone is what makes the model workable for brand applications rather than one-off stylish images.
Reliable product rendering. Physical products (bottles, boxes, accessories) render at a quality where the generated image is usable — not photorealistic, but believable enough for lifestyle contexts.
None of these capabilities is perfect. All of them cross the threshold where the output is salvageable with reasonable human editing; earlier generations' output was not.
The three-stage pipeline
Every image we publish goes through the same three stages.
Stage 1: Brief. The operator writes a short brief — 4–8 lines — specifying the subject, the context, the composition, the tone, and the brand constraints. This is a compressed design spec, not a prompt.
Stage 2: Generate. The operator translates the brief into a prompt, anchors against a reference image from the brand's visual library, generates 4–6 variants, and selects 1–2 to keep. If none of the variants are usable, the operator regenerates.
Stage 3: QA. The selected image goes through a checklist review: color fidelity, typography (if any text is in the image), composition, brand feel. If anything fails, the image is regenerated with more targeted guidance. The QA step catches maybe 3–4 issues per 10 images that would otherwise slip through.
Each stage is short. Together they average 10 minutes per final image. The first minute is the brief; the next 5–6 are generation and selection; the last 3 are QA.
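For readers who think in code, here is a pseudocode-level sketch of that loop in Python. Every helper below is a hypothetical stand-in for a step that is partly or wholly manual in our workflow; nothing here is a real API.

```python
# Pseudocode-level sketch of the three-stage loop. Every helper is a stand-in
# for a step that is partly or wholly manual in practice.

def translate_brief_to_prompt(brief: str) -> str:
    return brief  # Stage 1 -> 2: the operator rewrites the brief as a scene description

def generate_variants(prompt: str, reference: str, n: int = 6) -> list[str]:
    return []  # stand-in for the anchored generation call (4-6 variants)

def qa_failures(image: str, brief: str) -> list[str]:
    return []  # stand-in for the five-point checklist; returns the checks that failed

def tighten_prompt(prompt: str, failures: list[str]) -> str:
    # fold the observed failures back into the prompt as targeted guidance
    return prompt if not failures else prompt + " " + "; ".join(sorted(set(failures)))

def produce_image(brief: str, reference: str, max_rounds: int = 3) -> str | None:
    prompt = translate_brief_to_prompt(brief)
    for _ in range(max_rounds):
        round_failures: list[str] = []
        for image in generate_variants(prompt, reference):  # Stage 2
            failures = qa_failures(image, brief)            # Stage 3
            if not failures:
                return image                                # brand-usable, ship it
            round_failures.extend(failures)
        prompt = tighten_prompt(prompt, round_failures)     # regenerate with sharper guidance
    return None  # escalate: hand-design or photograph instead
```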
The brief: what goes in, what stays out
A good brief for image generation is shorter than people expect. It contains:
- Subject. What's in the image.
- Context. Where it is, what's happening.
- Composition. Close-up, medium, wide; centered or off-center; where the subject sits in frame.
- Tone. Warm/cool/neutral; bright/moody; energetic/calm.
- Brand constraints. Color palette, typographic treatment if applicable, any must-haves or must-nots for this brand.
What stays out: the actual prompt. The brief is a specification; the prompt is the operator's translation of the spec into the model's preferred input format. Writing the brief in prompt syntax prematurely locks out variants you'd want to try.
Example brief for a WondraKids blog post header:
Subject: A child's hand holding a colored pencil over a sketch pad.
Context: Cozy afternoon, soft natural light from a window.
Composition: Close-up, hand in lower-third, pad fills upper two-thirds.
Tone: Warm, desaturated, slightly nostalgic.
Brand: WondraKids — mint accent (#7fbfab), muted pastels, no saturated
primaries. Must read as calm, not cluttered.

Five lines. Every subsequent generation for that post will reference this brief. If we want variants, we vary the brief, not the prompt directly.
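Because every brief carries the same five fields, one convenient way to keep them consistent is to store them as structured data. A minimal sketch in Python; the `Brief` dataclass and its field names are illustrative, not a format the model or our tooling requires.

```python
from dataclasses import dataclass

@dataclass
class Brief:
    subject: str       # what's in the image
    context: str       # where it is, what's happening
    composition: str   # framing and placement
    tone: str          # warm/cool, bright/moody, energetic/calm
    brand: str         # palette, typographic treatment, must-haves and must-nots

wondrakids_header = Brief(
    subject="A child's hand holding a colored pencil over a sketch pad",
    context="Cozy afternoon, soft natural light from a window",
    composition="Close-up, hand in lower-third, pad fills upper two-thirds",
    tone="Warm, desaturated, slightly nostalgic",
    brand="WondraKids: mint accent #7fbfab, muted pastels, no saturated primaries; calm, not cluttered",
)
```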
The prompt: anchoring, specifics, constraints
The prompt that translates the above brief:
Close-up of a child's hand holding a wooden colored pencil, drawing on a
lightly-textured sketchpad. Hand in lower-third of frame. Sketchpad fills
upper two-thirds, partially blurred. Soft natural warm light, golden-hour
quality, indirect. Color palette dominated by muted mint green, soft
cream, pale warm grays. Highly desaturated, slightly nostalgic, film-
grain-adjacent. Not cluttered — minimal background.

A handful of short sentences. The prompt tells the model:
- What's in the frame (the hand, the pencil, the pad)
- Where things are placed (composition)
- How the light looks
- What colors are allowed
- What the image should feel like
What's not in the prompt: the brand name, words like "brand imagery," or instructions about the model's style. The model works better when you describe the image, not the intent.
We also attach a reference image to anchor style. The reference is a previously-approved WondraKids post header. The model reads the reference for color palette, lighting mood, and general feel. The prompt describes the scene; the reference describes the visual language.
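Mechanically, an anchored generation is just a prompt, one or two reference images, and a variant count. The sketch below shows that shape; the `ImageClient` protocol and its `generate_images` method are hypothetical placeholders for whatever SDK or UI you actually generate through.

```python
from typing import Protocol

class ImageClient(Protocol):
    # Hypothetical interface; substitute whatever your generation SDK or tool exposes.
    def generate_images(self, prompt: str, reference_images: list[bytes],
                        num_variants: int) -> list[bytes]: ...

def generate_anchored(client: ImageClient, prompt: str,
                      reference_paths: list[str], n: int = 6) -> list[bytes]:
    """Scene prompt plus 1-2 approved brand references -> n candidate images."""
    references = [open(path, "rb").read() for path in reference_paths]
    return client.generate_images(
        prompt=prompt,                # the prompt describes the scene
        reference_images=references,  # the references describe the visual language
        num_variants=n,
    )
```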
The reference library
Every brand we work with has a reference library of 20–40 approved images that serve as generation anchors. These are usually:
- Previous published images (for existing brands with a design history)
- Carefully selected "north star" images sourced from references (for newer brands)
- A small set of moodboard images representing the brand's emotional range
For each new generation session, the operator picks the 1–2 reference images closest to the brief's requirements and uses them as anchors. This is the single most impactful variable in our workflow. Without reference anchoring, the model drifts across sessions. With it, the model's outputs cohere into a single visual language across 50+ generations.
Maintaining the library is real work. We add 3–5 new references per brand per quarter and retire 1–2. The library is not a static asset; it evolves with the brand.
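If you want to make the "closest reference" choice less ad hoc, a toy approach is to tag each library image by hand and score tag overlap against the brief. A sketch under that assumption; the tags, paths, and scoring scheme are all invented for illustration.

```python
# Toy reference picker: each library entry carries hand-applied tags, the brief
# contributes its own tags, and the score is simple tag overlap.

def pick_references(library: dict[str, set[str]], brief_tags: set[str], k: int = 2) -> list[str]:
    ranked = sorted(library.items(), key=lambda item: len(item[1] & brief_tags), reverse=True)
    return [path for path, _ in ranked[:k]]

library = {
    "refs/wondrakids/header-014.png": {"warm", "close-up", "mint", "desaturated"},
    "refs/wondrakids/product-003.png": {"cool", "wide", "mint", "studio"},
    "refs/wondrakids/mood-007.png": {"warm", "abstract", "pastel"},
}
print(pick_references(library, {"warm", "close-up", "desaturated"}))
# -> ['refs/wondrakids/header-014.png', 'refs/wondrakids/mood-007.png']
```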
The QA checklist
Raw outputs pass a five-point checklist before they're approved for use.
Color fidelity. Does the palette match the brand's tokens? We eyedrop the generated image and compare against the brand's accent color spec. Tolerance of ±8% on hue, ±5% on saturation. If the generated image's accent is off, we either regenerate with more specific color guidance or we correct in post (which we try to avoid).
Typography (if any). If the image contains text, is every letter correctly rendered? Is the font treatment consistent with the brand? Generated text is right about 80% of the time; the other 20% we either catch and regenerate or we remove the text entirely and overlay it in Figma post-generation.
Composition. Does the subject sit where the brief asked? Is there unwanted visual clutter in the background? Does the image have the focal hierarchy the brand expects?
Brand feel. Does this read as a WondraKids image, or a generic warm-toned child-and-pencil shot? The check is subjective but trainable — after ~100 reviews, our operators can make this call in 10 seconds.
AI tells. The specific artifacts that say "AI-generated": too-perfect symmetry, subtly wrong anatomy (especially hands), nonsense in the background, overly smooth skin in "photographs." We catch ~80% of these; the ~20% that slip through have so far been publishable anyway, because the overall impression is still brand-appropriate.
If an image fails any of these checks, we regenerate with a more specific prompt addressing the failure. A typical session: 8–10 variants generated, 2–3 passed QA, 1–2 shipped to final use.
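The color-fidelity point is the one check that's easy to automate. A minimal sketch of the eyedrop comparison, assuming the tolerance is expressed on HSV's 0–1 scale (the ±8% hue / ±5% saturation figures are our house tolerance, not a standard):

```python
import colorsys

def hex_to_hsv(hex_color: str) -> tuple[float, float, float]:
    """'#7fbfab' -> (h, s, v), each on a 0-1 scale."""
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return colorsys.rgb_to_hsv(r, g, b)

def accent_within_tolerance(sampled_hex: str, brand_hex: str,
                            hue_tol: float = 0.08, sat_tol: float = 0.05) -> bool:
    """Compare an eyedropped pixel against the brand's accent color token."""
    h1, s1, _ = hex_to_hsv(sampled_hex)
    h2, s2, _ = hex_to_hsv(brand_hex)
    hue_diff = min(abs(h1 - h2), 1 - abs(h1 - h2))  # hue wraps around the circle
    return hue_diff <= hue_tol and abs(s1 - s2) <= sat_tol

# WondraKids accent is #7fbfab; a hypothetical eyedropped value of #82b8a4 passes.
print(accent_within_tolerance("#82b8a4", "#7fbfab"))  # True
```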
The failure modes we've learned
Over-specification. A prompt with 14 constraints produces output that meets all 14 constraints and has no coherent visual identity. The model starts drawing the list, not the image. Rule of thumb: if your prompt is more than 6 sentences, it's probably too long.
Under-specification. A prompt with 1 sentence produces generic output. The sweet spot is 4–6 sentences with specific instructions about composition, light, and palette.
Chasing a specific image in your head. The model does not render the exact picture you imagined; it generates in the direction of your description. Operators who spend 30 minutes trying to coax one specific vision out of the model are working against the tool. Adjust your vision to what the model gives you, or bring in a photographer.
Ignoring the reference image. Operators who prompt without reference images produce inconsistent output and blame the model. The reference is doing most of the work; the prompt is the fine-tune.
Not rotating references. Using the same reference image for 200 generations produces output that's increasingly identical — the model is locking onto the reference's specific details, not its style. Rotate references quarterly, at minimum.
In-image typography for anything legally sensitive. Generated text is right 80% of the time. For legal disclosures, exact product names, prices — never trust the model. Type overlays happen in post.
What it replaces in the budget
For the four brands we operate or advise on, here's the rough substitution pattern:
| Replaced | Monthly $ before | Monthly $ after |
|---|---|---|
| Stock photography subscriptions | 280 | 0 |
| Illustrator/designer contract (spot illustration) | 2,200 | 600 |
| Minor product photography (lifestyle shots) | 1,400 | 200 |
| Blog post header images (hand-designed) | 800 | 140 |
| Total | 4,680 | 940 |
The designer line didn't go to zero. The designer's role shifted from "producing all the images" to "reviewing and refining the 20% that need hand-touch, and maintaining the brand's reference library." The designer we work with reports being happier with the shift — less production grunt work, more strategic brand work.
Spend on the model itself, across four brands: roughly $340/month at the volume we run (~1,000 final images per month across all four).
Net monthly saving: roughly $3,400 versus the prior stack ($4,680 before, minus the $940 we still spend, minus the ~$340 in model usage). We reinvest about half into the designer's strategic work and keep half.
The specific prompts that have worked
For anyone wanting starting templates, here are abbreviated prompt templates we've iterated on that consistently produce brand-usable output for specific kinds of images.
Lifestyle product shot:
A [product], [color], placed on [surface] in soft natural light. Shallow
depth of field, slight film-grain, warm tones. Background blurred and
minimal — [single environmental element]. Composition: product in lower-
third, upper two-thirds of frame left as negative space.

Editorial portrait-style:
Close-up three-quarters view of [subject]. Soft window light from the
left, gentle shadows on the right. Palette: muted [brand-accent], warm
neutrals, cream. Slightly desaturated, film-photography adjacent.
Composition: subject off-center, weight to the left, space to the right.

Abstract / atmospheric:
An abstract composition of [objects/forms] suspended in soft diffused
light. Palette strictly limited to [3 brand colors]. Minimal, editorial,
with deep negative space around the main cluster. No text. No human
figures.

Data or infographic-adjacent:

[skip the model for these — use Figma or Claude-generated SVG]

That last one is the important line. Not every image should be generated. For anything that needs specific shapes, numeric accuracy, or typographic precision, the model is the wrong tool. We generate in the model when we want a textured, atmospheric, lifestyle-feeling image. We generate in Figma or as SVG when we want structure.
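Since the templates are plain strings with bracketed slots, filling one is a single `format` call once the slots are named. A small sketch; the slot names and the example values are illustrative.

```python
LIFESTYLE_TEMPLATE = (
    "A {product}, {color}, placed on {surface} in soft natural light. "
    "Shallow depth of field, slight film-grain, warm tones. Background blurred "
    "and minimal — {environment}. Composition: product in lower-third, upper "
    "two-thirds of frame left as negative space."
)

prompt = LIFESTYLE_TEMPLATE.format(
    product="children's canvas backpack",
    color="muted mint with cream straps",
    surface="a pale wooden bench",
    environment="a blurred window plant in the corner",
)
print(prompt)
```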
"I stopped spending my Fridays making header images for blog posts I don't care about. I now spend them on the parts of the brand that needed work all year." — our designer, six months into the new workflow
The monthly ritual
Once a month, we run a brand-imagery review.
- Review all images generated that month across every brand
- Flag any that drifted off-brand (usually 2–5 per brand)
- Update the reference library for any brand where style has evolved
- Update prompt templates where the model's output has shifted
- Retire any prompts that are producing inconsistent output
This ritual takes about 90 minutes per brand, per month. It's what keeps the output reliable. Without it, output drifts quietly over 3–4 months and suddenly, one day, the team notices that the visuals feel "off" without being able to say why.
What 2026 looks like
We expect the workflow to continue tightening. Two changes we're watching:
Tighter reference conditioning. The current model's reference anchoring is good but not as tight as dedicated brand-fine-tuning would be. We expect model-side fine-tuning for specific brands to be a workable option in 2026, which would lift the "raw output passes QA" number from 60% to probably 75–80%.
Multi-image coherence for campaigns. Generating a full campaign (hero, two supporting visuals, product shots) as a set, with inherent coherence, is still weaker than we'd like. Current workflow is to generate each image separately and rely on the reference to keep them coherent. This is a solved problem in principle and will be solved in product in 2026.
For now, the workflow above is stable, tested, and in continuous production across four brands. It's not magic. It's a discipline applied to a tool that rewards discipline.