Nano Banana Pro for brand imagery: the complete workflow
Six months using nano banana pro brand imagery across four brands. The model's quirks, the prompts that work, the QA loop that keeps visuals on-brand.

After six months running image generation for four brands — a children's accessory store, a B2B agency, a trading-education SaaS, and a studio blog — the shape of the work has settled. The question has shifted from "is the output good enough?" to "what's the workflow that makes the output reliable?" The nano banana pro brand imagery workflow we've converged on is a three-stage pipeline: a brief specification stage where the human operator defines the brand constraints, a generation stage with careful prompt anchoring to a reference image, and a QA stage where roughly 40% of initial outputs are rejected and regenerated with sharper guidance — the raw model produces brand-usable images about 60% of the time, and the pipeline lifts that to ~95% with about ten minutes of operator time per final image. This is the workflow, the prompts, the common failure modes, and the economics.
Why this model, why now
Image generation crossed a threshold in late 2025 that earlier generations didn't clear. The specific properties that matter for brand work:
Typographic legibility. Previous models could produce aesthetically pleasing images, but in-image text was a pile of glyphs that looked like letters from a distance. Nano Banana Pro renders real, readable text about 80% of the time — enough that a brand headline can actually appear inside a generated visual.
Style consistency across an image set. Generate ten images for a single campaign and the previous generation's output would have ten subtly different visual languages. Nano Banana Pro holds a style across a set, especially when prompted against a reference image.
Reference-anchored generation. Upload a single brand reference image, and subsequent generations match its color palette, lighting, and compositional feel. This alone is what makes the model workable for brand applications rather than one-off stylish images.
Reliable product rendering. Physical products (bottles, boxes, accessories) render at a quality where the generated image is usable — not photorealistic, but believable enough for lifestyle contexts.
None of these capabilities is perfect. All of them cross the threshold where the output is salvageable with reasonable human editing; earlier generations' output was not.
The three-stage pipeline
Every image we publish goes through the same three stages.
Stage 1: Brief. The operator writes a short brief — 4–8 lines — specifying the subject, the context, the composition, the tone, and the brand constraints. This is a compressed design spec, not a prompt.
Stage 2: Generate. The operator translates the brief into a prompt, anchors against a reference image from the brand's visual library, generates 4–6 variants, and selects 1–2 to keep. If none of the variants are usable, the operator regenerates.
Stage 3: QA. The selected image goes through a checklist review: color fidelity, typography (if any text is in the image), composition, brand feel. If anything fails, the image is regenerated with more targeted guidance. The QA step catches maybe 3–4 issues per 10 images that would otherwise slip through.
Each stage is short. Together they average 10 minutes per final image. The first minute is the brief; the next 5–6 are generation and selection; the last 3 are QA.
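For readers who think in code, here is a pseudocode-level sketch of that loop in Python. Every helper below is a hypothetical stand-in for a step that is partly or wholly manual in our workflow; nothing here is a real API.

```python
# Pseudocode-level sketch of the three-stage loop. Every helper is a stand-in
# for a step that is partly or wholly manual in practice.

def translate_brief_to_prompt(brief: str) -> str:
    return brief  # Stage 1 -> 2: the operator rewrites the brief as a scene description

def generate_variants(prompt: str, reference: str, n: int = 6) -> list[str]:
    return []  # stand-in for the anchored generation call (4-6 variants)

def qa_failures(image: str, brief: str) -> list[str]:
    return []  # stand-in for the five-point checklist; returns the checks that failed

def tighten_prompt(prompt: str, failures: list[str]) -> str:
    # fold the observed failures back into the prompt as targeted guidance
    return prompt if not failures else prompt + " " + "; ".join(sorted(set(failures)))

def produce_image(brief: str, reference: str, max_rounds: int = 3) -> str | None:
    prompt = translate_brief_to_prompt(brief)
    for _ in range(max_rounds):
        round_failures: list[str] = []
        for image in generate_variants(prompt, reference):  # Stage 2
            failures = qa_failures(image, brief)            # Stage 3
            if not failures:
                return image                                # brand-usable, ship it
            round_failures.extend(failures)
        prompt = tighten_prompt(prompt, round_failures)     # regenerate with sharper guidance
    return None  # escalate: hand-design or photograph instead
```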
The brief: what goes in, what stays out
A good brief for image generation is shorter than people expect. It contains:
- Subject. What's in the image.
- Context. Where it is, what's happening.
- Composition. Close-up, medium, wide; centered or off-center; where the subject sits in frame.
- Tone. Warm/cool/neutral; bright/moody; energetic/calm.
- Brand constraints. Color palette, typographic treatment if applicable, any must-haves or must-nots for this brand.
What stays out: the actual prompt. The brief is a specification; the prompt is the operator's translation of the spec into the model's preferred input format. Writing the brief in prompt syntax prematurely locks out variants you'd want to try.
Example brief for a WondraKids blog post header:
Subject: A child's hand holding a colored pencil over a sketch pad.
Context: Cozy afternoon, soft natural light from a window.
Composition: Close-up, hand in lower-third, pad fills upper two-thirds.
Tone: Warm, desaturated, slightly nostalgic.
Brand: WondraKids — mint accent (#7fbfab), muted pastels, no saturated
primaries. Must read as calm, not cluttered.

Five lines. Every subsequent generation for that post will reference this brief. If we want variants, we vary the brief, not the prompt directly.
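Because every brief carries the same five fields, one convenient way to keep them consistent is to store them as structured data. A minimal sketch in Python; the `Brief` dataclass and its field names are illustrative, not a format the model or our tooling requires.

```python
from dataclasses import dataclass

@dataclass
class Brief:
    subject: str       # what's in the image
    context: str       # where it is, what's happening
    composition: str   # framing and placement
    tone: str          # warm/cool, bright/moody, energetic/calm
    brand: str         # palette, typographic treatment, must-haves and must-nots

wondrakids_header = Brief(
    subject="A child's hand holding a colored pencil over a sketch pad",
    context="Cozy afternoon, soft natural light from a window",
    composition="Close-up, hand in lower-third, pad fills upper two-thirds",
    tone="Warm, desaturated, slightly nostalgic",
    brand="WondraKids: mint accent #7fbfab, muted pastels, no saturated primaries; calm, not cluttered",
)
```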
The prompt: anchoring, specifics, constraints
The prompt that translates the above brief:
Close-up of a child's hand holding a wooden colored pencil, drawing on a
lightly-textured sketchpad. Hand in lower-third of frame. Sketchpad fills
upper two-thirds, partially blurred. Soft natural warm light, golden-hour
quality, indirect. Color palette dominated by muted mint green, soft
cream, pale warm grays. Highly desaturated, slightly nostalgic, film-
grain-adjacent. Not cluttered — minimal background.

A handful of short sentences. The prompt tells the model:
- What's in the frame (the hand, the pencil, the pad)
- Where things are placed (composition)
- How the light looks
- What colors are allowed
- What the image should feel like
What's not in the prompt: the brand name, words like "brand imagery," or instructions about the model's style. The model works better when you describe the image, not the intent.
We also attach a reference image to anchor style. The reference is a previously-approved WondraKids post header. The model reads the reference for color palette, lighting mood, and general feel. The prompt describes the scene; the reference describes the visual language.
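Mechanically, an anchored generation is just a prompt, one or two reference images, and a variant count. The sketch below shows that shape; the `ImageClient` protocol and its `generate_images` method are hypothetical placeholders for whatever SDK or UI you actually generate through.

```python
from typing import Protocol

class ImageClient(Protocol):
    # Hypothetical interface; substitute whatever your generation SDK or tool exposes.
    def generate_images(self, prompt: str, reference_images: list[bytes],
                        num_variants: int) -> list[bytes]: ...

def generate_anchored(client: ImageClient, prompt: str,
                      reference_paths: list[str], n: int = 6) -> list[bytes]:
    """Scene prompt plus 1-2 approved brand references -> n candidate images."""
    references = [open(path, "rb").read() for path in reference_paths]
    return client.generate_images(
        prompt=prompt,                # the prompt describes the scene
        reference_images=references,  # the references describe the visual language
        num_variants=n,
    )
```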
The reference library
Every brand we work with has a reference library of 20–40 approved images that serve as generation anchors. These are usually:
- Previous published images (for existing brands with a design history)
- Carefully selected "north star" images sourced from references (for newer brands)
- A small set of moodboard images representing the brand's emotional range
For each new generation session, the operator picks the 1–2 reference images closest to the brief's requirements and uses them as anchors. This is the single most impactful variable in our workflow. Without reference anchoring, the model drifts across sessions. With it, the model's outputs cohere into a single visual language across 50+ generations.
Maintaining the library is real work. We add 3–5 new references per brand per quarter and retire 1–2. The library is not a static asset; it evolves with the brand.
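If you want to make the "closest reference" choice less ad hoc, a toy approach is to tag each library image by hand and score tag overlap against the brief. A sketch under that assumption; the tags, paths, and scoring scheme are all invented for illustration.

```python
# Toy reference picker: each library entry carries hand-applied tags, the brief
# contributes its own tags, and the score is simple tag overlap.

def pick_references(library: dict[str, set[str]], brief_tags: set[str], k: int = 2) -> list[str]:
    ranked = sorted(library.items(), key=lambda item: len(item[1] & brief_tags), reverse=True)
    return [path for path, _ in ranked[:k]]

library = {
    "refs/wondrakids/header-014.png": {"warm", "close-up", "mint", "desaturated"},
    "refs/wondrakids/product-003.png": {"cool", "wide", "mint", "studio"},
    "refs/wondrakids/mood-007.png": {"warm", "abstract", "pastel"},
}
print(pick_references(library, {"warm", "close-up", "desaturated"}))
# -> ['refs/wondrakids/header-014.png', 'refs/wondrakids/mood-007.png']
```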
The QA checklist
Raw outputs pass a five-point checklist before they're approved for use.
Color fidelity. Does the palette match the brand's tokens? We eyedrop the generated image and compare against the brand's accent color spec. Tolerance of ±8% on hue, ±5% on saturation. If the generated image's accent is off, we either regenerate with more specific color guidance or we correct in post (which we try to avoid).
Typography (if any). If the image contains text, is every letter correctly rendered? Is the font treatment consistent with the brand? Generated text is right about 80% of the time; the other 20% we either catch and regenerate or we remove the text entirely and overlay it in Figma post-generation.
Composition. Does the subject sit where the brief asked? Is there unwanted visual clutter in the background? Does the image have the focal hierarchy the brand expects?
Brand feel. Does this read as a WondraKids image, or a generic warm-toned child-and-pencil shot? The check is subjective but trainable — after ~100 reviews, our operators can make this call in 10 seconds.
AI tells. The specific artifacts that say "AI-generated": too-perfect symmetry, subtly wrong anatomy (especially hands), nonsense in the background, overly smooth skin in "photographs." We catch ~80% of these; the ~20% that slip through have so far been publishable anyway, because the overall impression is still brand-appropriate.
If an image fails any of these checks, we regenerate with a more specific prompt addressing the failure. A typical session: 8–10 variants generated, 2–3 passed QA, 1–2 shipped to final use.
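The color-fidelity point is the one check that's easy to automate. A minimal sketch of the eyedrop comparison, assuming the tolerance is expressed on HSV's 0–1 scale (the ±8% hue / ±5% saturation figures are our house tolerance, not a standard):

```python
import colorsys

def hex_to_hsv(hex_color: str) -> tuple[float, float, float]:
    """'#7fbfab' -> (h, s, v), each on a 0-1 scale."""
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return colorsys.rgb_to_hsv(r, g, b)

def accent_within_tolerance(sampled_hex: str, brand_hex: str,
                            hue_tol: float = 0.08, sat_tol: float = 0.05) -> bool:
    """Compare an eyedropped pixel against the brand's accent color token."""
    h1, s1, _ = hex_to_hsv(sampled_hex)
    h2, s2, _ = hex_to_hsv(brand_hex)
    hue_diff = min(abs(h1 - h2), 1 - abs(h1 - h2))  # hue wraps around the circle
    return hue_diff <= hue_tol and abs(s1 - s2) <= sat_tol

# WondraKids accent is #7fbfab; a hypothetical eyedropped value of #82b8a4 passes.
print(accent_within_tolerance("#82b8a4", "#7fbfab"))  # True
```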
The failure modes we've learned
Over-specification. A prompt with 14 constraints produces output that meets all 14 constraints and has no coherent visual identity. The model starts drawing the list, not the image. Rule of thumb: if your prompt is more than 6 sentences, it's probably too long.
Under-specification. A prompt with 1 sentence produces generic output. The sweet spot is 4–6 sentences with specific instructions about composition, light, and palette.
Chasing a specific image in your head. The model does not render the exact picture you imagined; it generates in the direction of your description. Operators who spend 30 minutes trying to coax one specific vision out of the model are working against the tool. Adjust your vision to what the model gives you, or bring in a photographer.
Ignoring the reference image. Operators who prompt without reference images produce inconsistent output and blame the model. The reference is doing most of the work; the prompt is the fine-tune.
Not rotating references. Using the same reference image for 200 generations produces output that's increasingly identical — the model is locking onto the reference's specific details, not its style. Rotate references quarterly, at minimum.
In-image typography for anything legally sensitive. Generated text is right 80% of the time. For legal disclosures, exact product names, prices — never trust the model. Type overlays happen in post.
What it replaces in the budget
For the four brands we operate or advise on, here's the rough substitution pattern:
| Replaced | Monthly $ before | Monthly $ after |
|---|---|---|
| Stock photography subscriptions | 280 | 0 |
| Illustrator/designer contract (spot illustration) | 2,200 | 600 |
| Minor product photography (lifestyle shots) | 1,400 | 200 |
| Blog post header images (hand-designed) | 800 | 140 |
| Total | 4,680 | 940 |
The designer line didn't go to zero. The designer's role shifted from "producing all the images" to "reviewing and refining the 20% that need hand-touch, and maintaining the brand's reference library." The designer we work with reports being happier with the shift — less production grunt work, more strategic brand work.
Spend on the model itself, across four brands: roughly $340/month at the volume we run (~1,000 final images per month across all four).
Net monthly saving: roughly $3,400 versus the prior stack ($4,680 before, minus the $940 we still spend, minus the ~$340 in model usage). We reinvest about half into the designer's strategic work and keep half.
The specific prompts that have worked
For anyone wanting starting templates, here are abbreviated prompt templates we've iterated on that consistently produce brand-usable output for specific kinds of images.
Lifestyle product shot:
A [product], [color], placed on [surface] in soft natural light. Shallow
depth of field, slight film-grain, warm tones. Background blurred and
minimal — [single environmental element]. Composition: product in lower-
third, upper two-thirds of frame left as negative space.

Editorial portrait-style:
Close-up three-quarters view of [subject]. Soft window light from the
left, gentle shadows on the right. Palette: muted [brand-accent], warm
neutrals, cream. Slightly desaturated, film-photography adjacent.
Composition: subject off-center, weight to the left, space to the right.

Abstract / atmospheric:
An abstract composition of [objects/forms] suspended in soft diffused
light. Palette strictly limited to [3 brand colors]. Minimal, editorial,
with deep negative space around the main cluster. No text. No human
figures.

Data or infographic-adjacent:

[skip the model for these — use Figma or Claude-generated SVG]

That last one is the important line. Not every image should be generated. For anything that needs specific shapes, numeric accuracy, or typographic precision, the model is the wrong tool. We generate in the model when we want a textured, atmospheric, lifestyle-feeling image. We generate in Figma or as SVG when we want structure.
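Since the templates are plain strings with bracketed slots, filling one is a single `format` call once the slots are named. A small sketch; the slot names and the example values are illustrative.

```python
LIFESTYLE_TEMPLATE = (
    "A {product}, {color}, placed on {surface} in soft natural light. "
    "Shallow depth of field, slight film-grain, warm tones. Background blurred "
    "and minimal — {environment}. Composition: product in lower-third, upper "
    "two-thirds of frame left as negative space."
)

prompt = LIFESTYLE_TEMPLATE.format(
    product="children's canvas backpack",
    color="muted mint with cream straps",
    surface="a pale wooden bench",
    environment="a blurred window plant in the corner",
)
print(prompt)
```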
"I stopped spending my Fridays making header images for blog posts I don't care about. I now spend them on the parts of the brand that needed work all year." — our designer, six months into the new workflow
The monthly ritual
Once a month, we run a brand-imagery review.
- Review all images generated that month across every brand
- Flag any that drifted off-brand (usually 2–5 per brand)
- Update the reference library for any brand where style has evolved
- Update prompt templates where the model's output has shifted
- Retire any prompts that are producing inconsistent output
This ritual takes about 90 minutes per brand, per month. It's what keeps the output reliable. Without it, output drifts quietly over 3–4 months and suddenly, one day, the team notices that the visuals feel "off" without being able to say why.
What 2026 looks like
We expect the workflow to continue tightening. Two changes we're watching:
Tighter reference conditioning. The current model's reference anchoring is good but not as tight as dedicated brand-fine-tuning would be. We expect model-side fine-tuning for specific brands to be a workable option in 2026, which would lift the "raw output passes QA" number from 60% to probably 75–80%.
Multi-image coherence for campaigns. Generating a full campaign (hero, two supporting visuals, product shots) as a set, with inherent coherence, is still weaker than we'd like. Current workflow is to generate each image separately and rely on the reference to keep them coherent. This is a solved problem in principle and will be solved in product in 2026.
For now, the workflow above is stable, tested, and in continuous production across four brands. It's not magic. It's a discipline applied to a tool that rewards discipline.