The content automation system that ships 1 billion views per month
A field report on the actual architecture behind a billion-view-per-month AI content pipeline — topic generation, Nano Banana Pro and Flux for stills, Kling 3 for image-to-video, Remotion and CapCut for assembly, LLM-as-judge for slop rejection, and the distribution layer that doesn't get accounts banned.

The system in this post pushed past one billion monthly views in late March. The number is a lagging measure of a pipeline we have been running and modifying for roughly fourteen months — through three model generations, two deeply painful platform-policy shifts, and one cohort of accounts that all got banned in a single weekend because we'd shipped a templated caption ending in the same emoji. The pipeline runs across Instagram Reels, TikTok, and YouTube Shorts, with a much smaller spillover into LinkedIn and X. It was built initially for a single fitness-transformation client whose images we'd been generating with Nano Banana Pro, then generalized into a six-vertical content engine that now operates as a separate book of business inside the studio. The honest version of this case study is that the AI tools did roughly half the work, the verification step did roughly a quarter, the distribution-layer engineering did the remaining quarter, and zero of it would have shipped without an operator who understood that "fully autonomous" is a marketing claim and "humans at three specific checkpoints" is how you actually run it.
The shape of the pipeline
The whole system is five stages plus a feedback loop. Each stage runs as an independent service with its own queue, its own retry behavior, and its own cost meter. Failures at any stage drop the asset, not the batch.
Stage 1: topic generation. A Claude Code job runs once a day per vertical and produces a list of 80–120 candidate topics. The job pulls from three sources: a curated subreddit list scoped to the vertical, the previous week's top-performing assets across all our accounts in the same vertical, and a small cache of "evergreen frames" we've manually written. The Claude prompt is unusually long — about 1,400 tokens — because it has to reject topics that overlap too closely with what we shipped in the last 14 days. Each surviving topic comes out with a one-line hook, a target visual concept, and a target duration in seconds.
Stage 2: visual generation. Each surviving topic gets a still image (or batch of stills, for sequence-style shorts) generated by either Nano Banana Pro or Flux Pro. The model choice is per-vertical and per-asset-type — Nano Banana Pro for anything that needs text in the frame, Flux for the photoreal verticals, Midjourney for stylized work where we have time for the slower API. The stills queue runs at maximum concurrency the providers will let us push without rate-limiting; in practice that's about 240 stills per hour aggregated across providers.
Stage 3: image-to-video. The stills feed Kling 3 for image-to-video, with each clip generated at 5 seconds of motion at 1080p. We default to Kling's "Standard" tier on cost grounds and only escalate to "Pro" tier when an asset is destined for a high-CPM vertical or has been flagged by the LLM judge as needing higher motion fidelity on a re-render. The full image-to-video pass is the slowest stage — single clips take 90–180 seconds even on Standard tier — so the pipeline is heavily parallelized, with each vertical getting its own Kling worker pool.
Stage 4: assembly. Two paths: the Remotion path for shorts where we need exact frame control (anything with on-screen text overlays, graph reveals, before/after pans) and the CapCut path for higher-velocity templated assembly (most lifestyle and educational verticals). The Remotion side runs on a small dedicated machine with hardware video encoding; the CapCut side runs through CapCut's API on a per-template basis. Both produce 9:16 1080x1920 MP4 outputs.
Stage 5: verification. Every assembled short runs through an LLM-judge pass before it gets queued for distribution. The judge — a Claude prompt with a fixed rubric — answers six questions per clip:

- Is the hook visible in the first 1.2 seconds?
- Is there any visual artifact in the first frame?
- Does the caption match the visual content?
- Is the audio level within the -9 to -3 dBFS window?
- Does the topic overlap too heavily with another asset shipped in the last 7 days?
- Is there any AI-generated artifact (hands, text mangling, identity drift on a recurring character) that would tip off a viewer?

Anything that fails two or more checks goes to a human reviewer; anything that fails the artifact check alone is auto-rejected and the asset is killed.
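The routing behind the gate is simple enough to sketch. This is an illustrative reconstruction, not the production code: the check names are ours, and the six booleans would come from the LLM judge's structured output.

```python
# Illustrative sketch of the verification gate's routing: six pass/fail
# answers per clip, artifact failure is fatal on its own, and any two
# failures route to the human queue. Check names are illustrative.

AUTO_REJECT_CHECKS = {"no_ai_artifact"}  # failing this alone kills the asset

def route_asset(results: dict[str, bool]) -> str:
    """Map judge answers (check name -> passed?) to a routing decision."""
    failed = {name for name, passed in results.items() if not passed}
    if failed & AUTO_REJECT_CHECKS:
        return "rejected"        # artifact failure is auto-rejected
    if len(failed) >= 2:
        return "human_review"    # two or more failures -> human reviewer
    return "approved"

clip = {
    "hook_in_first_1_2s": True,
    "first_frame_clean": True,
    "caption_matches": True,
    "audio_level_ok": False,     # a single non-artifact failure still ships
    "no_recent_overlap": True,
    "no_ai_artifact": True,
}
print(route_asset(clip))  # -> approved
```

The asymmetry is the point: one marginal audio level is survivable, one visible AI artifact is not.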
Distribution loop. Approved shorts land in a per-account scheduling queue. Each account has a posting cadence, a per-platform variant chain (the same short with different captions, hashtags, audio overlays per platform), and a kill switch tied to its account-health monitor. Performance data — views, watch-through, saves, follows-from-this-post — gets piped back to the Stage-1 topic generator on a 6-hour delay, which closes the feedback loop.
The operator-level details that don't appear in the architecture diagram
The diagram is the easy part. The reasons this pipeline is expensive to build but cheap to run live below the diagram, in the seams between stages.
The first detail is that each stage runs idempotently. If Kling silently returns a corrupted MP4 (which happens at roughly a 0.4% rate even on the Pro tier), the assembly stage detects the corruption on the first frame-extract pass and re-queues the clip, which means the failure never escalates to a human and the cost is one extra Kling call. Idempotency at every stage is the single architectural decision that lets the system run unattended overnight.
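The pattern generalizes to any stage: validate every output, re-queue the same job on a bad result, and drop the asset (never the batch) after a bounded number of attempts. A minimal sketch, where the renderer and validator are stand-ins for the real provider call and the frame-extract check:

```python
# Minimal sketch of the idempotent-stage pattern: a corrupted output
# re-queues the same job, and a job that keeps failing drops only its own
# asset. render() and is_valid() are stand-ins for the provider call and
# the first-frame corruption check described in the text.
from collections import deque

def run_stage(jobs, render, is_valid, max_attempts=3):
    """Process jobs, re-queueing any whose output fails validation."""
    queue = deque((job, 1) for job in jobs)
    done, dropped = [], []
    while queue:
        job, attempt = queue.popleft()
        output = render(job)                  # e.g. one image-to-video call
        if is_valid(output):
            done.append(output)
        elif attempt < max_attempts:
            queue.append((job, attempt + 1))  # cost: one extra provider call
        else:
            dropped.append(job)               # drop the asset, not the batch
    return done, dropped
```

Because re-running a job is safe by construction, a transient corruption costs one extra call and never escalates to a human.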
The second detail is that we treat the LLM-judge stage as the single most important component in the pipeline. The verification rubric is six questions long because we tried it with twelve, and the judge's accuracy collapsed; we tried it with three, and the judge passed too much slop. Six is the experimentally validated point at which the per-question accuracy stays above 92% and the aggregate decision matches a human reviewer about 88% of the time. The remaining 12% disagreement is what the human-reviewer queue exists to handle, and the queue's daily volume is the metric we use to know whether the rubric needs re-tuning.
The third detail is the slop rejection rate. The LLM judge kills somewhere between 14 and 22 percent of every batch outright. That number is high, deliberately. Early versions of the pipeline shipped most of what they generated, and the per-account performance suffered visibly within three weeks — saves dropped, watch-through dropped, the algorithm moved the accounts down. The economics of slop are not "free content is good"; the economics of slop are that one bad short reduces the next ten shorts' organic distribution on the same account. So we kill aggressively at the verification gate. The rejected assets are not regenerated. They are dropped, and the topic gets a cooldown stamp so the topic generator won't re-propose it for 14 days.
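The cooldown stamp itself is a small piece of state. A sketch of the idea, with in-memory storage standing in for whatever the production cache persists to:

```python
# Sketch of the 14-day cooldown stamp the topic generator consults before
# re-proposing a killed topic. Storage is an in-memory dict here; names
# and the example topic are illustrative.
from datetime import datetime, timedelta

COOLDOWN = timedelta(days=14)

class TopicCooldown:
    def __init__(self):
        self._stamps = {}  # topic -> time it was killed or shipped

    def stamp(self, topic: str, when: datetime):
        self._stamps[topic] = when

    def is_blocked(self, topic: str, now: datetime) -> bool:
        stamped = self._stamps.get(topic)
        return stamped is not None and now - stamped < COOLDOWN

cd = TopicCooldown()
cd.stamp("before-after morning routine", datetime(2026, 3, 1))
print(cd.is_blocked("before-after morning routine", datetime(2026, 3, 10)))  # -> True
print(cd.is_blocked("before-after morning routine", datetime(2026, 3, 16)))  # -> False
```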
The fourth detail is the account warmup. New accounts spend 14 days posting nothing AI-generated. They post 2–3 manually-curated reposts a day, follow accounts in their target niche, react organically. After 14 days the account is "warm" and starts receiving pipeline content. We tried compressing the warmup to 7 days for one cohort of 12 accounts; 9 of the 12 hit a soft suppression signal within 30 days. The 14-day warmup costs about $40 per account in operator time and prevents an outcome that costs about $80 per account in lifetime ad-equivalent value lost.
The fifth detail is residential proxies. Every account has a dedicated residential IP from the geography of its bio. We rotate the IP only on account migration — never mid-session. The residential proxy stack is the single most boring infrastructure component and the one most likely to get a pipeline killed if it's run on cheap datacenter IPs.
What we learned generating fitness-transformation avatars
The first vertical the pipeline shipped was AI fitness-transformation content for a single client. The brief was straightforward: produce before/after avatars showing weight-loss outcomes in a way that read as visually credible without actually misrepresenting any specific outcome. The legal posture was that every avatar was clearly labeled as AI-generated and the client's marketing copy was written around aggregate outcomes, not individual transformations.
The visual problem was harder than it sounds. Early Nano Banana Pro generations produced "before" and "after" images that didn't read as the same person. The face structure drifted, the lighting differed, the wardrobe was incongruent. We solved it with a two-pass approach: generate the "before" first with full control over identity tokens, then use Nano Banana Pro's edit mode to produce the "after" by passing the "before" image as a reference and instructing the model to preserve facial identity while modifying body composition and pose. The two-pass approach took the identity-drift failure rate from about 35% on first-shot generations to about 4% on edit-mode generations.
The motion problem was easier. We pass the "before" still and the "after" still to Kling 3 as a two-frame sequence, ask it to produce a 5-second morph clip, and let Kling handle the in-between motion. Kling's interpolation is uncannily good at this specific transition. The first 1.2 seconds of the clip — the hook — is the "before" hold; the next 2.8 seconds is the morph; the final 1 second is the "after" hold. The whole thing reads as a visual transformation without ever claiming to depict a real person.
The volume out of this vertical is the largest in the whole pipeline. The fitness vertical accounts for roughly 38% of monthly shorts and a slightly higher share of monthly views, because the per-asset performance is exceptional in the relevant demographic.
What we learned generating Peptivo GLP-1 campaign assets
Peptivo is a different shape of problem. GLP-1 medications are a regulated category, and the platform-side moderation on weight-loss-medication content is aggressive and inconsistent across platforms. A short that lives happily on TikTok might be flagged on Instagram. A short that runs on YouTube Shorts might be silently deranked on TikTok.
The pipeline solution was to add a per-platform compliance pre-check between the assembly stage and the distribution stage. The compliance pre-check is another LLM-judge pass — separate from the quality judge — that scores each short against the moderation rubric of each target platform and either approves the short for distribution on that platform or rejects it. A short can be approved on YouTube Shorts and rejected on Instagram and the system will distribute accordingly.
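The fan-out logic is a per-(short, platform) verdict matrix. A sketch under assumed names, with a stub standing in for the second LLM-judge pass:

```python
# Sketch of the per-platform distribution fan-out described above: the
# compliance judge yields one verdict per (short, platform) pair, and the
# scheduler queues the short only on platforms that approved it.
# Platform list, short IDs, and the verdict stub are illustrative.

PLATFORMS = ("youtube_shorts", "tiktok", "instagram")

def distribution_targets(short_id, compliance_check):
    """Return the platforms this short may be queued on."""
    return [p for p in PLATFORMS if compliance_check(short_id, p)]

# Stub standing in for the per-platform LLM compliance pass.
verdicts = {("glp1_explainer_07", "youtube_shorts"): True,
            ("glp1_explainer_07", "tiktok"): True,
            ("glp1_explainer_07", "instagram"): False}

targets = distribution_targets("glp1_explainer_07",
                               lambda s, p: verdicts.get((s, p), False))
print(targets)  # -> ['youtube_shorts', 'tiktok']
```

An unknown (short, platform) pair defaults to rejected, which is the safe failure mode for a regulated category.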
The visual register for Peptivo content is intentionally calmer than the fitness-vertical work. The brief was educational explainer over visual storytelling, so the assets lean on Nano Banana Pro stills with overlay text rather than Kling motion. The verification rubric for this vertical was extended with a seventh question: does the caption avoid any specific clinical claim that we don't have substantiation for. The seventh question matters because the brand's legal counsel reviews the rubric, not individual shorts, and the rubric is what we operationally defend.
The Peptivo vertical accounts for roughly 11% of monthly shorts but a much higher share of paid-media usage downstream — the brand re-cuts the top-performing organic shorts as paid creative, which is a separate workflow we don't run but which loops performance data back to us.
> The most expensive failure mode is a banned account, not a bad short. We learned to build the account-health monitor before we built the next visual generation upgrade.
What we learned generating Trimrx imagery
Trimrx is the smallest vertical by volume and the one most useful for understanding what the pipeline cannot do. Trimrx is a wellness brand whose visual language is editorial — soft lighting, magazine-style composition, considered product photography. Most of the pipeline's optimizations — slop rejection at 18%, cost-per-short at $0.30, throughput at 31K shorts a month — are wrong for Trimrx, because Trimrx's bar for a usable asset is much higher than the algorithmic-content bar that drives the fitness vertical.
We ran Trimrx through the pipeline for two months and the slop rejection rate climbed to 41%. The remaining 59% was usable but not on-brand. The honest read on that experiment is that the pipeline is not the right tool for an editorial brand at low volumes — it is the right tool for a category that rewards velocity over composition. We moved Trimrx onto a separate, slower workflow with a human art director in the loop on every asset; the share of usable assets rose from 59% to 91%, at the cost of throughput dropping from 600 shorts a month to 80.
The lesson generalizes. The pipeline produces algorithmic-fit content, not brand-prestige content. The distinction matters and it does not show up in any of the metrics that algorithmic platforms surface.
What we learned wiring Claude Code into the topic-generation stage
The topic generator was the first stage to run on Claude Code rather than a one-off script, and the upgrade changed the shape of what topic generation could do. The earlier version was a deterministic prompt that ran against a fixed corpus and produced a fixed-length list. The Claude Code version is a small agent loop that can read the previous week's performance data, query the per-vertical subreddit corpus on demand, propose a draft topic, run a self-check against the 14-day overlap cache, revise the topic if the overlap check fails, and emit a final list with confidence scores per topic.
The agent-loop version produces topics whose first-week performance is roughly 22% higher than the deterministic-prompt version on every metric we measure. The cost difference is small — about $0.04 per topic generated, against $0.006 for the deterministic version — and the cost is amortized across thousands of downstream shorts that all benefit from a better hook.
We wrote the topic-generation Claude Code agent the same way we wrote the solo-founder agent stack — small role, narrow scope, idempotent, no write access outside its own bucket.
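The loop's control flow, stripped of the model calls, looks roughly like this. `propose()` and `overlaps_recent()` stand in for the Claude call and the 14-day overlap cache, and the confidence scoring is a naive placeholder, not the production scheme:

```python
# Control-flow sketch of the propose -> self-check -> revise loop described
# above. propose() and overlaps_recent() stand in for the Claude call and
# the 14-day overlap cache; confidence scoring is a naive placeholder.

def generate_topics(propose, overlaps_recent, n_slots, max_revisions=2):
    """Emit topics that survive the overlap self-check; each forced
    revision lowers the topic's confidence score."""
    out = []
    for slot in range(n_slots):
        confidence = 1.0
        for attempt in range(max_revisions + 1):
            topic = propose(slot, attempt)
            if not overlaps_recent(topic):
                out.append({"topic": topic, "confidence": confidence})
                break
            confidence -= 0.25   # overlap found: revise and try again
    return out

recent = {"5 desk stretches"}          # shipped in the last 14 days
drafts = {(0, 0): "5 desk stretches",  # first draft collides with the cache
          (0, 1): "3 wall stretches",
          (1, 0): "protein myths"}
print(generate_topics(lambda s, a: drafts[(s, a)], recent.__contains__, 2))
# -> [{'topic': '3 wall stretches', 'confidence': 0.75},
#     {'topic': 'protein myths', 'confidence': 1.0}]
```

A slot whose drafts keep colliding simply emits nothing, which keeps the daily list honest rather than padded.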
The phased rollout, in months
The pipeline did not arrive fully formed. The first version shipped 80 shorts a month and broke constantly. The current version ships 31,000. The rough timeline:
| Phase | Months | Throughput | Slop rate | Active accounts |
|---|---|---|---|---|
| 1: single-account proof | M1–M2 | ~80/mo | ~62% | 1 |
| 2: vertical expansion | M3–M5 | ~1.2K/mo | ~38% | 6 |
| 3: judge introduced | M6–M7 | ~3.4K/mo | ~24% | 12 |
| 4: account warmup formalized | M8–M9 | ~6.8K/mo | ~21% | 22 |
| 5: Kling 3 + Nano Banana Pro | M10–M11 | ~14K/mo | ~18% | 38 |
| 6: distribution-layer rebuild | M12–M14 | ~31K/mo | ~16% | 60 |
Two of these phases account for almost all of the volume gain: phase 3 (introducing the LLM judge, which made it economic to ship at higher throughput because slop stopped killing accounts) and phase 6 (rebuilding the distribution layer, which removed the operator-time bottleneck on how many accounts could be run in parallel).
The cost model that justifies the build
The unit economics matter more than the headline view count, because the pipeline only pays back if the cost-per-short and the value-per-view stay in the right ratio.
Cost-per-finished-short averages $0.30 across the full mix, with the cheapest verticals (text-overlay-style explainers) coming in around $0.18 and the most expensive (high-fidelity Kling Pro motion) coming in around $0.42. The cost line is dominated by Kling 3 video generation (about 58% of the per-short cost), with LLM verification (about 22%), still generation (about 12%), and distribution-layer infrastructure (about 8%) splitting the remainder.
Value-per-view varies wildly by vertical. The fitness vertical clears around $1.40 CPM on the brand's downstream paid spend; Peptivo clears closer to $3.20 CPM because the regulated-category buyer pool is smaller and more valuable; Trimrx clears almost nothing in the algorithmic-distribution channel because the brand's actual buyers don't shop on TikTok.
In aggregate, the pipeline returns roughly 5.4x revenue per dollar of cost across the mix. Most of that return is concentrated in the two highest-paying verticals, and the lower-CPM verticals function as account-health insurance — they keep the accounts active, varied, and looking organic, which protects the high-CPM verticals from getting flagged.
What we'd do differently if we built this again
Three things, in priority order.
The first is to build the account-health monitor before the second vertical, not after the fifth. The cost of building it early is two engineer-weeks. The cost of not building it early was the 23-account banning weekend in November, which set the program back roughly six weeks.
The second is to invest in the LLM-judge rubric before any throughput optimization. We spent two months in phase 2 trying to push throughput from 1,200 to 3,000 shorts a month before introducing the judge in phase 3. The judge would have made the throughput push easier, cheaper, and less account-damaging if we'd built it first.
The third is to write the cost meter into every stage from day one. The current pipeline emits a per-asset cost record at every stage, which is how we know what the cost-per-short actually is. The earlier versions estimated cost from invoice-level rollups, which made it impossible to know which vertical was expensive and which was cheap. The per-asset meter took an afternoon to build retroactively and we should have built it on day one.
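The meter really is an afternoon of work. A minimal sketch: every stage emits an (asset_id, stage, cost) record, and cost-per-short is a straight rollup. Stage names and dollar figures below are illustrative, not our actual rates:

```python
# Sketch of the per-asset cost meter: each pipeline stage emits an
# (asset_id, stage, usd) record, and cost-per-short falls out of a rollup.
# Stage names and dollar figures are illustrative.
from collections import defaultdict

class CostMeter:
    def __init__(self):
        self._records = []

    def record(self, asset_id: str, stage: str, usd: float):
        self._records.append((asset_id, stage, usd))

    def cost_per_asset(self) -> dict[str, float]:
        totals = defaultdict(float)
        for asset_id, _stage, usd in self._records:
            totals[asset_id] += usd
        # round away float noise for reporting
        return {k: round(v, 4) for k, v in totals.items()}

meter = CostMeter()
meter.record("short_001", "still_gen", 0.04)
meter.record("short_001", "image_to_video", 0.17)
meter.record("short_001", "llm_judge", 0.07)
print(meter.cost_per_asset())  # -> {'short_001': 0.28}
```

Grouping the same records by stage instead of asset gives the vertical-level breakdown that invoice rollups could never provide.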
We've now generalized enough of this stack that we run it for three other operators in adjacent categories on a revenue-share basis, which is closer to a productized service than the traditional agency model the same team used to operate. The shape of the work is different. The infrastructure is the product.