Building Shadow Inbox: a Reddit + Hacker News buying-signals SaaS in one sprint
OpenClaw shipped a Reddit monitoring tool with Claude as the engineering backend in a single sprint. The full AI-built SaaS case study — scope, security audit, GTM, and the first numbers.

In late February I scoped a small product on a single page of paper. The thesis was narrow: monitor Reddit and Hacker News for posts and comments where a user is actively asking for a tool that solves the problem your SaaS solves, and surface those matches in a daily inbox. The OpenClaw Factory agent fleet — the same setup we wrote about in the 33-agents field report — picked the project up, ran it for one sprint, and shipped a paid product on the other side. The product is Shadow Inbox, and you can try it at https://www.getshadowinbox.com. The honest version of this AI-built SaaS case study is that one operator scoped the product, four Claude-driven agent roles built it across one sprint, the security audit took as long as the last build week, and the credit-based pricing model was the single most consequential decision in the whole sprint — bigger than the framework choice, the LLM choice, or the GTM channels. This is the build, the numbers, and what I'd do differently.
What Shadow Inbox actually does
The product is a thin wedge with a clear job. You configure a small number of intent queries — phrases like "looking for a tool that does X," "tried Y but it doesn't Z," "is there an alternative to W" — scoped to your category. Shadow Inbox crawls Reddit and Hacker News on a schedule, runs each candidate post and comment through a relevance pass, and writes the matches to a daily inbox keyed to your account.
A match isn't a keyword hit. The intent classifier — a small Claude prompt with a handful of canonical examples — answers a single question per item: "is this person describing a buying intent that maps to the user's category?" Most matches that pass simple keyword filters fail this gate, which is the point. The whole product is the gate.
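The two-stage shape of that gate — a cheap keyword prefilter, then the single-question LLM pass — can be sketched roughly like this. All names, types, and the prompt wording are illustrative, not the production code or prompt:

```typescript
// Sketch of the gate: a cheap keyword prefilter kills most items for free;
// survivors get the single-question LLM relevance pass.
interface Candidate {
  id: string;
  text: string;
  url: string;
}

// Stage 1: keyword prefilter. Cheap, high-recall, low-precision.
function keywordPrefilter(item: Candidate, phrases: string[]): boolean {
  const haystack = item.text.toLowerCase();
  return phrases.some((p) => haystack.includes(p.toLowerCase()));
}

// Stage 2: build the single question the classifier answers per item.
// The model returns a yes/no plus a one-sentence reason.
function buildGatePrompt(item: Candidate, intentQuery: string): string {
  return [
    `Intent query: "${intentQuery}"`,
    `Candidate text: "${item.text}"`,
    `Question: is this person describing a buying intent that maps to`,
    `the query's category? Answer "yes" or "no", then one sentence why.`,
  ].join("\n");
}
```

The asymmetry is deliberate: the prefilter is allowed to be sloppy because the LLM pass behind it is the real gate, and the gate is allowed to be expensive because the prefilter keeps its volume down.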
The output is a list. Each entry is a link to the source post or comment, a one-sentence "why this matched" line, the timestamp, and a button to mark the lead as worked, dismissed, or escalated. The user opens the inbox once a day, replies to the threads that matter, and closes it.
That's the whole product. No dashboards, no analytics suite, no integrations, no Slack bot in v1. The constraint is the feature.
The original observation
The reason I wrote the spec at all is that in 2024 and 2025 I had spent enough time in r/SaaS, r/Entrepreneur, r/marketing, and the comment sections of HN's "Show HN" threads to notice a pattern. People ask, in public, for tools they would buy. They describe the workflow they want. They name the competitor they tried and bounced off. Those threads sit there for 24 hours, sometimes 48, before sliding off the front page of the subreddit and out of practical visibility.
The mismatch is between intent and discovery. The buyer is screaming into the void. The seller, six tabs deep into a different problem, never sees it. By the time the seller's standard alerting catches the keyword — a Google Alert on the brand name two weeks later — the buyer has either picked a competitor or moved on.
This is what Shadow Inbox is built to fix. Not "here is content about SaaS." It is "here is a person, today, asking for a thing your SaaS does, and here is the comment box they will see your reply in." That's the only signal that converts.
I also have a personal version of the observation. The Tradoki build log from October included a footnote I never expanded on at the time: a third of the early Tradoki signups came from a single Reddit thread on r/options where someone had asked, in plain language, for "a tool that would tell me my trade plan is dumb before I place it." Jonas had answered the comment with a link. The conversion off that single thread was higher than from any paid channel for the next six weeks. We didn't have a way to find that thread; we got lucky. Shadow Inbox is the productized version of getting un-lucky.
How OpenClaw scoped it
The first commit of any OpenClaw project is a one-page spec, written by me, that the agent fleet uses as the source of truth for the rest of the sprint. The Shadow Inbox spec was 740 words and contained:
A product thesis (one sentence). A scope statement listing what is in v1 and what is not. A data model sketch — three tables, no foreign keys to anything outside the product. A pricing model decision — credit-based, low entry point. A security posture statement — RLS-on-every-table, server-side data gating, clickjacking-protected. A go-to-market list — six channels, ranked. A definition of done.
The spec did not contain: a feature list beyond the v1 scope, a UI mockup, a pricing table, a marketing site copy block, or any architectural guidance. Those came out of the build, decided by the agent that owned them.
The OpenClaw fleet works in role-based units. For Shadow Inbox the active roles were: a backend agent that owned the data model, the queue, and the API; a frontend agent that owned the dashboard and the marketing site; an operations agent that owned crawling, scheduling, rate limits, and the ingestion pipeline; and a security agent that ran the pre-launch audit and didn't ship anything itself. The four roles ran on top of Claude as the engineering backend, with me as product owner approving the daily commits.
One sprint, in OpenClaw terms, is a fixed two-week budget. The Shadow Inbox sprint used 12 of the 14 days. I'll detail what landed in each three-day chunk because the cadence is the part most builders underestimate.
The build, by phase
Days 1–3: data model + ingestion. The backend agent stood up Postgres on Neon, three tables (users, queries, matches), drizzle migrations, and the basic write paths. The operations agent built the Reddit crawl using the public JSON endpoints (no scraping, no API key gymnastics for v1) and the HN crawl using the Algolia HN search API. Both crawlers landed by day 3 with a deliberate ceiling — no more than 200 candidate items per query per day, dropping to 50 for free-tier accounts. The ceiling exists because the relevance pass costs Claude tokens, and we'd rather rate-limit than burn money.
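The HN side of that crawl, against the public Algolia HN Search API, looks roughly like the sketch below. The function names are illustrative; the endpoint and its `tags` / `numericFilters` / `hitsPerPage` parameters are the real Algolia ones, with the per-query ceiling applied as the page size:

```typescript
// Sketch of the HN crawl against the Algolia HN Search API, with the
// per-query daily ceiling enforced via hitsPerPage.
const HN_API = "https://hn.algolia.com/api/v1/search_by_date";

function buildHnSearchUrl(query: string, sinceEpoch: number, ceiling: number): string {
  const params = new URLSearchParams({
    query,
    tags: "(story,comment)",                    // stories OR comments
    numericFilters: `created_at_i>${sinceEpoch}`, // only items since last crawl
    hitsPerPage: String(Math.min(ceiling, 1000)), // the candidate ceiling
  });
  return `${HN_API}?${params.toString()}`;
}

async function crawlHn(query: string, sinceEpoch: number, ceiling: number) {
  const res = await fetch(buildHnSearchUrl(query, sinceEpoch, ceiling));
  if (!res.ok) throw new Error(`HN crawl failed: ${res.status}`);
  const body = await res.json();
  return body.hits as Array<{ objectID: string; url: string | null }>;
}
```

The Reddit side is the same shape against the public `.json` listing endpoints, which is why neither crawler needed API-key gymnastics in v1.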
Days 4–6: relevance + dashboard scaffold. The relevance classifier — a Claude prompt that takes the candidate item and the user's intent query and returns a binary plus a one-sentence reason — went through five iterations on day 4. The first version was 40 lines of prompt and 71% accuracy on a 200-item hand-labeled set. The fifth version was 22 lines of prompt and 89% accuracy on the same set. The trick was removing instructions, not adding them — every additional rule I added in the first three versions made the output more brittle. The frontend agent built the dashboard scaffold against fixture data on the same days.
Days 7–9: pricing, billing, account management. Stripe metered billing for the credit model is more involved than flat subscriptions. We use Stripe's usage-records API to record each query run, with the credit balance computed in our own database for fast reads. The plumbing took two days. The pricing copy on the marketing site — three credit packs, a free trial of N credits, no subscription minimum — took an afternoon and changed twice in QA.
Days 10–11: the security audit. This is the part I want to dwell on, because it's also the phase that most one-sprint shipping stories elide. The security agent — a Claude-driven role with a fixed audit checklist and write access to nothing — went through every endpoint, every multi-tenant table, every form post, and produced a list of 14 issues, of which 4 were blockers.
The blockers were:
- An /api/queries endpoint that returned the queries array for a user but did not check that the requesting session matched the user. RLS on the queries table caught this in the database; the application-level gate had been omitted. Without RLS we'd have shipped a tenant data leak.
- A clickjacking exposure on the dashboard route. The marketing site set X-Frame-Options correctly; the dashboard subdomain didn't. Fix was a single line in the middleware: X-Frame-Options: DENY plus Content-Security-Policy: frame-ancestors 'none'.
- A leaked Anthropic API key in the client bundle. The build had been pulling it from process.env.NEXT_PUBLIC_ANTHROPIC_KEY because of a copy-paste from a prototype that ran the relevance pass client-side. The relevance pass had moved server-side three days earlier; the env-var name had not. Pure migration debt.
- An unbounded write path on the queries-create endpoint. A logged-in user could spam the table with millions of query records. We added a per-user-per-hour rate limit and a hard cap on total queries per account.
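The fix for that last blocker is a fixed-window limiter plus a hard cap. A minimal sketch — the window size, limits, and names here are illustrative, not the production values:

```typescript
// Sketch of the queries-create guard: a per-user fixed-window rate limit
// plus a hard cap on total queries per account.
const WINDOW_MS = 60 * 60 * 1000;     // one-hour window (illustrative)
const MAX_CREATES_PER_WINDOW = 20;    // illustrative limit
const MAX_QUERIES_PER_ACCOUNT = 200;  // illustrative hard cap

interface UserWriteState {
  windowStart: number;
  createsInWindow: number;
  totalQueries: number;
}

function allowQueryCreate(state: UserWriteState, now: number): boolean {
  // Hard cap first: no window reset ever gets past this.
  if (state.totalQueries >= MAX_QUERIES_PER_ACCOUNT) return false;
  // Roll the window forward when it has expired.
  if (now - state.windowStart >= WINDOW_MS) {
    state.windowStart = now;
    state.createsInWindow = 0;
  }
  if (state.createsInWindow >= MAX_CREATES_PER_WINDOW) return false;
  state.createsInWindow += 1;
  state.totalQueries += 1;
  return true;
}
```

In production the state lives in the database next to the account row, not in memory, so the limit survives deploys and holds across serverless instances.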
The non-blockers were minor: missing rel="noopener" on a couple of outbound links, a debug console.log in production, two unused dependencies pulled in by a prototype. We fixed them anyway.
The audit took as long as the entire last build week — far longer than I had estimated. That's the lesson and I will repeat it in the takeaways: budget the security audit as a major phase of the sprint, not a final sweep.
Day 12: launch. Marketing site copy in the morning, Stripe production keys in the afternoon, manual smoke test of the full flow (signup, configure query, wait for first match, mark as worked, see credit decrement, upgrade), and DNS cutover. The product went live at 18:42 local time on day 12. I posted nothing externally that night. The launch sequence began the next morning.
Pricing: the most consequential decision in the sprint
The decision that mattered more than any technical choice in the sprint was pricing. I'll show my work.
The default for a B2B SaaS in this category is a flat monthly subscription with three tiers: starter, plus, pro. We considered it and rejected it. The reason is that intent monitoring has a power-law usage shape — most accounts will run 3–5 queries; some accounts will run 50; a tiny number will run 500 — and a flat tier model either prices out the casuals or subsidizes the heavy users with the casuals' money.
The model we shipped is credit-based after a free trial, priced at a low entry point. New accounts get a free batch of credits — enough to run their first three queries for a week and see whether the inbox produces anything they care about. After that, you buy credits. A small pack covers a casual operator at a few dollars; a large pack covers a power user at a price that is still below what the next-cheapest competitor charges as their cheapest tier.
There are no feature gates between credit packs. The product is the same for everyone; you pay for what you use. The free trial converts because the first few matches surface real intent, and once an operator has replied to two threads and gotten a meeting from one, the credit pack pays for itself in a week.
This pricing decision interacts with everything else in the product. It removes the "should we lock the integrations behind Plus tier?" question because there are no tiers. It removes the "should we discount annually?" question because there is no subscription. It changes the GTM copy because the value pitch is "your first match in the next 24 hours" rather than "your first month for $X." Every downstream decision was simpler because of the pricing call.
The decision came out of the Wondrakids cost breakdown work I'd done in January, which had me thinking hard about which lines on a P&L are subsidized by which other lines. A flat-tier SaaS subsidizes power users with casuals' money and treats churn as the cost of the tier mismatch. A credit model bills the actual cost of the actual usage and lets churn happen on the credit pack, not on the subscription. For a usage-shape this lopsided, that's the better fit.
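The subsidy argument is just arithmetic. A back-of-envelope illustration with hypothetical prices (a flat tier and a per-run credit price that are mine, not Shadow Inbox's actual numbers) against the power-law usage shape described above:

```typescript
// Hypothetical numbers to make the subsidy visible: a flat tier vs.
// per-run credits, across the casual / mid / power usage shape.
const FLAT_TIER_PRICE = 29;        // hypothetical flat monthly tier
const CREDIT_PRICE_PER_RUN = 0.5;  // hypothetical price per query run

// Query runs per month at each point of the power-law curve.
const usage = { casual: 4, mid: 50, power: 500 };

function creditBill(runs: number): number {
  return runs * CREDIT_PRICE_PER_RUN;
}
// casual: $2 on credits vs. $29 flat — the flat tier overcharges them ~14x.
// power: $250 on credits vs. $29 flat — the flat tier undercharges them,
// and the casuals' overpayment is what covers the gap.
```

Under these (made-up) numbers the flat tier is wrong at both ends of the curve at once, which is the shape of the argument regardless of the specific prices.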
Pre-launch: the security audit, expanded
I want to come back to the audit because it's the difference between a sprint-shipped product that survives contact with paying users and a sprint-shipped product that gets called out on Twitter for a tenant leak.
The audit checklist for Shadow Inbox was:
- Row-level security on every multi-tenant table. Postgres RLS, policy-per-table, no exceptions. The policy is tenant_id = current_setting('app.current_tenant')::uuid; the application sets the GUC at session start. Code that bypasses the application still can't bypass RLS.
- Server-side data gating on every endpoint. Even with RLS, every API route validates the session and the requested resource ownership before issuing the database call. Defense in depth.
- Clickjacking protection. X-Frame-Options: DENY plus Content-Security-Policy: frame-ancestors 'none' on every authenticated route. The marketing pages use 'self' because they may legitimately be embedded.
- No secrets in client bundles. Grep the build output for known API key prefixes. Grep environment variable names with the NEXT_PUBLIC_ prefix that look sensitive. Nothing leaked.
- Rate limits on every write path. Per-user-per-minute on writes, per-IP-per-minute on auth attempts, per-user-per-day on credit consumption. Implemented at the edge (Vercel middleware) and at the application layer.
- No raw SQL string interpolation. Drizzle ORM everywhere. Grep for tagged-template sql patterns and confirm they're parameterized.
- CSRF protection on form submissions. Same-origin checks on all state-changing requests; the framework's built-in CSRF token middleware on the dashboard.
- Webhook signature verification. Stripe webhooks verified against the signing secret on every event. No unsigned webhook accepted.
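The clickjacking item from that checklist reduces to one header decision per route class. A minimal sketch of it as a pure function, in the shape it would take inside an edge middleware (the function name is mine):

```typescript
// Sketch of the frame-protection policy: authenticated routes are never
// embeddable; marketing pages may only be framed by same-origin pages.
function securityHeaders(isAuthenticatedRoute: boolean): Record<string, string> {
  if (isAuthenticatedRoute) {
    // Dashboard: deny framing everywhere, in both the legacy header
    // and the modern CSP directive.
    return {
      "X-Frame-Options": "DENY",
      "Content-Security-Policy": "frame-ancestors 'none'",
    };
  }
  // Marketing pages: embeddable by our own origin only.
  return {
    "Content-Security-Policy": "frame-ancestors 'self'",
  };
}
```

Keeping it a pure function is what makes the audit step cheap: the policy can be asserted in a unit test instead of being re-derived by clicking through routes.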
The agent that ran this audit had no write access to the codebase. Its only output was a JSON file of findings, which I reviewed before any fix landed. The non-blocker findings I let agents fix; the blocker findings I had a human (me) reproduce manually before the fix went in. That's the right ratio for a one-operator sprint — agents find, humans verify the worst class.
"The audit took as long as the feature build of week two. If your sprint plan doesn't budget that, your sprint plan is wrong, and the question is whether you find out before launch or after." — my own commit message on day 11
GTM: the launch sequence
I had decided the GTM channels at the start of the sprint and queued the assets during the build. The launch sequence over the first ten days was:
Day 1 (post-launch). Reddit. Two posts: one in r/SaaS framed as a build-log writeup with the product mentioned at the bottom, one in r/SideProject framed as a launch with the credit-pack details up top. Both posts followed the subreddit rules; the build-log one outperformed the launch one by 4×.
Day 2. Show HN on Hacker News. Title was the product positioning sentence, no marketing language. Comment thread was answered live for the first six hours. The post climbed to the front page for about three hours and produced the largest single inbound spike of the launch window.
Day 3. Twitter / X thread. Five posts walking through the OpenClaw build process, with the launch tweet at the end. Posted in my time zone's morning. Engagement was higher on the build-process posts than on the launch post; the funnel was the right shape.
Day 4. LinkedIn long-form post, focused on the pricing-model decision rather than the product. Targeted at SaaS founders and growth operators. Lower volume than Twitter, higher conversion to free trial.
Day 5–6. Indie Hackers post, focused on the one-sprint-build angle, with a request for feedback on the pricing model. The IH community responded with detailed pricing critiques, two of which I implemented in the next week.
Day 7. Product Hunt launch. Wednesday launch. Top 5 of the day. PH traffic was high-volume and low-conversion, as expected.
Day 8 onward. Paid acquisition began on Reddit ads, targeting the same subreddits where the organic posts had landed. The CPC was higher than I'd budgeted (Reddit ads have gotten more expensive in 2026 than in 2024) but the click-to-trial conversion was strong because the audience was pre-qualified by the targeting.
The total budget for the launch window was small — under four figures of paid spend. The organic channels carried most of the conversion. The Reddit ads outperformed the LinkedIn ads by a factor of 3 on click-to-trial; the LinkedIn ads I paused after 72 hours.
A note on the Instantly + v0 cold-outreach stack we wrote about in December: I ran a parallel cold-outreach campaign against a list of 220 SaaS founders in the launch window, with a personalized v0 prototype attached for each prospect. That campaign produced 11 replies and 3 paid conversions inside the launch window. Cheap, effective, but very narrow on volume.
Numbers from the launch window
The first week of public availability ran 7 days from day 13 of the sprint to day 20. The numbers I'm reporting here are first-week numbers, not steady-state, and I'll mark the ones that need confirmation.
Free-trial signups in the first week: in the low three figures. Free-trial-to-paid conversion: in the high single digits to low double digits, percentage-wise. Paid users at end of week 1: a small two-digit number. Average credit pack purchased: the smaller of the two starter packs. First-week revenue: low four figures. Best-performing acquisition channel: Hacker News Show HN by signups, Reddit organic by paid conversion. Worst-performing channel: LinkedIn ads (paused day 4).
The shape is the right shape. Pricing-model conversion is in line with what we modeled in the pre-launch spec. Channel mix is roughly what we predicted, with HN over-indexing on signups and Reddit organic over-indexing on the buying-intent of the converted cohort. Reddit ads are working but expensive enough that I'm watching the LTV signal closely before scaling spend.
What I'd do differently
Budget the security audit as a phase, not a sweep. I keep saying this because it's the lesson I most want to broadcast. The audit found 4 blockers in 14 issues; without it, this is a different launch story. Two of 12 days, every sprint, from now on.
Write the marketing site copy on day 1, not day 12. The marketing site copy was the last thing built in the sprint and it shows. The headline was rewritten three times in the first 48 hours of public availability. If I'd written the copy at the start, the agent fleet would have built the dashboard against the copy's promises, instead of the other way around. Backwards.
Don't run two paid channels simultaneously in the launch window. Reddit ads and LinkedIn ads at the same time made the analytics noisier than they needed to be. I should have launched Reddit, run it for 5 days, then layered LinkedIn on top once Reddit's baseline was clear. Cleaner data, faster decisions.
Make the relevance prompt versionable from day 1. I edited the prompt 12 times in the first two weeks, which is the right amount of editing, but the prompt history was in git commits rather than in a versioned table the dashboard could surface. The next OpenClaw product that uses an LLM in the hot path will treat prompts as first-class versioned objects.
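What "prompts as first-class versioned objects" means concretely is an append-only version store with one active pointer, instead of prompt text buried in git history. A minimal sketch, with the shape entirely my own:

```typescript
// Sketch of a versioned prompt store: every edit is a new immutable
// version with a note, and the dashboard can surface the full history.
interface PromptVersion {
  version: number;
  text: string;
  note: string;      // why this edit was made, e.g. "removed two rules"
  createdAt: number; // epoch millis
}

class PromptStore {
  private versions: PromptVersion[] = [];

  publish(text: string, note: string, now: number): PromptVersion {
    const v: PromptVersion = {
      version: this.versions.length + 1,
      text,
      note,
      createdAt: now,
    };
    this.versions.push(v); // append-only: old versions stay queryable
    return v;
  }

  // The hot path always reads the latest published version.
  active(): PromptVersion | undefined {
    return this.versions[this.versions.length - 1];
  }

  history(): readonly PromptVersion[] {
    return this.versions;
  }
}
```

In production this is a table, not a class, but the invariants are the same: versions are immutable, edits only append, and the relevance pass reads whatever `active()` points at.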
Don't post to Indie Hackers on the Friday of launch week. I posted on Friday and got a slow weekend response. Indie Hackers is a Tuesday or Wednesday community. I knew this and posted anyway because the sprint had run long. The next launch will respect the calendar.
The bigger story
Shadow Inbox isn't, by itself, a remarkable piece of software. It is a thin wedge of monitoring + classification, built against two public APIs, with a credit-based pricing model that's mostly off-the-shelf Stripe metering. A senior full-stack engineer in 2022 could have built it in three weeks of focused work.
The remarkable part is the operating model. One human scoped the product, defined the GTM, owned the pricing decision, and approved the commits. Four Claude-driven agent roles built the codebase, ran the security audit, and produced the marketing site. The sprint shipped on day 12 with paid users on day 13. There was no full-time engineer on the project; there was no contractor; there was a fleet of agents and a checklist.
This is the second OpenClaw product to ship with this operating model and it will not be the last. The pattern compresses the time from "scoped one-pager" to "paying users" to a few weeks for products of this complexity. It does not compress the time required for product judgment, security thinking, or pricing decisions. Those still happen at human speed. The compression is in the plumbing — the parts of building software that used to require a full engineer's attention and now require an operator's attention plus an agent fleet.
For other operators thinking about this pattern: the multiplier is real and the constraints are exactly the ones I described. Scope tightly. Audit seriously. Price intentionally. Let the agents handle the rest.
If you want to see the product itself, Shadow Inbox is live — free trial, no credit card required to start, credit-based pricing once you decide it's worth the spend.
Three more from the log.

Building Tradoki: an AI trading-education SaaS in 8 weeks
From idea to paying customers in eight weeks. The ai trading education saas build log — stack, decisions, mistakes, and the 300-user beta that paid for itself.
Oct 17, 2025 · 11 min
MiroFish vs OpenClaw Factory: simulation vs execution
Two open-source multi-agent systems, two different bets. MiroFish simulates societies; OpenClaw executes work. Here's what each is for, and why it matters.
Apr 18, 2026 · 11 min
OpenClaw Factory: 33 autonomous Claude agents, 6 months of production
A field report from running OpenClaw Factory — a 33-agent Claude architecture in production for 6 months. Multi-agent system production lessons.
Mar 05, 2026 · 7 min