Specs existed to prevent bad ideas from shipping. When shipping takes an afternoon, the spec is just theater. Build the thing, then defend it.
"Most PMs were never actually bottlenecked by execution. They were bottlenecked by taste and judgment. Team capacity functioned as a governor that prevented bad ideas from shipping. Remove that governor and you discover who was driving and who was just steering."
We hire operators, not curators. They ship production code in Cursor or Claude Code. They write their own eval suites in Braintrust before the feature reaches a user. They read a LangSmith trace without asking for help, and they've already tried the agent-orchestration library that launched last Tuesday. They don't wait for an engineer and they don't draft a PRD — they prototype the direction themselves and let the team critique the ship, not the idea. Above all, they have taste: the judgment to know what's worth shipping when capacity is infinite.
The PM opens Claude Code and ships a rough v2 of the rebooking flow by lunch. No ticket, no PRD. The working thing goes into a side branch the team can poke at that afternoon.
Writes 20 evals in Braintrust against last week's failure logs. The eval suite IS the spec. When a case reds out, the failure mode becomes the next day's fix — no meeting required.
Ships the experiment to 10% of traffic behind a flag. Slack note has two screenshots, an eval-score delta, and a timestamp. That replaces the review deck.
Reads LangSmith traces end-to-end. Kills one branch of the experiment at 11am because the eval delta is clearly negative. Communicates the kill in one paragraph, no ceremony.
Three calls with travelers who hit the failure mode on Tuesday. Loom of the three calls posted to channel. Two new evals written that afternoon from what they said.
Not "work less." Work on direction, not on the rituals that used to simulate direction.
PRDs existed to preserve vision across a multi-week handoff. The handoff doesn't exist anymore. Evidence — the working thing plus its eval suite — carries the direction better than any document could.
Capacity is infinite; taste is the bottleneck. A PM who files a ticket and waits three weeks for a Figma review is a PM who still thinks execution is the scarce resource. She prototypes the direction herself and lets engineering critique the ship, not the idea.
Status meetings, alignment decks, quarterly reviews — none of it moves the product. One Loom showing the working ship plus eval scores replaces four one-hour meetings. The hours saved go to taste.
PM → Figma → ticket → eng is four relays. Each one dilutes the vision by 20%. She builds directly in the codebase so the direction arrives intact. If that feels strange, the strangeness is the point.
Sprints were a governor on bad ideas — force them through a two-week window so fewer of them reach production. Taste is the governor now. If the eval suite holds, it ships when it's ready; not when the ritual says.
Each of these was a workaround for a bottleneck that no longer exists. Their absence is the job.
She doesn't "manage AI features." She delegates to agents, defines failure modes, owns the evals, and reads the traces. One real day, timestamped.
No human in the loop. Flag on, 5% of traffic.
Spots a regression on the "traveler paying in a second currency" case. Delta: –6 points. Not catastrophic, not acceptable.
Each eval is a concrete traveler scenario. The failure is now a reproducible test, not a Slack thread.
Three lines of prompt and one guardrail update. Deployed behind the same flag.
Eval delta now +2 points. Safe to widen.
Writes a two-sentence Loom to the team. Done for the day. One experiment tested, one failure mode added to the permanent eval set.
A team running this cycle runs 13× the experiments per quarter of a team that still writes PRDs and waits for eng. After four quarters the learning gap is insurmountable. The role isn't "PM who uses AI" — it's "PM who is the team's iteration loop."