"Loop engineering" is having a moment.
Boris Cherny (creator of Claude Code) and Peter Steinberger (creator of OpenClaw) put the phrase into the air a few weeks ago. Andrew Ng followed with a clean framing of three nested loops that AI-native product builders run in parallel — agentic coding (minutes), developer feedback (hours), and external feedback (days to weeks).
Ng's framing is the best public articulation I've seen of how good AI-native teams actually work. If you build software with coding agents and haven't internalised it yet, read his post first.
But — and this is what I want to add — for the enterprises AI Guru works with, there's a fourth loop. Most teams have set up loops 1 and 2. Some have loop 3. Almost none have loop 4 in any disciplined way. That gap is how organisations end up with “vibes coding” in production: an agent shipped something, a developer reviewed it, a few users tried it, and nobody has any idea whether it meets the compliance, security, and audit posture the company has committed to its customers and regulators.
The three loops (Andrew Ng's framing, briefly)
For readers who haven't seen the original, here's the gist — but go read it, it's worth the 5 minutes.
Loop 1 — Agentic coding (minutes). A coding agent writes code, tests it, runs it, evaluates against the spec or a set of evals, and iterates. The loop tightens with better evals. This is what makes “Claude Code can work for an hour without your intervention” actually true.
Loop 2 — Developer feedback (tens of minutes to hours). A human reviews what the agent built, decides what to keep, refines the spec, and re-runs the agent. As agents test their own code more competently, this loop moves up-stack from QA to product decisions: features, UI, user flow.
Loop 3 — External feedback (hours to weeks). Show it to a friend. Ship it to alpha users. A/B test it. Analyse usage data. The signal informs the product vision, which feeds back into the spec, which feeds back into Loop 1.
These three loops nest. Loop 1 runs many times per Loop 2 tick. Loop 2 runs many times per Loop 3 tick. End-to-end product velocity is gated by the slowest loop you actually run.
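To make the nesting concrete, here is a minimal sketch of the three loops as plain nested `for` loops. Everything in it is an assumption made for illustration: the function `run_nested_loops`, the stub callables, and the round counts are hypothetical and do not come from Ng's post or from any agent framework.

```python
from typing import Callable, Dict


def run_nested_loops(
    spec: str,
    agent_step: Callable[[str], str],
    evals_pass: Callable[[str], bool],
    developer_review: Callable[[str, str], str],
    external_feedback: Callable[[str, str], str],
    external_rounds: int = 2,
    review_rounds: int = 3,
    max_agent_iters: int = 5,
) -> Dict[str, int]:
    """Run Loop 1 inside Loop 2 inside Loop 3 and count the ticks of each."""
    counts = {"loop1": 0, "loop2": 0, "loop3": 0}
    build = ""
    for _ in range(external_rounds):          # Loop 3: external feedback
        for _ in range(review_rounds):        # Loop 2: developer feedback
            for _ in range(max_agent_iters):  # Loop 1: agentic coding
                build = agent_step(spec)
                counts["loop1"] += 1
                if evals_pass(build):         # stop iterating once evals pass
                    break
            spec = developer_review(spec, build)   # human refines the spec
            counts["loop2"] += 1
        spec = external_feedback(spec, build)      # user signal updates the spec
        counts["loop3"] += 1
    return counts


# Hypothetical stubs: the "agent" passes its evals on every second attempt.
attempts = {"n": 0}


def stub_agent(spec: str) -> str:
    attempts["n"] += 1
    return f"build {attempts['n']} of {spec!r}"


counts = run_nested_loops(
    spec="typing app",
    agent_step=stub_agent,
    evals_pass=lambda build: attempts["n"] % 2 == 0,
    developer_review=lambda spec, build: spec,   # reviewer keeps the spec as-is
    external_feedback=lambda spec, build: spec,  # users request no changes
)
print(counts)
```

With these stubs, two external rounds contain six developer reviews and twelve agent iterations, which is the "many ticks per tick" relationship described above.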
Why the three loops aren't enough in enterprise
For Ng's running example — a typing app he built for his daughter — three loops are plenty. The downside risk of a bug in a personal project is “the daughter notices and tells dad.”
For a bank deploying an AI workflow that touches customer transactions, a manufacturer using AI to assist quality inspection on safety-critical components, or a hospital using AI to draft discharge summaries, the downside risk of a bug or a missed compliance check is qualitatively different. It is a regulator notice. It is a patient safety event. It is a board-level conversation.
Running three loops without a fourth is shipping without a parachute.
Loop 4 — Enterprise Governance (weeks → quarters)
The fourth loop runs slower than the other three by an order of magnitude or more. It is the cycle in which an organisation:
- Reviews what AI it has actually deployed
- Verifies that deployments comply with applicable regulation (DPDP Act, RBI guidelines, ISO 42001, HIPAA, NIST AI RMF, EU AI Act, sector-specific rules)
- Audits the data being sent to AI systems and the data being produced
- Updates internal policies — acceptable use, vendor evaluation, model selection, escalation
- Reports to leadership and the board on AI risk posture
- Closes the loop by feeding governance findings back into specs, evals, and code
Done well, Loop 4 is the thing that lets you go fast on Loops 1–3 without blowing up. It is not a bureaucratic add-on. It is the loop that tells your CEO, your board, your auditor, and your regulator that you know what your AI is doing.
Done badly — or skipped entirely — Loop 4 shows up as the surprise notice from a regulator six months after a deployment. By then, the agent has produced a hundred thousand outputs, none of them auditable, and nobody on the team can answer the simple question: what AI did we ship, where, with what evals, governed by which policy?
Why most orgs skip it
In our work across BFSI, manufacturing, healthcare, and IT services, the same pattern shows up:
- Loop 1 is exciting. Developers love Claude Code, Cursor, and Copilot. They adopt fast.
- Loop 2 follows naturally. Once you're using an agent, you steer it. Most engineers figure this out within a week.
- Loop 3 is uncomfortable but visible. Product teams already know about user feedback; AI doesn't change the principle, only the velocity.
- Loop 4 is invisible until it isn't. Nobody on the engineering team has it in their job description. Nobody on the compliance team understands the AI workflows well enough to govern them. So it doesn't happen — until something breaks, and then it happens all at once, and badly.
The fix is not “more compliance reviews.” It is to build Loop 4 as an actual loop — a cycle with a cadence, owners, artefacts, and feedback into the other loops — before you need it.
What Loop 4 looks like, operationalised
A working enterprise governance loop has, at minimum:
- An AI inventory. What systems are using AI, in what workflow, against what data, with what models?
- A risk register. What can go wrong, what's the exposure, what's the mitigation, who owns it?
- A policy stack. Acceptable use, data handling, vendor evaluation, model selection, human-in-the-loop rules.
- An evals discipline. Production AI systems have evals that run on a schedule, not just during build.
- An audit trail. Who shipped what, when, governed by which policy version, reviewed by whom.
- A board cadence. Quarterly AI posture review for the board, mapped to fiduciary duties.
- A feedback path. Findings from any of the above flow back into Loops 1–3.
If you have an AI Guru-style governance pillar — board advisory, NEUBoard, AgentGuru, AssuranceOps — you already have most of the tools. If you don't, you can stand up the minimum version of all of the above in 6–10 weeks. The deeper your AI usage, the sooner this matters.
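As a rough illustration of the minimum version, the inventory, audit fields, scheduled-evals check, and feedback path can start as a small data model plus one review function. This is a sketch under stated assumptions, not a reference implementation: the class and field names, the 30-day eval window, and the policy version strings are all hypothetical, and a real deployment would map them to its own regulatory obligations.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class AISystem:
    """One row of the AI inventory (field names are illustrative)."""
    name: str
    workflow: str
    data_classes: List[str]
    model: str
    owner: str                      # who shipped and answers for it
    policy_version: str             # policy it was reviewed against
    last_eval_run: Optional[date] = None


@dataclass
class Finding:
    """A governance finding, routed back into Loops 1-3."""
    system: str
    issue: str
    route_to: str  # "spec", "evals", or "code"


def governance_review(
    inventory: List[AISystem],
    today: date,
    current_policy: str,
    max_eval_age_days: int = 30,    # assumed cadence, not a standard
) -> List[Finding]:
    """One Loop 4 tick: check every system and emit findings."""
    findings: List[Finding] = []
    for system in inventory:
        if system.last_eval_run is None:
            findings.append(Finding(system.name, "no production evals on record", "evals"))
        elif today - system.last_eval_run > timedelta(days=max_eval_age_days):
            findings.append(Finding(system.name, "scheduled evals are stale", "evals"))
        if system.policy_version != current_policy:
            findings.append(Finding(system.name, "reviewed against an outdated policy", "spec"))
    return findings


# Hypothetical inventory with one healthy system and one ungoverned one.
inventory = [
    AISystem("discharge-drafter", "discharge summaries", ["patient records"],
             "model-a", "clinical-eng", "v2", last_eval_run=date(2025, 1, 20)),
    AISystem("txn-triage", "transaction review", ["customer transactions"],
             "model-b", "payments-eng", "v1"),
]
findings = governance_review(inventory, today=date(2025, 1, 31), current_policy="v2")
for f in findings:
    print(f"{f.system}: {f.issue} -> {f.route_to}")
```

The design choice worth noting is the `route_to` field: a finding that does not name where it goes (spec, evals, or code) is a report, not a loop.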
A diagnostic for your org
Score honestly. Most orgs we walk into score 1–2.
| Loop | Question | Score |
|------|----------|-------|
| 1 | Do your developers run an agentic coding loop with evals (not just re-prompts)? | 0 / 1 |
| 2 | Do your developers play a product-shaping role, not just a QA role? | 0 / 1 |
| 3 | Do you have a real external feedback loop (alpha users, A/B tests, usage telemetry) that feeds back into specs? | 0 / 1 |
| 4 | Do you have an inventory, risk register, policy stack, eval discipline, audit trail, and board cadence for your deployed AI? | 0 / 1 |
A score of 4/4 is rare and impressive. 2/4 is the modal enterprise. 1/4 is more common than you'd think. 0/4 happens, particularly in mid-market companies whose developers have started using AI coding tools before leadership is aware of it.
If you're a leader and your score is below 4, the missing loops are your highest-leverage investments for the next quarter.
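If you want to run the diagnostic across several teams, the scoring is simple enough to script. The sketch below assumes one yes/no answer per loop; the function name `score_diagnostic` and the short loop labels are my own shorthand for the four questions, not part of any tool.

```python
from typing import Dict, List, Tuple

# Short labels for the four diagnostic questions (shorthand, assumed).
LOOP_NAMES = {
    1: "agentic coding with evals",
    2: "developer feedback (product-shaping)",
    3: "external feedback",
    4: "enterprise governance",
}


def score_diagnostic(answers: Dict[int, bool]) -> Tuple[int, List[str]]:
    """Return the score out of 4 and the names of the missing loops."""
    if set(answers) != set(LOOP_NAMES):
        raise ValueError("answer every loop question exactly once: 1, 2, 3, 4")
    score = sum(1 for ok in answers.values() if ok)
    missing = [LOOP_NAMES[n] for n in sorted(answers) if not answers[n]]
    return score, missing


# Example: the modal enterprise, with Loops 1 and 2 in place.
score, missing = score_diagnostic({1: True, 2: True, 3: False, 4: False})
print(f"{score}/4; missing: {', '.join(missing)}")
```

The list of missing loops is the useful output: it is the shortlist of investments for the next quarter.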
What this means for AI-native engineering teams
Ng makes a sharp observation in his post: engineers are increasingly playing partial product management roles, because AI has lowered the cost of building enough that vision and feedback now bottleneck velocity more than implementation does.
I'd add: engineers in enterprise contexts are also being asked to play partial governance roles, for exactly the same reason. The compliance team can no longer keep pace with what the engineering team ships. So engineers need to internalise the governance loop, just as they've been internalising the product loop.
This is not the same as “make every engineer a lawyer.” It is “give every engineer a working mental model of what governance needs from their work, and the artefacts to satisfy it without a quarterly fire drill.”
That mental model starts with the four loops.
If you lead a team building with coding agents in an enterprise context, AI Guru runs Executive AI Briefings for leadership teams — a private 90–120 minute session on practical AI adoption with governance, risk, and the 90-day roadmap baked in. We also teach Loop 4 explicitly across our enterprise training engagements.
Subscribe to The Forward View for weekly notes on enterprise AI, written for decision-makers and read by 3,600+ leaders worldwide.



