The Pattern That Repeats

Most city AI is still a pilot.

Government technology experts (including analysts at GovTech and the Center for Digital Government) are consistent on this point: the vast majority of city AI initiatives remain pilots or narrowly scoped deployments. The real gains in productivity, cost savings, and public service quality require AI embedded across systems and workflows rather than added on top of them. Cities know this. Most cannot get there.

The common explanations are budget, political will, and vendor limitations. These are real factors, but they are not the primary cause of pilot-to-production failure. The primary cause is that pilots operate under informal governance, production deployments require formal governance, and most cities have no framework for making that transition.

"The real gains will emerge as AI tools become embedded across systems and workflows rather than added on top of them. We're still in the early innings."

Managing Partner, Weatherford Capital · GovTech, 2026

A pilot operates in a controlled environment with a small team, bounded scope, close manual oversight, and an implicit understanding that failures are learning opportunities. When a pilot fails, the city learns something. When a production deployment fails (serving hundreds of thousands of people, with formal accountability, regulatory exposure, and public visibility) the city answers for it.

The governance requirements for production are categorically different from the governance requirements for a pilot. The transition between them is not automatic. It requires deliberate institutional work. CityOS defines exactly what that work is.

What Changes at Production Scale

Pilot governance vs. production governance.

The technology is the same in pilot and in production. The institutional requirements are not. Each item in the pilot list below must become its production counterpart before a city AI system moves to production.

Pilot Environment

What pilots run on

  • Informal accountability: everyone knows who to call
  • Bounded scope: controlled inputs, limited edge cases
  • Close manual oversight: failures caught quickly
  • Flexible logging: good enough for a 90-day test
  • Implicit success metrics: the team knows if it's working
  • No regulatory documentation required
  • Override is easy: just stop the pilot

Production Requirements

What production requires

  • Formal accountability: documented, signed, institutional
  • Production scope: full data volume, all edge cases present
  • Governance framework: failures caught by structure, not proximity
  • Audit architecture: complete decision log at production volume
  • Defined performance baseline: deviation triggers review
  • Federal framework documentation before launch
  • Formal override and sunset protocols: tested under stress

The CityOS Production Checklist

Six things that must be true before production launch.

None of these are technology requirements. Every one is a governance requirement. Every one must be satisfied before a city AI system moves from pilot to production.

1. A named institutional owner (not a vendor)

A specific city official must be documented as the accountable owner of the production AI system. Not the vendor. Not the department. A named person in a named role. This person's accountability is documented before production launch. When the system produces a harmful outcome, this is the person who answers for it, and they agreed to that before the system went live.

CityOS requirement: signed before launch
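As an illustration only (the field names and the `ownership_is_valid` check are hypothetical, not a CityOS schema), the ownership requirement can be sketched as a record that is valid only when it names a person in a role and was signed before the launch date:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class AccountableOwner:
    """A named city official documented as owner of a production AI system."""
    name: str        # a person, not a vendor or a department
    role: str        # the named role in which accountability is held
    system: str      # the production AI system being owned
    signed_on: date  # the date accountability was formally signed

def ownership_is_valid(owner: AccountableOwner, launch_date: date) -> bool:
    """Owner must be a named person in a named role, signed before launch."""
    return bool(owner.name and owner.role) and owner.signed_on < launch_date
```

The `signed_on < launch_date` comparison encodes the checklist's core claim: the accountability agreement must exist before the system goes live, not after.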
2. A documented scope delta: what changed from the pilot

Every meaningful difference between the pilot and the production deployment must be documented and reviewed before launch. Governance designed for a 90-day, 10-user pilot does not automatically extend to a permanent citywide deployment. If the scope changed, the governance must be reviewed.

CityOS requirement: scope delta document completed
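The scope delta document can be sketched as a simple diff over pilot and production scope settings. The setting names used below (`users`, `duration_days`) are illustrative assumptions, not a CityOS format:

```python
def scope_delta(pilot: dict, production: dict) -> dict:
    """Return every setting that differs between pilot and production scope.

    Each difference is a line item for the scope delta document; an empty
    result means nothing changed and existing governance may still apply.
    """
    keys = set(pilot) | set(production)
    return {
        k: {"pilot": pilot.get(k), "production": production.get(k)}
        for k in keys
        if pilot.get(k) != production.get(k)
    }
```

For example, `scope_delta({"users": 10, "duration_days": 90}, {"users": 250000, "duration_days": None})` surfaces both the user-count jump and the shift from a bounded test to a permanent deployment.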
3. An audit architecture validated at production volume

The audit system must be tested under production load conditions before the system goes live. Logging architectures that work at pilot scale frequently fail at production volume, dropping records, producing incomplete logs, or creating unresolvable gaps. An audit trail with gaps is not an audit trail. It is a liability.

CityOS requirement: load-tested before launch
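A minimal sketch of the pass criterion for an audit load test, using an in-memory SQLite store as a stand-in for the production logging stack (the schema and the `load_test_audit_log` helper are hypothetical): every decision id must appear exactly once, with no gaps.

```python
import sqlite3

def load_test_audit_log(n_decisions: int) -> bool:
    """Write n decision records in bulk and verify the log is gap-free.

    A real test would run at production volume against the production
    logging architecture; this sketch only demonstrates the criterion.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE audit (decision_id INTEGER PRIMARY KEY, ts TEXT)")
    db.executemany(
        "INSERT INTO audit VALUES (?, datetime('now'))",
        ((i,) for i in range(n_decisions)),
    )
    db.commit()
    logged = [row[0] for row in
              db.execute("SELECT decision_id FROM audit ORDER BY decision_id")]
    # An audit trail with gaps is not an audit trail: ids must be contiguous.
    return logged == list(range(n_decisions))
```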
4. Federal framework documentation: produced before deployment

NIST AI RMF, OMB M-24-10, and relevant DHS CISA documentation must exist before the system goes live, not assembled after the first regulatory inquiry. Federal procurement expectations increasingly require pre-deployment governance documentation. City systems that cannot produce this documentation at the point of a regulatory inquiry will face increasing barriers to federal partnerships and funding.

CityOS requirement: documentation complete at launch
5. A defined production performance baseline

The system must have documented performance baselines (decision quality metrics, data feed reliability thresholds, exception rates) against which the production system is actively monitored. Deviation from baseline triggers governance review, not just technical investigation. A technical problem that is not also a governance event is a missed accountability opportunity.

CityOS requirement: baselines set before launch
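One way to sketch baseline monitoring, assuming metrics and their acceptable bands are plain name/value pairs (the metric names below are illustrative): any value outside its documented band emits a governance event, not just a technical alert.

```python
def check_baseline(metrics: dict, baseline: dict) -> list:
    """Compare live metrics to documented baselines; return governance events.

    baseline maps metric name -> (min_ok, max_ok). Assumes every baselined
    metric is reported. Any out-of-band value triggers governance review.
    """
    events = []
    for name, (low, high) in baseline.items():
        value = metrics[name]
        if not (low <= value <= high):
            events.append(f"governance review: {name}={value} outside [{low}, {high}]")
    return events
```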
6. A tested sunset and reversion protocol

Every production city AI system must have a documented protocol for reverting to manual operation: conditions that trigger reversion, who makes the call, how long reversion takes, and how the city operates during the reversion period. A sunset protocol that has never been tested is not a protocol. It is an assumption. Under the conditions that make reversion most necessary, an untested protocol will fail.

CityOS requirement: tested under stress conditions
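A hedged sketch of a reversion protocol as a data structure; the trigger names, authority field, and drill semantics are invented for illustration. The point it encodes: `last_drill_passed` starts false, mirroring the claim that an untested protocol is only an assumption.

```python
from dataclasses import dataclass

@dataclass
class ReversionProtocol:
    """Documented protocol for reverting a production AI system to manual operation."""
    triggers: list             # conditions that force reversion
    decision_authority: str    # who makes the call
    max_reversion_hours: int   # how long reversion may take
    last_drill_passed: bool = False  # untested until a drill says otherwise

    def should_revert(self, observed: list) -> bool:
        """Any observed condition matching a documented trigger forces reversion."""
        return any(condition in self.triggers for condition in observed)

    def drill(self, simulated_hours: float) -> bool:
        """Run a stress drill; the protocol only counts once it has been tested."""
        self.last_drill_passed = simulated_hours <= self.max_reversion_hours
        return self.last_drill_passed
```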

The Standard That Matters

AI that is defensible to regulators and the public.

The ultimate test of a city AI production deployment is not whether it works in optimal conditions. It is whether it is defensible when it doesn't: to a city council, a regulatory body, a federal audit, and the public.

Defensibility (which starts with governance established before deployment) requires three things: a complete audit trail that shows what the system decided and why; clear accountability that establishes who was responsible for the system's governance; and documented standards alignment that demonstrates the governance framework met applicable federal requirements before deployment.

The CityOS defensibility standard: A city-scale AI system in production must be capable of producing, within 30 days of any critical incident:

  • A complete, timestamped decision log for the period in question.
  • The name of the accountable city official who oversaw the system at the time of the incident.
  • The pre-deployment governance documentation demonstrating the system met applicable federal framework requirements.
  • The failure mode documentation showing the incident scenario was or was not anticipated, and what the defined response protocol was.
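The four-artifact standard above can be sketched as a completeness check; the artifact keys and the `defensibility_gaps` helper are assumptions for illustration, not a CityOS API:

```python
REQUIRED_ARTIFACTS = (
    "decision_log",          # complete, timestamped, for the incident period
    "accountable_official",  # named owner at the time of the incident
    "governance_docs",       # pre-deployment federal framework documentation
    "failure_mode_record",   # whether the scenario was anticipated, and the protocol
)

def defensibility_gaps(evidence: dict) -> list:
    """Return which of the four required artifacts are missing or empty.

    A system meets the sketched standard for an incident only when this
    list is empty within the 30-day window.
    """
    return [key for key in REQUIRED_ARTIFACTS if not evidence.get(key)]
```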

Any production city AI system that cannot meet this standard is not governance-ready for production deployment. CityOS is the framework that makes this standard achievable.


Ready to move from pilot to production?

CityOS provides the governance architecture for city-scale AI production deployments, from accountability assignment through federal framework documentation.

View the CityOS Framework · Talk to Health AI