LLM safety, SLAs, and the Pentagon’s acquisition process are in the news this week. Some notes.
Last summer, the Department of Defense awarded Anthropic a contract worth up to $200 million to deploy Claude on classified networks. The contract incorporated Anthropic’s acceptable use policy, which prohibits two things: mass domestic surveillance of American citizens and fully autonomous weapons systems. These were known constraints at the time of purchase. The government acquired this technology within those parameters.
Now the government regrets the purchase. Defense Secretary Hegseth demanded that Anthropic allow the Pentagon to use its models for “all lawful purposes” without limitation, and when Anthropic declined, the administration designated it a supply chain risk and ordered every federal agency to stop doing business with the company. These are the undisputed facts of the case.
Here is what makes this a procurement story rather than an AI safety story: a senior advisor at CSIS noted that the usage restrictions in Anthropic’s contract had never actually been triggered in practice. DoD users reportedly loved the product. Every mission proceeded as planned. The restrictions that provoked this crisis were, at the time of the crisis, theoretical.
If operational requirements changed after contract award, the appropriate path is a new acquisition. Coercing an existing vendor under threat of the Defense Production Act is something else entirely. I suspect what actually happened is simpler: the model itself was declining out-of-scope tasks, which surfaced a gap between what the acquisition office bought and what operators downstream believed they were getting. It is also worth noting that the LLM vendor in question plays essentially no role in existing practice, in which operators build and maintain the resilience of automated kill chains using a diverse set of platforms and machine intelligences. I posit that LLMs would only harm readiness by degrading the performance of those chains, so I doubt there was much appetite to integrate them in the first place.
There is also the question of whether “lawful” is a sufficient standard for AI deployment. It is not. Autonomous vehicles are trained to avoid harmful driving behavior, not merely illegal behavior, because legal compliance alone does not prevent harm. The law is a backward-looking instrument: courts remedy harms and contribute to deterrence, but they intervene only after the fact. A system that will do anything legal will still do many things that are dangerous, reckless, or unconstitutional in ways yet to be adjudicated.
This connects to a deeper structural problem. When a stack of machine intelligence supports a human operator, accountability is straightforward: the government is responsible for the judgment of its people, and courts check executive action. But if an autonomous AI agent becomes the only party responsible for a constitutional violation, the courts lose a meaningful party to check. The human is out of the loop, and the executive branch has, in effect, automated its way past judicial review. Any serious deployment of AI in military decision loops has to preserve human accountability, because the constitutional order requires a human to hold responsible.
Hours after the Anthropic ban, OpenAI announced its own deal with the Pentagon. Sam Altman said the agreement includes restrictions on mass domestic surveillance and requires human responsibility for the use of force, including autonomous weapons. These are the same two provisions Anthropic was blacklisted for insisting on. The substance of the disagreement, apparently, was never the substance.
Which leaves the long-term question. Commercial foundation models are built around deeply ingrained constraints against causing harm. These constraints are woven through training in ways that resist surgical removal for one customer. That design philosophy will keep colliding with the requirements of any military, which exists as the sanctioned office for democratic violence. The government will eventually need to develop sovereign AI capability for operational decision loops, because the thing the military needs is structurally incompatible with the thing commercial model developers are building. In this particular case, I would be surprised if the new vendor experience is substantially different from the experience with the current vendor.

