Information Matters · Digital Sovereignty

Whose model is reading your data?

Almost every AI feature a UK business now uses — the assistant in the inbox, the summary in the meeting, the chatbot on the website — is powered by a large language model (a system trained to predict and generate text) controlled from the United States. We read the providers’ own privacy, licensing and data-boundary pages to ask where your data actually goes when you press send, and whether there is a real way out.

June 2026

The software iceberg: any product you buy sits on layers you don’t see — where your data is hosted, whose computers run it, and who ultimately owns the vendor — and those hidden layers decide who really controls your data.

A large language model is the engine; everything else in the modern software stack is increasingly just a steering wheel bolted on top. So the sovereignty question for artificial intelligence is not really about the chat window your staff see. It is about the handful of companies that own the engines, the legal systems those companies answer to, and whether the data boundary you carefully paid for actually holds once an AI feature is switched on.

The honest summary up front: the frontier is American. Of the eight providers UK businesses realistically call, six are controlled from the United States, one from France and one from Canada. That matters because of two US statutes — the CLOUD Act (the Clarifying Lawful Overseas Use of Data Act, which can compel a US company to hand over data it controls, wherever in the world that data is stored) and FISA Section 702 (Foreign Intelligence Surveillance Act surveillance powers that reach foreign users of US communications providers). Where a company is controlled, not where a server sits, is what decides whose laws can reach the data.

Who owns the vendor against whether your data can stay in the UK. Source: Information Matters UK vendor sovereignty database, June 2026.

The gate inside the fence

Start with the finding that should change how UK buyers read their own contracts. Microsoft 365 Copilot is, by Microsoft’s own documentation, an EU Data Boundary service for European customers, running on Azure OpenAI rather than the public OpenAI service ^[1]. That is the reassurance most enterprise buyers bought.

But the same Microsoft page now states, in plain words, that “Anthropic models are out of scope for the EU Data Boundary” (the full sentence adds “and when available, in-country LLM processing commitments”), and that starting 7 January 2026 Anthropic is a subprocessor for Microsoft 365 Copilot ^[1]. Anthropic, the maker of the Claude models, is US-controlled; under its own commercial terms the contracting entity for UK, EEA and Swiss customers is Anthropic Ireland, Limited (with Irish governing law), while the ultimate parent is Anthropic, PBC in the US ^[2]. So a UK organisation that pinned its tenant to a UK or EU region, and reasonably assumed the boundary held, may find a class of AI requests handled by a supplier Microsoft itself places outside that boundary. The fence is real. The newest gate in it is documented in the subprocessor notes, not the sales deck — and it is the single most important thing to check before turning Copilot features on.

One request's journey through a typical stack, each stop coloured by who controls it. — One request’s journey through a typical stack, each stop coloured by who controls it. Source: Information Matters UK vendor sovereignty database, June 2026.

What the providers actually promise

The good news, read directly from the providers’ own pages, is that the major model makers have converged on a similar set of enterprise commitments — and they are stronger than many buyers assume.

OpenAI (US, controlled by the OpenAI Foundation) states it does not train its models on business data by default, that the customer owns inputs and outputs, and that interface (API) data is retained for at most 30 days unless legally required — with a zero-data-retention option for qualifying use cases ^[3]. Anthropic’s commercial terms say plainly that it “may not train models on Customer Content” ^[2]. Google, for its Gemini models on the Vertex AI platform, points to a contractual restriction on training on customer data and documents a zero-data-retention path for qualifying workflows; default short-term retention for abuse-monitoring applies unless ZDR is configured ^[4]. Amazon’s Bedrock — the route by which UK businesses reach many models, including Anthropic’s, through Amazon Web Services — is built so that the model providers themselves get no access to customer prompts or outputs ^[5].

Two patterns are worth holding onto. First, “we don’t train on your data” is now table stakes, but it is a default that can be switched, and several providers retain data briefly for abuse monitoring unless you opt out. Second, none of these commitments changes the controller’s nationality, and therefore none of them removes US legal reach. A no-training promise is a privacy commitment, not a sovereignty one.

The open-weight escape route — real, and narrower than it looks

The genuine alternative to renting an American engine is to run an open-weight model — one whose trained parameters you can download and run on infrastructure you choose, including UK or EU data centres — yourself. Two names dominate, and both come with fine print.

Meta’s Llama models (Meta is US-controlled; under the Llama 4 licence, users in the EEA or Switzerland contract with Meta Platforms Ireland Limited, while UK and all other users contract with Meta Platforms, Inc.) are distributed under the Llama 4 Community License, not a standard open-source licence. You may use, modify and redistribute the weights royalty-free, but you must display “Built with Llama”, prefix any derivative model’s name with “Llama”, and — the real catch — if your product has more than 700 million monthly active users, you must request a separate licence from Meta, granted at Meta’s discretion ^[6]. For almost every UK business that ceiling is irrelevant; the point is that this is a controlled grant, governed by Californian law, not a no-strings release.

Mistral (France) is the European flagship, and the nuance matters. Its general-purpose smaller models — Mistral 7B, the Mixtral family, Mistral Small 3 — are released under the genuinely open Apache 2.0 licence: free to download, modify and use commercially, run anywhere ^[7]. But its larger flagship, Mistral Large 2, ships under the Mistral Research License, which permits research and non-commercial use only; commercial self-deployment requires a paid commercial licence ^[8]. So “use the European open model” is true for the small models and conditional for the big one.

Cohere (Canada) sits between the camps: a closed frontier provider, but one that offers private deployment inside your own virtual private cloud or on-premises, with an opt-out of model training and a stated commitment not to access your infrastructure or data ^[9]. For an organisation whose objection is “I don’t want my data leaving my walls” rather than “I don’t want a US engine”, that is a credible middle path.

The trade-off of going open-weight is honest work: you take on the hosting, the security patching, the evaluation and the bill for the graphics processing units (GPUs — the specialised chips that run these models). What you buy is control over where the model runs and whose law reaches it.

The open-weight ladder: how much control you keep, from a closed API to weights you run yourself. Source: Information Matters UK vendor sovereignty database, June 2026.

The compute floor: it’s American chips all the way down

Beneath every model — open or closed, sovereign-hosted or not — sits silicon. The frontier training and inference chips are overwhelmingly Nvidia GPUs, and Nvidia is a US company. This is where the UK’s own sovereign-compute effort meets its sharpest irony.

Britain’s flagship public AI machine, Isambard-AI, operated by the University of Bristol, is the centrepiece of the government’s AI Research Resource (AIRR). By the government’s own June 2026 guidance it is built from 5,448 Nvidia GH200 Grace-Hopper superchips, supplied by HPE — both American suppliers — with a sister machine, Dawn at Cambridge, running 1,024 Intel GPUs ^[10]. The government has committed over roughly £350 million to the two clusters by 2030, inside a £1 billion plan announced in January 2025 to scale national compute twentyfold ^[10]. It is a real and significant sovereign capability — the data and the operator are British, the access rules require UK-based projects — but the engines and the chips inside them are imported from the United States. Sovereign compute, on non-sovereign silicon.

The sovereignty barcode: one row per product, each layer coloured by who controls it — grey where the vendor does not disclose it. Source: Information Matters UK vendor sovereignty database, June 2026.

The eight, at a glance

Model / Provider	Controlled from	Open-weight?	Data boundary	Notes
OpenAI — ChatGPT / API ^[3]	USA (OpenAI Foundation)	No (closed)	No EU/UK boundary; ZDR option, ≤30-day API retention	No training on business data by default; customer owns inputs/outputs
Anthropic — Claude ^[2]	USA (ultimate parent Anthropic, PBC; UK/EEA/Swiss contracting entity Anthropic Ireland, Limited)	No (closed)	Out of scope for EU Data Boundary inside M365 Copilot ^[1]	“May not train models on Customer Content”; M365 subprocessor from 7 Jan 2026
Google — Gemini (Vertex AI) ^[4]	USA (listed, Alphabet)	No (closed)	Contractual restriction on training on customer data; ZDR path for qualifying workflows	Reachable region-by-region on Google Cloud
Microsoft 365 Copilot ^[1]	USA (listed)	No	EU Data Boundary service — except Anthropic models, which sit outside it	Runs on Azure OpenAI; suppliers include OpenAI + Anthropic
AWS Bedrock ^[5]	USA (listed, Amazon)	Hosts both	Region-selectable; AWS shared-responsibility model	Model providers get no access to customer prompts/outputs
Meta — Llama ^[6]	USA (Meta; EEA/Swiss via Meta Platforms Ireland, UK via Meta Platforms, Inc.)	Yes — community licence	Yours — run on UK/EU infrastructure of your choice	700M-monthly-user ceiling needs Meta’s separate licence; “Built with Llama”; California law
Mistral ^[7]^[8]	France	Mixed — small models Apache 2.0; flagship research-only	Yours — self-host anywhere	Mistral Large 2 needs a paid commercial licence for commercial self-deployment
Cohere ^[9]	Canada	No (closed, but private-deployable)	Your VPC / on-premises option	Opt-out of training; states no access to your infrastructure or data
Nvidia GPUs / Isambard-AI (AIRR) ^[10]	USA (chips) / UK (operator)	n/a	UK soil, UK operator; US silicon	5,448 Nvidia GH200 via HPE; UK-only project access

The buyer's filter: how many vendors survive each sovereignty question. — The buyer’s filter: how many vendors survive each sovereignty question. Source: Information Matters UK vendor sovereignty database, June 2026.

What buyers should take from this

The dial that runs through this whole series applies cleanly here. At one end, take the convenient default — a US frontier model inside a US productivity suite — and accept that no-training promises are privacy guarantees, not protection from US legal reach. The buyer’s job at this end is configuration: which AI features are on; which model suppliers power them; and crucially, do any of those suppliers sit outside the data boundary you paid for. Microsoft’s own Anthropic disclosure shows that last question is no longer hypothetical.

A step along, route your AI through a private deployment — Cohere in your own cloud, or any model on Bedrock with regions pinned and zero-retention set — so the data stays inside your walls even if the engine is foreign. At the far end, run open-weight models (Mistral’s Apache-licensed range, or Llama within its community-licence limits) on UK or EU infrastructure, and own the boundary outright — paying for it in hosting, security and GPU bills.

What this category adds to the series is the reminder that sovereignty has layers. You can hold the application, lose the model; hold the model, lose the chip. The UK’s own supercomputer makes the point: British data, British operator, American silicon. The practical move is not purity but knowing, at each layer, whose hand is on the switch — and the AI features now embedded across your stack are the layer where that hand changed most recently, and most quietly.

Sovereignty is a dial, not a switch — five buyer strategies. Source: Information Matters UK vendor sovereignty database, June 2026.

Sources

All facts are taken from each provider’s own published documentation — privacy, enterprise-data, licensing and trust pages — and from UK government guidance, read directly during June 2026. One primary reference per entity.

Microsoft — Data, Privacy, and Security for Microsoft 365 Copilot (EU Data Boundary; “Anthropic models are out of scope for the EU Data Boundary”; Anthropic subprocessor from 7 January 2026), updated 9 March 2026: https://learn.microsoft.com/en-us/copilot/microsoft-365/microsoft-365-copilot-privacy
Anthropic — Commercial Terms of Service (contracting entity Anthropic Ireland, Limited for UK/EEA/Switzerland under Irish governing law; ultimate parent Anthropic, PBC, USA; “Anthropic may not train models on Customer Content from Services”), effective 17 June 2025: https://www.anthropic.com/legal/commercial-terms
OpenAI — Enterprise privacy (no training on business data by default; customer owns inputs/outputs; ≤30-day API retention; zero-data-retention option), updated 8 January 2026: https://openai.com/enterprise-privacy/
Google Cloud — Vertex AI / Gemini data governance and zero data retention (contractual restriction on training on customer data; zero-data-retention path for qualifying workflows; default short-term retention for abuse monitoring unless ZDR configured): https://docs.cloud.google.com/gemini-enterprise-agent-platform/resources/zero-data-retention [Note: Google’s data-governance pages are JavaScript-rendered; specific retention-window figures were not confirmable from a clean primary fetch this session and over-specific claims have been softened accordingly.]
Amazon Web Services — Amazon Bedrock data protection (model providers have no access to customer prompts and completions; shared-responsibility model): https://docs.aws.amazon.com/bedrock/latest/userguide/data-protection.html
Meta — Llama 4 Community License Agreement (royalty-free limited licence; “Built with Llama”; 700 million monthly-active-user threshold requires a separate Meta licence; California governing law), effective 5 April 2025: https://www.llama.com/llama4/license/
Mistral AI — Mistral Small 3 announcement (general-purpose models released under Apache 2.0; weights free to download, modify and use in any capacity), 30 January 2025: https://mistral.ai/news/mistral-small-3/
Mistral AI — Mistral Large 2 (“Large Enough”) announcement (flagship released under the Mistral Research License, research/non-commercial only; commercial self-deployment requires a paid Mistral Commercial Licence), 24 July 2024: https://mistral.ai/news/mistral-large-2407/
Cohere — AI Security and Data Protection (private deployment via VPC / on-premises / Model Vault; opt out of model training; “We won’t have access to your computing infrastructure or data”): https://cohere.com/security
UK Government (DSIT / UKRI) — AIRR advanced supercomputers for the UK (Isambard-AI: 5,448 Nvidia GH200 Grace-Hopper superchips supplied by HPE, operated by the University of Bristol; Dawn: 1,024 Intel GPUs; c.£350m by 2030; £1bn twentyfold scale-up announced January 2025; UK-only project access), updated 4 June 2026: https://www.gov.uk/government/publications/ai-research-resource/airr-advanced-supercomputers-for-the-uk

Research notes: all facts from providers’ own published documentation — privacy, enterprise-data, licensing and trust pages — and from UK government guidance, read directly during June 2026. Supplier lists, data boundaries and model licences change; check the current documents before relying on them. This article reflects the opinions of the Information Matters team — human and AI — and should not be considered statements of fact.

If you have any questions or comments about this article please email info@informationmatters.net