US Regulators Push Unified AI Data Provenance Rules for Banks and Hospitals, Signal Cross-Sector Enforcement Coordination

Federal regulators have unveiled a sweeping proposal to harmonize AI data provenance standards across the banking and healthcare sectors, a move that could reshape how both industries govern the data underpinning high-stakes AI decisions. The initiative, jointly advanced by the Office of the Comptroller of the Currency (OCC), the Federal Reserve, the Federal Deposit Insurance Corporation (FDIC), and key health regulators, marks one of the most architecturally ambitious regulatory proposals in the history of U.S. AI governance.

For CIOs, CTOs, and compliance leaders at major financial institutions and health systems, the message is clear: end-to-end data lineage, robust audit trails, and verifiable data quality metrics are no longer aspirational governance practices. They are becoming enforceable expectations.

What the Proposal Requires

At its core, the proposal mandates that covered institutions maintain comprehensive data provenance records for every dataset used to train, validate, and operate AI models in critical decision-making contexts. Specifically, institutions would need to:

Maintain end-to-end data lineage records tracing data from origination through transformation, feature engineering, model training, and production inference
Produce tamper-evident audit trails linking AI model outputs to specific dataset versions, model builds, and validation events
Document verifiable data quality metrics (covering completeness, accuracy, timeliness, and labeling integrity) for all AI training and validation datasets
Deploy data clean rooms and secure multi-party computation (SMPC) as approved governance infrastructure for collaborative AI development involving external vendors, partners, or research counterparts
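One common way to make an audit trail tamper-evident is to chain entries together by hash, so that altering any earlier record invalidates every later one. The sketch below is illustrative only: the proposal does not prescribe a log format, and the field names (dataset_version, model_build, output_id) and helper functions are assumptions made for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> dict:
    """Append a new entry whose hash depends on every entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, payload),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; return False if any entry was altered or reordered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical events linking a model output to a dataset version and model build.
audit_log = []
append_entry(audit_log, {"event": "validation",
                         "dataset_version": "loans-v3",
                         "model_build": "uw-1.4.2"})
append_entry(audit_log, {"event": "inference",
                         "dataset_version": "loans-v3",
                         "model_build": "uw-1.4.2",
                         "output_id": "dec-0001"})
```

A hash chain only detects tampering; it does not prevent it. In practice the latest hash would also need to be anchored somewhere the log's operator cannot rewrite.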

The emphasis on data clean rooms reflects a broader shift in how regulators view cross-institutional AI collaboration. Major banks and payment networks have begun deploying multi-party clean rooms that allow them to share transaction anomaly signals without exposing individual account data. The proposed framework would formalize this approach as a compliance expectation, not merely a best practice.
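The idea of sharing aggregate signals without exposing individual inputs can be illustrated with additive secret sharing, one of the simplest building blocks of SMPC. This is a toy, single-process sketch rather than a description of any bank's or vendor's deployment: the modulus, function names, and example counts are assumptions, and a real protocol would run each party on separate infrastructure over authenticated channels.

```python
import secrets

PRIME = 2**61 - 1  # modulus for share arithmetic; inputs must be in [0, PRIME)


def make_shares(value: int, n_parties: int) -> list:
    """Split a value into n random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list) -> int:
    """Recombine shares by summing them modulo PRIME."""
    return sum(shares) % PRIME


def secure_sum(private_values: list) -> int:
    """Compute the total of all parties' values from shares alone."""
    n = len(private_values)
    # distributed[i][j] is the share that party i sends to party j.
    distributed = [make_shares(v, n) for v in private_values]
    # Each party j adds up the shares it received and publishes only that sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return reconstruct(partial_sums)


# Hypothetical per-institution counts of flagged transactions.
total_flags = secure_sum([12, 7, 30])
```

Each published partial sum looks uniformly random on its own, so only the combined total is revealed to the participants.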

The Cross-Sector Logic: Why Banking and Healthcare Together

Regulators have historically governed AI risk within sector-specific silos. The OCC, Federal Reserve, and FDIC have operated under the model risk management framework established by SR 11-7, while health regulators have relied on HIPAA, FDA Software as a Medical Device (SaMD) guidance, and emerging frameworks tied to electronic health record (EHR) interoperability.

The new proposal argues that this fragmentation is increasingly untenable as AI vendors, data platforms, and cloud infrastructure providers serve both sectors simultaneously. Regulators contend that converging standards will reduce cross-sector interoperability frictions and help supervisors assess model risk more consistently, particularly for high-stakes applications including credit underwriting, fraud detection, claims processing, and clinical decision support.

The OCC, in coordination with the Federal Reserve Board and the FDIC, has been working to update model risk management guidance to clarify that model risk management practices should be risk-based, tailored, and commensurate with a banking organization's size, complexity, and extent of model use. The data provenance proposal extends this architecture by adding a shared governance layer that bridges banking and healthcare supervisors.

The table below maps key governance dimensions across the two sectors under the proposed framework:

Governance Dimension	Banking (OCC/Fed/FDIC)	Healthcare (HHS-aligned Regulators)
Primary AI Risk Area	Credit underwriting, fraud detection, BSA/AML models	Clinical decision support, claims processing, diagnostic AI
Existing Baseline	SR 11-7 Model Risk Management Guidance	HIPAA, FDA SaMD guidance
Data Lineage Mandate	End-to-end lineage for training/validation datasets	Provenance records tied to patient safety and care outcomes
Privacy Collaboration	Data clean rooms for cross-institution fraud signal sharing	SMPC for multi-site clinical research
Audit Trail Standard	Tamper-evident logs linking model inputs, versions, decisions	Immutable records linking AI outputs to clinical events
Interoperability Goal	Reduce duplicate governance across multi-bank vendor ecosystems	Align with FHIR-based data exchange and EHR integration standards

The Governance Infrastructure Being Codified

The proposal does not merely prescribe outcomes; it signals which technologies and architectural patterns regulators expect institutions to deploy as governance infrastructure.

Data lineage and cataloging platforms will need to provide end-to-end traceability across the full model lifecycle, from raw data ingestion through production inference. Institutions currently relying on manual documentation or point-in-time data dictionaries face the most significant remediation burden.
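To make end-to-end traceability concrete, a lineage catalog can be modeled as a graph in which each artifact records its direct inputs, so any production output can be walked back to raw ingestion. The sketch below is a minimal illustration under assumed names (LineageNode, trace_upstream, and the artifact IDs are all hypothetical); commercial lineage platforms expose far richer metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageNode:
    node_id: str
    stage: str           # e.g. "ingestion", "transformation", "training"
    inputs: tuple = ()   # node_ids of the artifacts this one was derived from


def trace_upstream(catalog: dict, node_id: str) -> list:
    """Return node_id and every artifact it depends on, nearest first."""
    visited = []
    stack = [node_id]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.append(current)
        stack.extend(catalog[current].inputs)
    return visited


# Hypothetical lifecycle: raw data -> cleaned data -> features -> model -> inference.
nodes = [
    LineageNode("raw_transactions", "ingestion"),
    LineageNode("cleaned_transactions", "transformation", ("raw_transactions",)),
    LineageNode("feature_set_v2", "feature_engineering", ("cleaned_transactions",)),
    LineageNode("model_build_7", "training", ("feature_set_v2",)),
    LineageNode("inference_batch_1", "production_inference", ("model_build_7",)),
]
catalog = {node.node_id: node for node in nodes}

lineage = trace_upstream(catalog, "inference_batch_1")
```

Here lineage lists the inference batch first and the raw transaction feed last, which is the kind of chain an examiner would expect to be reproducible on request.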

Data clean rooms are explicitly recognized as acceptable mechanisms for controlled collaboration. Data clean room technology ensures that each party maintains control over its data and can set rules and limits on how it is used, while protecting the privacy of the data and the algorithms generating insights, using techniques such as encryption, pseudonymization, noise injection, synthetic data, secure enclaves, or SMPC.
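Two of the techniques named above, pseudonymization and noise injection, are simple enough to sketch in a few lines. The example below is an assumption-laden illustration, not a clean room implementation: the function names, the key, and the epsilon parameter are hypothetical, and the noise follows the general shape of the Laplace mechanism for a count, without the privacy accounting a production system would need.

```python
import hashlib
import hmac
import random


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same token, so records can
    still be joined, but the mapping cannot be recomputed without the key.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise with scale 1/epsilon to a count.

    The difference of two independent exponential draws with rate epsilon
    is Laplace-distributed with scale 1/epsilon.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical usage: tokenize an account ID and release a perturbed aggregate.
key = b"example-key-held-by-data-owner"
token = pseudonymize("account-000123", key)
released = noisy_count(100, epsilon=1.0, rng=random.Random())
```

Smaller epsilon values add more noise and therefore more protection, at the cost of less accurate aggregates.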

In 2025, the FDA's updated Real-World Evidence framework explicitly acknowledged privacy-preserving computation techniques, including clean rooms, as acceptable methods for generating regulatory submissions, a development expected to drive significant incremental investment in healthcare clean room infrastructure.

AI risk management platforms, data catalogs, and lineage visualization dashboards are already being integrated into joint provenance pilots by major banks and health systems. The proposal's adoption timeline provides a credible forcing function for institutions that have not yet moved beyond pilots.

Implementation Timeline and Phased Adoption

The proposal outlines a two-phase rollout alongside a formal public comment period:

Phase	Timeline	Scope	Key Requirements
Phase 1 - High-Risk Use Cases	0-12 months	Credit underwriting, fraud detection, clinical decision support	End-to-end lineage records; audit trails; verifiable data quality metrics for training and validation datasets
Phase 2 - Broader Implementation	12-24 months	All AI-driven decision-making in covered institutions	Full provenance documentation; data clean room governance; integration with AI risk management and lineage visualization tools
Public Comment Period	Concurrent with Phase 1	Industry-wide	Formal written submissions invited from banks, health systems, technology vendors, and trade associations
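The Phase 1 requirement for verifiable data quality metrics implies that measures such as completeness and timeliness are computed in a repeatable way rather than asserted in documentation. The sketch below shows one possible definition of each; the proposal does not specify formulas or thresholds, so the field names, the 30-day freshness window, and the sample records are assumptions.

```python
from datetime import datetime, timedelta


def completeness(records: list, required_fields: list) -> float:
    """Fraction of required field values that are present (not None or empty)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 0.0  # nothing to measure, so report no evidence of completeness
    present = sum(
        1
        for record in records
        for name in required_fields
        if record.get(name) not in (None, "")
    )
    return present / total


def timeliness(records: list, as_of: datetime, max_age: timedelta) -> float:
    """Fraction of records updated within max_age of the as_of timestamp."""
    if not records:
        return 0.0
    fresh = sum(1 for record in records if as_of - record["updated_at"] <= max_age)
    return fresh / len(records)


# Hypothetical training records for a credit underwriting dataset.
as_of = datetime(2025, 6, 30)
records = [
    {"income": 52000, "label": "repaid", "updated_at": datetime(2025, 6, 15)},
    {"income": None, "label": "default", "updated_at": datetime(2025, 1, 10)},
]

completeness_score = completeness(records, ["income", "label"])
timeliness_score = timeliness(records, as_of, timedelta(days=30))
```

With these sample records, three of four required values are present and one of two records is fresh, giving scores of 0.75 and 0.5.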

For enterprise architects and procurement teams, Phase 1's 12-month window is the critical constraint. Institutions lacking mature data lineage capabilities will need to accelerate procurement decisions immediately. That cohort is likely significant: more than 90% of data users at banks reported that the data they need is often unavailable or takes too long to retrieve, and 81% cited data quality as a top challenge, according to Deloitte's 2024 Banking and Capital Markets Data and Analytics Market Survey^[1].

Industry Reactions: Efficiency Gains vs. Compliance Burden

Proponents argue that unified standards will lower long-term total cost of ownership by eliminating duplicative governance efforts, reducing vendor-specific control implementations, and creating a common evidentiary standard that satisfies multiple regulators simultaneously. Regulatory harmonization efforts that consolidate expectations from agencies including the Federal Reserve, OCC, and FDIC allow banks to assess compliance posture once and report to multiple regulators.

A 2025 EY survey found that 72% of organizations have already integrated AI into their initiatives, yet governance lags behind, with half of organizations now making significant investments in governance frameworks to mitigate AI risks. The proposal's timeline may accelerate those investments across the board.

Critics, however, warn of disproportionate impact on smaller institutions. Community banks and regional health systems, which lack the enterprise architecture teams needed to rapidly deploy lineage platforms and clean room infrastructure, may face compliance costs structurally misaligned with their risk profiles. Regulators have emphasized that the updated guidance is intended to reduce unnecessary burden and promote risk-based examination across institutions of all sizes, with model risk management practices tailored and commensurate with a banking organization's size and complexity.

Concerns about over-standardization stifling innovation are also surfacing in industry comment letters. Banks are already subject to an extensive compliance regime covering nearly all risks associated with AI, including fair lending and cybersecurity requirements, and federally regulated financial institutions already undergo supervision, examination, and enforcement of their use of any technology, including AI. Adding a new cross-sector layer of provenance requirements on top of existing obligations, without corresponding guidance on how they interact, creates interpretive risk.

Implications for Procurement and Vendor Selection

For enterprise IT and procurement leaders, the proposal carries immediate downstream implications for vendor selection and technology architecture decisions. Organizations that have invested early in data governance as an AI foundation hold a structural advantage: their existing lineage, quality controls, and metadata management capabilities align directly with Phase 1 requirements.

Key near-term procurement considerations include:

Data catalog and lineage platform readiness: Vendors must demonstrate end-to-end traceability across multi-cloud and hybrid environments, with immutable logging capabilities
Data clean room certifications: For institutions operating collaborative AI programs, vendor clean room implementations should support audit-ready outputs and SMPC attestation
AI risk platform integration: Third-party AI risk management tools should integrate natively with lineage visualization dashboards and support the provenance record formats likely to emerge from the public comment process
Contract language: Legal teams should begin reviewing AI vendor contracts for audit trail access provisions, data lineage SLAs, and model documentation obligations

The unified AI governance framework for bank workflow agents proposed earlier this year already introduced interoperability and documentation standards for agentic AI systems. The data provenance proposal extends that architecture into the data layer, suggesting regulators are building a coherent, end-to-end governance stack from model training data through operational AI outputs.

Key Takeaways for Enterprise Leaders

Data provenance is becoming a regulatory baseline, not a governance differentiator. Institutions without mature lineage capabilities face compressed timelines for Phase 1 compliance.
The cross-sector scope is deliberate. Regulators are signaling that AI governance will increasingly follow the technology and vendor ecosystem, not legacy sector boundaries.
Data clean rooms and SMPC are being codified as governance infrastructure; expect procurement requirements for these capabilities to accelerate.
Smaller institutions face the highest proportional burden. Risk-based tailoring language in the proposal offers some relief, but community banks and regional health systems should engage proactively in the public comment process.
Vendor contracts need immediate review. Audit trail access, data lineage documentation, and model provenance obligations should be embedded in all AI-related procurement agreements before Phase 1 deadlines arrive.

As cross-sector AI adoption accelerates, the data provenance framework represents a significant step toward a coherent federal AI governance architecture, one in which the integrity of training data, the traceability of model decisions, and the auditability of AI outputs carry the same regulatory weight as the models themselves.