US Regulators Advance Cross-Sector AI Data Provenance Standards for Banks and Hospitals

84% of healthcare organizations have established AI governance committees, but only 12% have implemented a formal AI governance framework - a gap that federal regulators are now moving to close through a coordinated, cross-sector approach to data provenance^[1] and AI risk management. In a signal of accelerating regulatory momentum, agencies overseeing financial services and healthcare are converging on a common question: when an AI system makes a consequential decision, can the organization trace exactly which data was used, how it was processed, and whether it met applicable quality standards?

The answer, for most enterprises today, remains inadequate. A proposed cross-sector framework - currently open for public comment - aims to change that by standardizing how data is tagged, tracked, and audited throughout AI systems operating across banking and healthcare.

The Regulatory Backdrop: Two Sectors, Diverging Frameworks - For Now

Until recently, AI data governance in the US has proceeded along parallel but disconnected tracks.

In banking, the Federal Reserve, FDIC, and OCC took a major step on April 17, 2026, jointly rescinding the long-standing SR 11-7 model risk guidance, OCC 2011-12, and related BSA/AML issuances, and replacing them with a more explicitly risk-based, principles-driven framework for model risk management. Critically, generative AI and agentic AI systems are explicitly carved out of scope in the current revision: the agencies describe these technologies as "novel and rapidly evolving," and a forthcoming RFI is expected to address them directly.

In healthcare, the FDA issued draft guidance for AI-enabled devices in 2025 focused on documentation, transparency, bias prevention, and post-market monitoring, under a "total product lifecycle" approach. HHS released its AI strategy^[2] in December 2025, and the White House issued a National Policy Framework for AI in March 2026, calling on Congress to establish a federal policy framework that preempts state laws imposing "undue burdens" on AI development.

The result is a regulatory architecture in which the standards governing a hospital's AI-assisted diagnostic tool and a bank's AI credit-underwriting model are structurally incompatible - even when the underlying data infrastructure is shared or patient financial data crosses between sectors.

What the Cross-Sector Provenance Framework Proposes

The proposed framework targets three structural deficiencies in current AI data governance across both sectors:

1. Interoperable Lineage Records

Existing provenance standards - where they exist - are sector-specific and non-interoperable. The draft proposal would require organizations to adopt lineage record formats compatible with established standards such as W3C PROV, and to meet minimum interface standards for interoperability, including support for public-key infrastructure (PKI) and integration with content credential ecosystems. For enterprises spanning finance and healthcare, this means a single provenance architecture capable of satisfying both OCC examination expectations and FDA documentation requirements.
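To make the idea of a PROV-compatible lineage record concrete, here is a minimal Python sketch. It is loosely modeled on the entity, activity, and agent concepts of the W3C PROV data model and uses PROV-JSON-style keys; the `ex:` namespace, all identifiers, and the `build_lineage_record` helper are hypothetical, and the draft framework's actual schema is not described in this article.

```python
import json

def build_lineage_record(source_id, output_id, activity_id, agent_id,
                         started_at, ended_at):
    """Relate an output dataset to its source, PROV-style (illustrative only)."""
    return {
        "prefix": {"ex": "https://example.org/provenance/"},  # hypothetical namespace
        "entity": {source_id: {}, output_id: {}},
        "activity": {
            activity_id: {"prov:startTime": started_at, "prov:endTime": ended_at}
        },
        "agent": {agent_id: {}},
        # The activity read the source dataset...
        "used": {"_:u1": {"prov:activity": activity_id, "prov:entity": source_id}},
        # ...and produced the output dataset.
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": output_id, "prov:activity": activity_id}
        },
        # The operator (person or service) responsible for the activity.
        "wasAssociatedWith": {
            "_:a1": {"prov:activity": activity_id, "prov:agent": agent_id}
        },
        # Direct derivation link between output and source.
        "wasDerivedFrom": {
            "_:d1": {"prov:generatedEntity": output_id, "prov:usedEntity": source_id}
        },
    }

record = build_lineage_record(
    source_id="ex:claims_raw_2026_04",
    output_id="ex:claims_deidentified_2026_04",
    activity_id="ex:deidentify_run_17",
    agent_id="ex:pipeline_operator_a",
    started_at="2026-04-20T09:00:00Z",
    ended_at="2026-04-20T09:05:00Z",
)
print(json.dumps(record, indent=2))
```

Because the record is plain JSON, the same structure could in principle be handed to a bank examiner or attached to a device submission, which is the interoperability argument the proposal makes.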

2. Stricter Data Quality Controls at Each Pipeline Stage

The NIST AI RMF March 2025 update emphasizes model provenance, data integrity, and third-party model assessment, recognizing that most organizations rely on external or open-source AI components. The cross-sector framework builds on this by mandating documented provenance capture at three specific pipeline stages: ingestion, transformation, and usage. Each stage carries distinct documentation obligations:

Ingestion: Origin metadata, consent status, data classification (PHI, PII, financial record), and timestamp
Transformation: Lineage record for each normalization, de-identification, or feature engineering step, including operator identity and schema snapshots
Usage/Inference: Logs of which data assets were accessed at inference time, real-time feed consumption, and any human overrides applied

The critical governance shift for 2026 is that documentation must be continuously generated from operational systems, not manually assembled before audits.
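One way to generate documentation from operational systems, rather than assembling it by hand, is to have each pipeline step emit its own lineage entry as a side effect of running. The sketch below does this with a Python decorator; `records_lineage`, `LINEAGE_LOG`, and the sample step are hypothetical names, and the in-memory list stands in for whatever durable store an organization would actually use.

```python
import functools
from datetime import datetime, timezone

LINEAGE_LOG = []  # stand-in for a durable, append-only store

def records_lineage(step_name):
    """Decorator: append a lineage entry every time the wrapped step runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(rows, *args, **kwargs):
            result = func(rows, *args, **kwargs)
            LINEAGE_LOG.append({
                "step": step_name,
                "function": func.__name__,
                "rows_in": len(rows),
                "rows_out": len(result),
                "recorded_at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator

@records_lineage("de-identification")
def drop_names(rows):
    """Remove the 'name' field from every row."""
    return [{k: v for k, v in row.items() if k != "name"} for row in rows]

cleaned = drop_names([
    {"name": "A. Patient", "age": 54},
    {"name": "B. Patient", "age": 61},
])
print(LINEAGE_LOG[-1]["step"], LINEAGE_LOG[-1]["rows_in"])
```

The design point is that the lineage entry cannot be forgotten: if the step ran, the record exists.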

3. Unified Risk Assessments to Prevent Data Bleed

A principal concern driving the cross-sector proposal is data bleed - the unintentional flow of data between banking and healthcare contexts within shared AI infrastructure. Unified risk assessments would require organizations to map all data flows across sector boundaries and demonstrate that data classified under HIPAA's PHI protections is not repurposed within financial models, and vice versa.
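A data-flow map of this kind lends itself to an automated check. The Python sketch below flags flows in which a restricted classification is consumed outside its home sector; the `DataFlow` fields, the `RESTRICTED` policy table, and the sample flows are assumptions made for illustration, not rules taken from the proposal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataFlow:
    asset_id: str
    classification: str   # e.g. "PHI", "PII", "FINANCIAL"
    consumer: str         # model or system that reads the asset
    consumer_sector: str  # "healthcare" or "banking"

# Hypothetical policy: classifications that must stay in their home sector.
RESTRICTED = {"PHI": "healthcare", "FINANCIAL": "banking"}

def find_data_bleed(flows):
    """Return the flows whose consumer sits outside the asset's home sector."""
    return [
        flow for flow in flows
        if flow.classification in RESTRICTED
        and flow.consumer_sector != RESTRICTED[flow.classification]
    ]

flows = [
    DataFlow("lab_results_v3", "PHI", "sepsis-alert-model", "healthcare"),
    DataFlow("lab_results_v3", "PHI", "credit-underwriting-model", "banking"),
    DataFlow("loan_history_v8", "FINANCIAL", "credit-underwriting-model", "banking"),
]
violations = find_data_bleed(flows)
for flow in violations:
    print(f"data bleed: {flow.asset_id} -> {flow.consumer}")
```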

The Data Provenance Lifecycle: What Operators Must Do

The following outlines the five-stage compliance architecture that operators in both sectors will need to implement:

Stage 1 - Ingestion: Assign a unique provenance identifier to every data asset at the point of entry. Record origin, consent status, applicable data classification, and timestamp. Automated metadata tagging is the recommended approach to reduce error rates at scale.
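As a rough illustration of Stage 1, the Python sketch below assigns a provenance identifier and timestamp automatically when a record is created. The `IngestionRecord` class, its field names, and the allowed classification labels are hypothetical; the article names the required fields but not their format.

```python
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Assumed labels for the classifications the article lists.
ALLOWED_CLASSIFICATIONS = {"PHI", "PII", "FINANCIAL_RECORD"}

@dataclass(frozen=True)
class IngestionRecord:
    origin: str
    consent_status: str
    classification: str
    # Identifier and timestamp are generated automatically at creation.
    provenance_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    ingested_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        # Reject untagged or mis-tagged assets at the point of entry.
        if self.classification not in ALLOWED_CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {self.classification!r}")

rec = IngestionRecord(
    origin="ehr-export",
    consent_status="granted",
    classification="PHI",
)
print(asdict(rec))
```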

Stage 2 - Transformation: Append a new lineage record to the audit trail for each transformation event. Immutable, append-only logs capture the transformation type, the model or process applied, the responsible operator, and before/after schema snapshots.
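The article does not say how immutability should be enforced. One common technique, shown here purely as an assumption, is hash chaining: each entry stores the hash of the previous one, so any later edit breaks the chain. The `AppendOnlyLog` class and the event fields in this Python sketch are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry

class AppendOnlyLog:
    """In-memory, hash-chained log; a real system would persist entries."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(prev_hash, event):
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event):
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        entry_hash = self._digest(prev_hash, event)
        self._entries.append(
            {"event": event, "prev_hash": prev_hash, "hash": entry_hash}
        )
        return entry_hash

    def verify(self):
        """Recompute every hash; return False if any entry was altered."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True

log = AppendOnlyLog()
log.append({
    "transformation": "normalization",
    "process": "unit-converter",
    "operator": "etl-service",
    "schema_before": ["weight_lb"],
    "schema_after": ["weight_kg"],
})
log.append({
    "transformation": "de-identification",
    "process": "name-scrubber",
    "operator": "etl-service",
    "schema_before": ["name", "weight_kg"],
    "schema_after": ["weight_kg"],
})
print(log.verify())
```

Hash chaining detects tampering after the fact; it does not by itself prevent deletion of the whole log, which would need storage-level controls.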

Stage 3 - Model Training and Validation: Version training datasets and link their provenance records directly to the model artifact. Validation reports must reference specific dataset versions to ensure reproducibility and support independent review by model risk or compliance teams.
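A minimal Python sketch of Stage 3 follows, assuming a content hash serves as the dataset version. The function names, manifest fields, and sample rows are hypothetical; in practice a dataset versioning tool would likely play this role.

```python
import hashlib
import json

def dataset_fingerprint(rows):
    """SHA-256 over a canonical JSON encoding of the rows.

    Dict keys are sorted, but row order still matters: reordering the
    rows yields a different fingerprint.
    """
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def link_model_to_dataset(model_name, model_version, rows, provenance_ids):
    """Build a manifest tying a model artifact to its exact training data."""
    return {
        "model": model_name,
        "model_version": model_version,
        "training_dataset_sha256": dataset_fingerprint(rows),
        "provenance_ids": sorted(provenance_ids),
    }

def validation_references_dataset(manifest, validation_report):
    """True if the report cites the same dataset version the model was trained on."""
    return validation_report.get("dataset_sha256") == manifest["training_dataset_sha256"]

training_rows = [{"age": 54, "balance": 1200}, {"age": 61, "balance": 300}]
manifest = link_model_to_dataset(
    "credit-risk", "2.3.1", training_rows, ["prov-002", "prov-001"]
)
report = {"dataset_sha256": dataset_fingerprint(training_rows)}
print(validation_references_dataset(manifest, report))
```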

Stage 4 - Deployment and Inference: Log which data assets are accessed at inference time, including any real-time data feeds consumed and any human override decisions applied. This inference-time log is the primary artifact used in post-hoc regulatory investigations.
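An inference-time log entry covering these three items might look like the Python sketch below. The field names, the `make_inference_log_entry` helper, and the sample values are all hypothetical; the entry is emitted as one JSON line so it can be appended to a log stream.

```python
import json
from datetime import datetime, timezone

def make_inference_log_entry(model_version, asset_ids, feeds,
                             model_output, human_override=None):
    """Serialize one inference event as a JSON line."""
    entry = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "data_assets_accessed": sorted(asset_ids),
        "realtime_feeds": sorted(feeds),
        "model_output": model_output,
        "human_override": human_override,
        # The decision that took effect: the override if present, else the model's.
        "final_decision": (
            human_override["decision"] if human_override else model_output
        ),
    }
    return json.dumps(entry, sort_keys=True)

line = make_inference_log_entry(
    model_version="credit-risk-2.3.1",
    asset_ids=["prov-002", "prov-001"],
    feeds=["bureau-feed"],
    model_output="decline",
    human_override={
        "reviewer": "analyst-42",
        "decision": "approve",
        "reason": "verified income",
    },
)
print(line)
```

Recording both the model output and the final decision keeps the override visible to an investigator instead of silently replacing the model's result.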

Stage 5 - Third-Party and Cross-Border Data: Where data originates from external vendors or crosses jurisdictional boundaries, provenance records must include vendor identity, contractual data-use terms, applicable export controls, and evidence of compliance at the point of receipt.
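A simple gate at the point of receipt could reject vendor data whose provenance record is incomplete. In the Python sketch below, the four required field names mirror the items listed above, but their spelling, the `missing_vendor_fields` helper, and the sample record are assumptions.

```python
# Field names are hypothetical spellings of the four items Stage 5 lists.
REQUIRED_VENDOR_FIELDS = (
    "vendor_identity",
    "data_use_terms",
    "export_controls",
    "compliance_evidence",
)

def missing_vendor_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_VENDOR_FIELDS if not record.get(name)]

incoming = {
    "vendor_identity": "Example Analytics Ltd",
    "data_use_terms": "contract-ref-placeholder",
    "export_controls": "",          # present but empty
    "compliance_evidence": None,    # not supplied
}

vendor_gaps = missing_vendor_fields(incoming)
accepted = not vendor_gaps
print("accepted" if accepted else f"rejected, missing: {vendor_gaps}")
```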

Sector Governance Gap: Banking vs. Healthcare

The table below maps current AI data governance requirements across both sectors, illustrating why a cross-sector standard is structurally necessary rather than merely convenient.

Dimension	Banking (SR 26-02 / Interagency Guidance)	Healthcare (FDA SaMD / HHS AI Strategy)
Primary Regulator	Federal Reserve, OCC, FDIC	FDA, HHS, ONC
Core Framework	Risk-based model risk management; lifecycle documentation	Total product lifecycle; SaMD guidance; HTI interoperability rules
Data Provenance Requirement	Model inventory with data lineage; validation reproducibility	Post-market monitoring; training data documentation for SaMD
GenAI / Agentic AI Coverage	Explicitly excluded - forthcoming RFI expected	Not yet addressed; FDA guidance in draft stage
Third-Party Data Scrutiny	High - vendor model opacity flagged as supervisory concern	Moderate - vendor validation inquiries recommended
Cross-Sector Harmonization	Limited - sector-specific model risk focus	Limited - interoperability via FHIR/API, not AI-specific
Enforcement Posture	Supervisory examination; MRA issuance	Premarket review; post-market surveillance gaps noted

This governance divergence is not merely academic. A single AI-assisted prior authorization tool may simultaneously be subject to FDA marketing submission requirements, ONC HTI-1 transparency mandates, CMS coverage determination rules, and state-level disclosure laws. Institutions operating AI systems across both sectors face a compounding compliance burden that a unified provenance standard could materially reduce.

Third-Party Data Handling and Cross-Border Flows

The proposed framework's third-party provisions carry significant procurement implications. Under current banking guidance, a bank that deploys a third-party AI API without visibility into the vendor's training data or model architecture raises a supervisory concern. The cross-sector standard would extend this accountability requirement explicitly to healthcare vendors.

Organizations procuring AI systems from external vendors should, under the anticipated framework:

Require contractual provenance guarantees - vendors must document training data origins and provide version-controlled model histories
Conduct pre-deployment vendor audits covering data lineage, bias testing methodologies, and transformation logs
Establish ongoing monitoring rights allowing the institution to request updated provenance records when models are retrained or updated

On cross-border data flows, the framework intersects with existing data protection requirements under HIPAA and GLBA. Any AI system consuming data from international sources - including cloud-hosted training datasets or offshore analytics pipelines - must demonstrate that cross-border transfer agreements cover provenance-level metadata, not just the underlying personal data.

For a broader view of how data governance is evolving as the foundation for autonomous AI deployment, see the analysis of cross-sector governance lessons published previously on this site.

Implementation Costs and Industry Response

Industry participants have flagged implementation costs as a central concern, particularly for institutions operating on legacy data infrastructure. Retrofitting provenance capture at ingestion and transformation stages typically requires middleware integration, schema redesign, and in some cases re-platforming - costs that fall disproportionately on organizations that have not yet completed cloud migration.

At least 3,000 major U.S. healthcare entities are actively formalizing AI governance postures ahead of 2026 regulatory deadlines. In banking, 58.8% of banks cite a lack of clear regulatory guidance as their top barrier to advancing model risk management for AI.

The cross-sector standard's dual-sector scope may actually reduce total compliance costs over time by enabling a single provenance architecture to satisfy regulators in both domains - rather than maintaining separate, duplicative documentation systems. However, that efficiency gain requires upfront investment in interoperable tooling that many organizations have not yet budgeted.

The NIST AI RMF, increasingly referenced by regulators as a de facto standard of care, provides a practical alignment anchor. NIST's four governance functions - Govern, Map, Measure, and Manage - translate directly to specific documentation artifacts including governance policies, AI inventory registers, bias metrics reports, and risk treatment plans.
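The mapping from functions to artifacts can be turned into a simple gap check. The Python sketch below pairs each function with one artifact in the order the paragraph lists them; that one-to-one pairing is an assumption made for illustration, since the RMF itself does not prescribe specific documents, and `missing_artifacts` is a hypothetical helper.

```python
# Assumed pairing, following the order in which the article lists them.
RMF_ARTIFACTS = {
    "Govern": "governance policies",
    "Map": "AI inventory register",
    "Measure": "bias metrics report",
    "Manage": "risk treatment plan",
}

def missing_artifacts(artifacts_on_file):
    """Return {function: artifact} for every artifact not yet on file."""
    return {
        function: artifact
        for function, artifact in RMF_ARTIFACTS.items()
        if artifact not in artifacts_on_file
    }

rmf_gaps = missing_artifacts({"governance policies", "AI inventory register"})
for function, artifact in rmf_gaps.items():
    print(f"{function}: missing {artifact}")
```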

Key Takeaways for CIOs, Compliance Officers, and Enterprise Architects

Inventory AI data pipelines now. Map every ingestion source, transformation process, and inference-time data access point before the comment period closes and draft rules are finalized.
Adopt append-only lineage logging architecture. Manual, pre-audit documentation assembly is no longer a defensible compliance posture under any current sector-specific framework, let alone a cross-sector standard.
Audit third-party AI vendors proactively. Embed contractual provenance guarantees and pre-deployment audits in procurement processes immediately, particularly for vendors supplying models to regulated workflows.
Align with NIST AI RMF as a common baseline. The framework's voluntary status has not prevented regulators from treating it as an expected minimum - organizations without an RMF-aligned governance program face examination risk.
Engage the comment process. The public comment period represents a rare opportunity for practitioners with legacy infrastructure constraints to shape implementation timelines and technical specifications before they become binding.

Organizations seeking to understand how the current regulatory trajectory is reshaping AI governance in financial services can also consult the coverage of regulatory scrutiny of multimodal AI in finance.

Frequently Asked Questions

What is data provenance in the context of AI systems? Data provenance refers to the documented history of a dataset - its origin, how it has been transformed, who has accessed it, and how it has been used to train or operate an AI model. In regulated sectors, provenance records serve as the audit trail regulators use to verify model inputs, detect bias, and investigate adverse outcomes.

Which agencies are leading cross-sector AI data governance efforts in the US? Multiple agencies are active but currently operate in parallel rather than in a coordinated cross-sector structure. In banking, the Federal Reserve, OCC, and FDIC issued SR 26-02 in April 2026. In healthcare, the FDA, HHS, and ONC govern AI through the SaMD framework, the HHS AI Strategy, and HTI interoperability rules. NIST's CAISI provides a cross-sector voluntary baseline via the AI RMF and the AI Agent Standards Initiative launched in February 2026.

How does a unified provenance standard affect third-party AI vendors? Vendors supplying AI models or data pipelines to banks or hospitals would need to provide documented provenance records for training data and model updates. Institutions relying on third-party APIs with opaque training data - explicitly flagged as a supervisory concern in SR 26-02 - would need contractual guarantees and technical evidence of provenance compliance before deploying vendor models in regulated workflows.

What is the biggest implementation challenge for legacy systems? Legacy systems typically lack native metadata tagging and lineage tracking capabilities. Retrofitting provenance capture at ingestion and transformation stages requires middleware integration, schema changes, and in some cases re-platforming - compounded for institutions operating both banking and healthcare subsidiaries by divergent data classification schemas across HIPAA, GLBA, and emerging AI-specific standards.

When are final rules expected? The comment period for the cross-sector framework is open this quarter. Based on the pace of parallel rulemaking in both sectors - including the April 2026 banking guidance and the FDA's ongoing SaMD draft updates - final rules or formal guidance documents are anticipated within 12 to 18 months, though implementation timelines for legacy systems are likely to be phased.