AI provenance metadata
Provenance metadata is the record of where an AI output came from. For every generation VeraFrame produces, it captures:
- ai_generated: a boolean flag marking the content as AI-produced.
- ai_generator: identifies VeraFrame as the producing system.
- ai_model: the specific model version used for generation.
- generated_at: the UTC timestamp at the moment of generation.
Alongside these basic fields, VeraFrame records the source context: which source groups were selected, which specific source blocks were retrieved, which blocks scored highest, and which were actually used to produce the output. This is the retrieval audit described under Scored context.
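As a rough illustration of the shape such a record could take, here is a minimal Python sketch. The four top-level field names come from the list above. The RetrievalAudit class, its field names, the ISO 8601 timestamp format, and the model and block identifiers are assumptions made for the example, not VeraFrame's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RetrievalAudit:
    # Hypothetical field names; the text above only names the four categories.
    selected_source_groups: list[str] = field(default_factory=list)
    retrieved_blocks: list[str] = field(default_factory=list)
    top_scored_blocks: list[str] = field(default_factory=list)
    used_blocks: list[str] = field(default_factory=list)


@dataclass
class Provenance:
    ai_generated: bool
    ai_generator: str
    ai_model: str
    generated_at: str  # UTC timestamp; ISO 8601 serialization is an assumption
    retrieval: RetrievalAudit = field(default_factory=RetrievalAudit)


def new_provenance(model: str, retrieval: RetrievalAudit) -> Provenance:
    """Build a provenance record stamped with the current UTC time."""
    return Provenance(
        ai_generated=True,
        ai_generator="VeraFrame",
        ai_model=model,
        generated_at=datetime.now(timezone.utc).isoformat(),
        retrieval=retrieval,
    )


# Illustrative identifiers only; block references, not block content.
record = new_provenance(
    "example-model-v1",
    RetrievalAudit(
        selected_source_groups=["policies"],
        retrieved_blocks=["blk-1", "blk-2", "blk-3"],
        top_scored_blocks=["blk-1", "blk-2"],
        used_blocks=["blk-1"],
    ),
)
print(json.dumps(asdict(record), indent=2))
```

Note that the sketch stores block identifiers only, which matches the point made later that provenance holds references to source blocks rather than their content.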
Why this matters
Under EU AI Act Article 50, outputs from generative AI systems must be identifiable as AI-generated. NIST AI RMF, a voluntary framework, and ISO/IEC 42001 both call for traceability of model outputs. Provenance metadata is the mechanism by which VeraFrame supports these obligations.
Concretely:
- During a dispute, you can point to the exact model version and the exact source blocks used. “We used model X on date Y against source Z” beats “the AI said so.”
- During a model change, you can separate outputs produced before and after. If a newer model shows a regression, you can identify and review only the affected validations.
- During an internal investigation, you can trace the chain from an output back to the source material, and from the source material forward to every output it influenced.
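The model-change and investigation scenarios above amount to two lookups over exported validation records. The following Python sketch shows both, assuming a hypothetical export in which each record carries an id, the ai_model field, and a used_blocks list of source block references. The record layout and all identifiers are illustrative, not the actual export format.

```python
from collections import defaultdict

# Hypothetical exported validation records (layout is an assumption).
validations = [
    {"id": "val-1", "ai_model": "model-a", "used_blocks": ["blk-1", "blk-2"]},
    {"id": "val-2", "ai_model": "model-b", "used_blocks": ["blk-2"]},
    {"id": "val-3", "ai_model": "model-b", "used_blocks": ["blk-3"]},
]


def outputs_for_model(records, model):
    """Return the ids of validations produced by one model version."""
    return [r["id"] for r in records if r["ai_model"] == model]


def outputs_by_source_block(records):
    """Map each source block reference to every validation that used it."""
    index = defaultdict(list)
    for r in records:
        for block in r["used_blocks"]:
            index[block].append(r["id"])
    return dict(index)


# Model change: review only what the newer model produced.
affected = outputs_for_model(validations, "model-b")

# Investigation: trace forward from one source block to its outputs.
influenced = outputs_by_source_block(validations)["blk-2"]
```

The forward index is built once and then queried per block, which keeps the trace cheap when one source block feeds many outputs.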
Where provenance lives
Provenance metadata is recorded in up to three places, depending on which features are enabled:
1. Validation records (always)
Every validation record in the history — regardless of compliance profile — carries the provenance fields. You can see them in the validation detail view in the Admin dashboard and in the JSON export of validation history.
2. Audit events (when audit trail is enabled)
When the audit_trail feature is on, provenance is also embedded in each audit event, so the record is preserved even after the validation history TTL expires.
3. Rendered documents (when ai_metadata_in_documents is enabled)
For tenants with the ai_metadata_in_documents feature, VeraFrame embeds provenance metadata into the document properties of the generated .docx and .xlsx files. Tools that read document metadata (archival systems, some DMS platforms, forensic inspection) can pick up the provenance without needing to query VeraFrame’s own API.
This is separate from a visible “This is AI-generated” label in the body of the document, which is out of scope at present.
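For readers who want to inspect a file directly, here is a minimal Python sketch that reads custom document properties from an Office Open XML package using only the standard library. Both .docx and .xlsx files are ZIP archives, and custom properties live in docProps/custom.xml. Whether VeraFrame writes custom properties (as opposed to core properties), and whether the property names match the field names above, is an assumption here. The demo package is built in memory purely for illustration.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CUSTOM_PROPS_PATH = "docProps/custom.xml"
CUSTOM_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"


def read_custom_properties(file) -> dict[str, str]:
    """Return {name: text value} for custom properties in a .docx/.xlsx package."""
    with zipfile.ZipFile(file) as package:
        if CUSTOM_PROPS_PATH not in package.namelist():
            return {}
        root = ET.fromstring(package.read(CUSTOM_PROPS_PATH))
    props = {}
    for prop in root.iter(f"{CUSTOM_NS}property"):
        value = next(iter(prop), None)  # the first child holds the typed value
        props[prop.get("name")] = (value.text or "") if value is not None else ""
    return props


def build_demo_package() -> io.BytesIO:
    """Build a tiny in-memory package with assumed property names (illustration only)."""
    xml = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties" '
        'xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">'
        '<property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" name="ai_generated">'
        "<vt:bool>true</vt:bool></property>"
        '<property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="3" name="ai_model">'
        "<vt:lpwstr>example-model-v1</vt:lpwstr></property>"
        "</Properties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(CUSTOM_PROPS_PATH, xml)
    buffer.seek(0)
    return buffer


demo_props = read_custom_properties(build_demo_package())
```

On a real file, pass the path to read_custom_properties instead of the in-memory buffer. Values come back as strings, so a boolean such as ai_generated would need converting by the caller.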
What is not provenance
A few things VeraFrame does not claim to produce as part of provenance metadata:
- Model internals. We record the model identifier, not the model’s parameters or training data.
- User intent inference. The metadata says what happened, not why someone asked for it. Intent lives in the user’s own records.
- Source content. Provenance records references to source blocks, not the content of those blocks. The content stays in your tenant’s S3.
Related
- Audit trail — where provenance is persisted long-term
- Scored context — the retrieval audit that accompanies provenance
- System card — the system-level provenance