Provenance Tracking
Track the origin and history of data through computations
Sounio automatically tracks data provenance (the origin, history, and transformations of values), which enables reproducible science and auditable computations.
Why Provenance Matters
In scientific computing, knowing where data came from is as important as the data itself:
- Reproducibility: Can you recreate this result?
- Auditing: Which sensors contributed to this decision?
- Debugging: Why is this value unexpected?
- Compliance: Does the data lineage satisfy regulatory requirements?
Basic Provenance
Every Knowledge<T> value tracks its sources:
let temp: Knowledge<celsius> = measure(
    value: 23.5,
    uncertainty: 0.2,
    source: "sensor_A"  // Provenance starts here
)
print(temp.provenance) // ["sensor_A"]
Provenance Through Computation
When you combine values, provenance merges:
let temp_a = measure(value: 23.5, uncertainty: 0.2, source: "sensor_A")
let temp_b = measure(value: 24.0, uncertainty: 0.3, source: "sensor_B")
let avg = (temp_a + temp_b) / 2.0
print(avg.provenance) // ["sensor_A", "sensor_B"]
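To make the merge rule concrete, here is a minimal Python model of the behaviour described above. It is a sketch, not Sounio's implementation: the `Knowledge` class, the `merge` helper, the order-preserving union, and the quadrature rule for uncertainty are all assumptions made for illustration.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Knowledge:
    """Hypothetical stand-in for Sounio's Knowledge<T>; not the real type."""
    value: float
    uncertainty: float
    provenance: tuple  # ordered, duplicate-free source labels

    def __add__(self, other):
        # Assumption: independent errors combine in quadrature.
        sigma = math.hypot(self.uncertainty, other.uncertainty)
        return Knowledge(self.value + other.value, sigma,
                         merge(self.provenance, other.provenance))

    def __truediv__(self, scalar):
        # A plain number has no provenance, so the sources are unchanged.
        return Knowledge(self.value / scalar, self.uncertainty / abs(scalar),
                         self.provenance)


def merge(a, b):
    """Order-preserving union of two provenance tuples."""
    return tuple(dict.fromkeys(a + b))


def measure(value, uncertainty, source):
    return Knowledge(value, uncertainty, (source,))


temp_a = measure(23.5, 0.2, "sensor_A")
temp_b = measure(24.0, 0.3, "sensor_B")
avg = (temp_a + temp_b) / 2.0
print(avg.provenance)  # ('sensor_A', 'sensor_B')
```

Using a duplicate-free union means that combining a value with itself does not list the same source twice.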
Rich Provenance Records
For detailed tracking, use structured provenance:
let measurement = measure(
    value: 23.5,
    uncertainty: 0.2,
    provenance: Provenance {
        source: "thermometer_001",
        timestamp: now(),
        location: "Lab A, Station 3",
        operator: "Dr. Smith",
        calibration: "CAL-2024-001",
        method: "ASTM E2847",
    }
)
Querying Provenance
Check Sources
if measurement.provenance.contains("calibrated_sensor") {
    // Trusted source
}
Filter by Source
let trusted_data = data.filter(|d|
    d.provenance.any(|p| p.starts_with("calibrated_"))
)
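The same filter can be modelled in a few lines of Python. This is only a sketch of the logic: the record layout (a dict with a `provenance` list), the sample readings, and the `is_trusted` helper are hypothetical and are not part of Sounio.

```python
# Hypothetical readings; each carries the list of sources it was derived from.
records = [
    {"value": 23.5, "provenance": ["calibrated_sensor_A"]},
    {"value": 24.0, "provenance": ["sensor_B"]},
    {"value": 23.8, "provenance": ["sensor_C", "calibrated_sensor_D"]},
]


def is_trusted(record, prefix="calibrated_"):
    """True if at least one contributing source has the trusted prefix."""
    return any(src.startswith(prefix) for src in record["provenance"])


trusted_data = [r for r in records if is_trusted(r)]
print([r["value"] for r in trusted_data])  # [23.5, 23.8]
```

Note that an `any`-style check accepts a value as soon as one of its sources is calibrated, even if others are not. If every contributing source must be calibrated, an `all`-style check is the stricter choice.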
Trace History
let history = result.provenance_trace()
// Returns full computation graph
Provenance Graph
For complex computations, Sounio builds a provenance graph:
fn analyze(samples: Vec<Knowledge<mg>>) -> Knowledge<mg> with IO {
    let mean = samples.iter().mean()
    let adjusted = apply_correction(mean)
    // adjusted.provenance_graph() shows:
    // samples[0] ─┐
    // samples[1] ─┼─> mean ──> correction ──> adjusted
    // samples[2] ─┘
    adjusted
}
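One way to picture such a graph is as nodes that remember which nodes they were derived from. The Python sketch below rebuilds the diagram above under that assumption; the `Node` class and the `edges` helper are hypothetical and say nothing about how Sounio stores its graph internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node: a label plus the nodes it was derived from."""
    label: str
    parents: tuple = ()


def edges(node, seen=None):
    """Return (parent, child) label pairs reachable from `node`, inputs first."""
    seen = set() if seen is None else seen
    out = []
    for parent in node.parents:
        out.extend(edges(parent, seen))
        pair = (parent.label, node.label)
        if pair not in seen:  # report shared ancestors only once
            seen.add(pair)
            out.append(pair)
    return out


samples = tuple(Node(f"samples[{i}]") for i in range(3))
mean = Node("mean", samples)
correction = Node("correction", (mean,))
adjusted = Node("adjusted", (correction,))

for parent, child in edges(adjusted):
    print(f"{parent} -> {child}")
```

Walking from the result back to its inputs yields the three `samples[i] -> mean` edges, followed by `mean -> correction` and `correction -> adjusted`.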
Automatic Annotations
Sounio can automatically annotate provenance:
#[track_provenance]
fn process_data(input: Knowledge<f64>) -> Knowledge<f64> {
    // All operations automatically tagged with function name
    let result = complex_calculation(input)
    result  // provenance includes "process_data"
}
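The idea behind the attribute can be approximated with a Python decorator. This is a rough analogue under stated assumptions, not a description of how `#[track_provenance]` is implemented: values are modelled as `(value, provenance)` tuples, and the doubling step is a placeholder for `complex_calculation`.

```python
import functools


def track_provenance(func):
    """Hypothetical analogue of #[track_provenance]: tag results with the function name."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        value, provenance = func(*args, **kwargs)
        return value, provenance + (func.__name__,)
    return wrapper


@track_provenance
def process_data(reading):
    value, provenance = reading
    return value * 2.0, provenance  # placeholder for complex_calculation


result = process_data((21.0, ("sensor_A",)))
print(result)  # (42.0, ('sensor_A', 'process_data'))
```

Appending the name in a wrapper keeps the tagging out of the function body, so the body does not need to know it is being tracked.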
Provenance Policies
Define policies for data handling:
let policy = ProvenancePolicy {
    require_calibration: true,
    max_age: Duration::hours(24),
    allowed_sources: vec!["lab_a", "lab_b"],
}

fn validate(data: Knowledge<f64>, policy: &ProvenancePolicy) -> bool {
    policy.check(data.provenance)
}
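The three policy fields suggest three independent checks. The Python sketch below shows one plausible reading of them. The class layout, the explicit `now` argument, and the rule that all three conditions must hold are assumptions; the source does not specify what `policy.check` does internally.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical record with only the fields the policy inspects."""
    source: str
    timestamp: datetime
    calibration: Optional[str] = None


@dataclass(frozen=True)
class ProvenancePolicy:
    require_calibration: bool
    max_age: timedelta
    allowed_sources: frozenset

    def check(self, prov, now):
        """Assumed semantics: every enabled condition must pass."""
        if self.require_calibration and not prov.calibration:
            return False
        if now - prov.timestamp > self.max_age:
            return False
        return prov.source in self.allowed_sources


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
policy = ProvenancePolicy(True, timedelta(hours=24), frozenset({"lab_a", "lab_b"}))

fresh = Provenance("lab_a", now - timedelta(hours=2), "CAL-2024-001")
stale = Provenance("lab_a", now - timedelta(hours=30), "CAL-2024-001")
uncalibrated = Provenance("lab_b", now - timedelta(hours=1))
unknown = Provenance("lab_c", now - timedelta(hours=1), "CAL-2024-001")

print([policy.check(p, now) for p in (fresh, stale, uncalibrated, unknown)])
# [True, False, False, False]
```

Passing `now` explicitly rather than reading the clock inside `check` keeps the validation deterministic, which matters when an audit has to be re-run later.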
Export and Audit
Export Provenance
let report = avg.provenance.to_json()
// Example output for a value derived from sensor_A and sensor_B:
// {
//   "sources": ["sensor_A", "sensor_B"],
//   "timestamp": "2024-01-15T10:30:00Z",
//   "computation_graph": { ... }
// }
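For readers who consume these reports outside Sounio, the Python sketch below produces a record with the same three field names and verifies that it survives a round trip. The `provenance_to_json` helper is hypothetical, and the contents of `computation_graph` are purely illustrative, since the example above elides that structure.

```python
import json
from datetime import datetime, timezone


def provenance_to_json(sources, timestamp, graph):
    """Serialize a provenance record using the field names from the example output."""
    record = {
        "sources": list(sources),
        "timestamp": timestamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "computation_graph": graph,
    }
    return json.dumps(record, indent=2)


report = provenance_to_json(
    sources=("sensor_A", "sensor_B"),
    timestamp=datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
    graph={"avg": ["sensor_A", "sensor_B"]},  # illustrative adjacency list, not Sounio's format
)

# Round-trip check: nothing is stripped on export.
restored = json.loads(report)
print(restored["sources"])  # ['sensor_A', 'sensor_B']
```

A round-trip check of this kind is a cheap way to confirm that an export step has not silently dropped lineage information.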
Audit Trail
let audit = result.audit_trail()
for entry in audit {
    print(entry.timestamp + ": " + entry.operation + " from " + entry.source)
}
Compliance Support
Sounio’s provenance tracking can help meet the traceability and record-keeping requirements of several standards and frameworks:
- FDA 21 CFR Part 11: Electronic records and electronic signatures
- GxP: Good practice regulations
- ISO/IEC 17025: Competence of testing and calibration laboratories
- FAIR principles: Findable, Accessible, Interoperable, Reusable
Best Practices
- Use meaningful source IDs: Include location, instrument, and date
- Record calibration info: Link to calibration certificates
- Preserve full provenance: Don’t strip it when exporting
- Validate before use: Check provenance against policies