Provenance Tracking

Track the origin and history of data through computations

Sounio automatically tracks data provenance—the origin, history, and transformations of values—enabling reproducible science and auditable computations.

Why Provenance Matters

In scientific computing, knowing where data came from is as important as the data itself:

  • Reproducibility: Can you recreate this result?
  • Auditing: Which sensors contributed to this decision?
  • Debugging: Why is this value unexpected?
  • Compliance: Does the data lineage satisfy regulatory requirements?

Basic Provenance

Every Knowledge<T> value tracks its sources:

let temp: Knowledge<celsius> = measure(
    value: 23.5,
    uncertainty: 0.2,
    source: "sensor_A"  // Provenance starts here
)

print(temp.provenance)  // ["sensor_A"]

Provenance Through Computation

When you combine values, provenance merges:

let temp_a = measure(value: 23.5, uncertainty: 0.2, source: "sensor_A")
let temp_b = measure(value: 24.0, uncertainty: 0.3, source: "sensor_B")

let avg = (temp_a + temp_b) / 2.0
print(avg.provenance)  // ["sensor_A", "sensor_B"]
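
The merge rule above can be modelled outside Sounio. The following is a minimal Python sketch, not Sounio's implementation: the `Knowledge` class here is a hypothetical stand-in that carries only a value and a tuple of source names, and it assumes that merging means an order-preserving union of sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Knowledge:
    """Hypothetical stand-in for Knowledge<T>: a value plus its source names."""
    value: float
    provenance: tuple = ()

    def _merge(self, other):
        # Union of both source lists, first-seen order, duplicates dropped.
        return tuple(dict.fromkeys(self.provenance + other.provenance))

    def __add__(self, other):
        return Knowledge(self.value + other.value, self._merge(other))

    def __truediv__(self, scalar):
        # Dividing by a plain number contributes no new source.
        return Knowledge(self.value / scalar, self.provenance)


temp_a = Knowledge(23.5, ("sensor_A",))
temp_b = Knowledge(24.0, ("sensor_B",))
avg = (temp_a + temp_b) / 2.0
print(avg.value, list(avg.provenance))  # 23.75 ['sensor_A', 'sensor_B']
```

Using a union rather than a plain concatenation keeps a source from appearing twice when the same value enters a computation more than once.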

Rich Provenance Records

For detailed tracking, use structured provenance:

let measurement = measure(
    value: 23.5,
    uncertainty: 0.2,
    provenance: Provenance {
        source: "thermometer_001",
        timestamp: now(),
        location: "Lab A, Station 3",
        operator: "Dr. Smith",
        calibration: "CAL-2024-001",
        method: "ASTM E2847",
    }
)

Querying Provenance

Check Sources

if measurement.provenance.contains("calibrated_sensor") {
    // Trusted source
}

Filter by Source

let trusted_data = data.filter(|d|
    d.provenance.any(|p| p.starts_with("calibrated_"))
)
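
To make the filter's semantics concrete, here is a small Python sketch of the same idea. The `is_trusted` helper and the sample readings are illustrative assumptions, not part of Sounio.

```python
def is_trusted(provenance, prefix="calibrated_"):
    """True if at least one source name starts with the given prefix."""
    return any(source.startswith(prefix) for source in provenance)


# Illustrative readings; each one carries its list of source names.
readings = [
    {"value": 23.5, "provenance": ["calibrated_sensor_A"]},
    {"value": 24.0, "provenance": ["sensor_B"]},
    {"value": 23.8, "provenance": ["sensor_C", "calibrated_sensor_D"]},
]

trusted = [r for r in readings if is_trusted(r["provenance"])]
print([r["value"] for r in trusted])  # [23.5, 23.8]
```

Note that an `any`-style filter keeps a value as soon as one of its sources matches, so the third reading passes even though `sensor_C` is uncalibrated. A stricter filter would require every source to match.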

Trace History

let history = result.provenance_trace()
// Returns full computation graph

Provenance Graph

For complex computations, Sounio builds a provenance graph:

fn analyze(samples: Vec<Knowledge<mg>>) -> Knowledge<mg> with IO {
    let mean = samples.iter().mean()
    let adjusted = apply_correction(mean)

    // adjusted.provenance_graph() shows:
    // samples[0] ─┐
    // samples[1] ─┼─> mean ──> correction ──> adjusted
    // samples[2] ─┘

    adjusted
}
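
One way to picture such a graph is as nodes that point back to their inputs. The Python sketch below rebuilds the diagram above under that assumption; `Node` and `trace` are hypothetical names chosen for illustration and do not describe Sounio's internal representation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in the computation; parents are the inputs it was derived from."""
    label: str
    parents: list = field(default_factory=list)


def trace(node):
    """Return the labels of the original inputs reachable from node, in order."""
    if not node.parents:
        return [node.label]
    leaves = []
    for parent in node.parents:
        for leaf in trace(parent):
            if leaf not in leaves:
                leaves.append(leaf)
    return leaves


samples = [Node(f"samples[{i}]") for i in range(3)]
mean = Node("mean", samples)
correction = Node("correction", [mean])
adjusted = Node("adjusted", [correction])

print(trace(adjusted))  # ['samples[0]', 'samples[1]', 'samples[2]']
```

Walking the parent links from `adjusted` recovers the three original samples, which is the question a provenance graph is meant to answer.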

Automatic Annotations

Sounio can automatically annotate provenance:

#[track_provenance]
fn process_data(input: Knowledge<f64>) -> Knowledge<f64> {
    // All operations automatically tagged with function name
    let result = complex_calculation(input)
    result  // provenance includes "process_data"
}
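
The effect of such an annotation can be approximated with a Python decorator. This is a sketch of the concept only: the `Knowledge` class and the `track_provenance` decorator below are hypothetical Python analogues, and they assume that tagging means appending the function name to the result's provenance.

```python
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class Knowledge:
    """Hypothetical stand-in for Knowledge<T>: a value plus its source names."""
    value: float
    provenance: tuple = ()


def track_provenance(func):
    """Append the wrapped function's name to the provenance of its result."""
    @functools.wraps(func)
    def wrapper(data):
        result = func(data)
        return Knowledge(result.value, result.provenance + (func.__name__,))
    return wrapper


@track_provenance
def process_data(data):
    # Placeholder calculation; only the provenance handling matters here.
    return Knowledge(data.value * 2.0, data.provenance)


out = process_data(Knowledge(1.5, ("sensor_A",)))
print(out.provenance)  # ('sensor_A', 'process_data')
```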

Provenance Policies

Define policies for data handling:

let policy = ProvenancePolicy {
    require_calibration: true,
    max_age: Duration::hours(24),
    allowed_sources: vec!["lab_a", "lab_b"],
}

fn validate(data: Knowledge<f64>, policy: &ProvenancePolicy) -> bool {
    policy.check(data.provenance)
}
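
What `policy.check` does is not spelled out above. The Python sketch below shows one plausible reading, assuming a record passes only if it has calibration info (when required), is no older than `max_age`, and comes from an allowed source. The `Record` and `Policy` classes and their fields are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical provenance record; field names are illustrative."""
    source: str
    timestamp: datetime
    calibration: Optional[str] = None


@dataclass(frozen=True)
class Policy:
    require_calibration: bool
    max_age: timedelta
    allowed_sources: tuple

    def check(self, record, now):
        """Return True only if the record passes every rule."""
        if self.require_calibration and record.calibration is None:
            return False
        if now - record.timestamp > self.max_age:
            return False
        return record.source in self.allowed_sources


policy = Policy(True, timedelta(hours=24), ("lab_a", "lab_b"))
now = datetime(2024, 1, 16, 10, 0, tzinfo=timezone.utc)
fresh = Record("lab_a", now - timedelta(hours=2), "CAL-2024-001")
stale = Record("lab_a", now - timedelta(hours=30), "CAL-2024-001")

print(policy.check(fresh, now), policy.check(stale, now))  # True False
```

Passing `now` explicitly, instead of reading the clock inside `check`, keeps the age rule deterministic and easy to test.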

Export and Audit

Export Provenance

let report = measurement.provenance.to_json()
// {
//   "sources": ["sensor_A", "sensor_B"],
//   "timestamp": "2024-01-15T10:30:00Z",
//   "computation_graph": { ... }
// }
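
For readers consuming these reports from other tools, the following Python sketch produces a document with the same three keys as the comment above. The `export_provenance` function and the contents of `computation_graph` are assumptions made for illustration; the actual schema emitted by `to_json()` may differ.

```python
import json
from datetime import datetime, timezone


def export_provenance(sources, timestamp, graph):
    """Serialize a provenance summary to JSON with a UTC ISO 8601 timestamp."""
    utc = timestamp.astimezone(timezone.utc)
    return json.dumps(
        {
            "sources": list(sources),
            "timestamp": utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "computation_graph": graph,
        },
        indent=2,
    )


report = export_provenance(
    ["sensor_A", "sensor_B"],
    datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
    {"avg": ["sensor_A", "sensor_B"]},  # illustrative graph: output -> inputs
)
print(report)
```

Normalizing the timestamp to UTC before formatting avoids ambiguity when reports from several sites are compared.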

Audit Trail

let audit = result.audit_trail()
for entry in audit {
    print(entry.timestamp + ": " + entry.operation + " from " + entry.source)
}

Compliance Support

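Sounio’s provenance records can serve as supporting evidence for the data-lineage and traceability requirements of standards and guidelines such as:

  • FDA 21 CFR Part 11: Electronic records and electronic signatures
  • GxP: Good practice regulations and guidelines (for example GLP, GMP, GCP)
  • ISO/IEC 17025: Competence of testing and calibration laboratories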
  • FAIR principles: Findable, Accessible, Interoperable, Reusable

Best Practices

  1. Use meaningful source IDs: Include location, instrument, and date
  2. Record calibration info: Link to calibration certificates
  3. Preserve full provenance: Don’t strip it when exporting
  4. Validate before use: Check provenance against policies

What’s Next?