Data Contracts: Why Type Safety Matters in Trading

Exploring how we apply software engineering principles to market data validation, eliminating a major source of systematic risk.

January 22, 2024 · 4 min read · Michael Roberts, Head of Engineering

The $440 Million Typo

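In 2012, Knight Capital lost $440 million in 45 minutes due to a software deployment error. New code reached only some of its servers, and a repurposed flag reactivated dormant logic on the one server that was missed. Nothing checked that the flag meant the same thing to every system that received it.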

This isn't an isolated incident. Bad data causes:

  • Flash crashes
  • Erroneous trades
  • Model failures
  • Regulatory violations

Yet most trading systems still give market data less scrutiny than a web form gives untrusted user input. It's madness.

What Are Data Contracts?

A data contract is a formal specification of what valid data looks like. It's not just a schema—it's a binding agreement between data producers and consumers.

Traditional Approach

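# Hope and pray: coerce whatever arrives and trust the result
data = {"price": "4850.25", "volume": "1200"}  # illustrative raw message
price = float(data["price"])    # "nan" and "inf" also parse without error
volume = int(data["volume"])    # negative values pass silently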

Evidfi Approach

interface SPXTickContract {
  asset: "SPX";
  timestamp: string; // ISO8601 format
  price: {
    type: "Float64";
    range: [number, number];
    precision: number;
    staleness_ms: number;
  };
  volume: {
    type: "UInt32";
    min: number;
    max: number;
  };
  bid_ask_spread: {
    type: "Float64";
    max: number; // Reject implausibly wide spreads
  };
}

The Three Pillars

1. Schema Validation

Every field must match its type exactly:

  • Strings are strings
  • Numbers are numbers
  • Timestamps are timestamps

No casting. No coercion. No surprises.
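
As a rough illustration of "no coercion", here is a minimal Python sketch of strict schema checking. The `TICK_SCHEMA` mapping and the `schema_errors` helper are hypothetical names invented for this example, not part of any Evidfi library. The point it tries to show is that a price arriving as a string is rejected rather than quietly converted.

```python
from datetime import datetime

# Hypothetical schema for illustration: field name -> required Python type.
TICK_SCHEMA = {"asset": str, "timestamp": str, "price": float, "volume": int}


def schema_errors(msg: dict) -> list[str]:
    """Return schema violations; an empty list means the message conforms."""
    errors = []
    for field, expected in TICK_SCHEMA.items():
        if field not in msg:
            errors.append(f"{field}: missing")
        # type() rather than isinstance(): bool is a subclass of int,
        # and we want exact matches with no implicit conversion.
        elif type(msg[field]) is not expected:
            actual = type(msg[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # A timestamp must parse as ISO 8601, not merely be a string.
    if type(msg.get("timestamp")) is str:
        try:
            datetime.fromisoformat(msg["timestamp"])
        except ValueError:
            errors.append("timestamp: not ISO 8601")
    return errors


good = {"asset": "SPX", "timestamp": "2024-01-22T14:30:00+00:00",
        "price": 4850.25, "volume": 1200}
bad = {"asset": "SPX", "timestamp": "yesterday",
       "price": "4850.25", "volume": True}

print(schema_errors(good))
print(schema_errors(bad))
```

The first message passes; the second is rejected for a string price, a boolean volume, and an unparseable timestamp.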

2. Business Logic Validation

Beyond types, we enforce domain rules:

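  • Prices can't be negative (for instruments where that holds; some futures have traded below zero)
  • Volumes can't exceed a plausible ceiling, such as shares outstanding
  • Timestamps can't be in the future
  • Bid-ask spreads must be reasonable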
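
The rules above could be expressed as plain predicates over an already type-checked tick. The sketch below is an assumption-laden example: the limits (`MAX_VOLUME`, `MAX_SPREAD`, `MAX_CLOCK_SKEW`) and the `rule_violations` function are made up for illustration, and real bounds depend on the instrument and venue.

```python
from datetime import datetime, timedelta, timezone

# Illustrative limits only; real bounds depend on the instrument and venue.
MAX_VOLUME = 50_000_000
MAX_SPREAD = 5.0
MAX_CLOCK_SKEW = timedelta(seconds=1)  # tolerance for clock drift


def rule_violations(tick: dict, now: datetime) -> list[str]:
    """Check domain rules on a tick whose field types are already verified."""
    violations = []
    if tick["price"] < 0:
        violations.append("negative price")
    if tick["volume"] > MAX_VOLUME:
        violations.append("volume above limit")
    if tick["timestamp"] > now + MAX_CLOCK_SKEW:
        violations.append("timestamp in the future")
    spread = tick["ask"] - tick["bid"]
    # A negative spread means a crossed quote; a huge one is implausible.
    if spread < 0 or spread > MAX_SPREAD:
        violations.append("unreasonable bid-ask spread")
    return violations


now = datetime(2024, 1, 22, 14, 30, tzinfo=timezone.utc)
ok = {"price": 4850.25, "volume": 1200, "timestamp": now,
      "bid": 4850.00, "ask": 4850.50}
suspect = dict(ok, price=-1.0, bid=4851.00,
               timestamp=now + timedelta(minutes=5))

print(rule_violations(ok, now))
print(rule_violations(suspect, now))
```

Passing `now` in as an argument, rather than reading the clock inside the function, keeps the check deterministic and easy to test.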

3. Runtime Enforcement

Validation happens at ingestion time, not execution time. By the time data reaches our algorithms, it has already passed every check in its contract.

[Market Feed] → [Contract Validator] → [Optophi Engine]
                        ↓ (Invalid)
                   [Dead Letter Queue]
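
The routing in the diagram might look something like the following sketch. The in-memory `deque` and list stand in for a durable queue and the engine's input, and `toy_validator` is a placeholder check; all of these names are assumptions made for the example.

```python
from collections import deque
from typing import Callable

dead_letter_queue: deque = deque()  # stand-in for a durable queue
accepted: list = []                 # stand-in for the engine's input


def ingest(msg: dict, validate: Callable[[dict], list]) -> bool:
    """Route a message at ingestion time: valid -> engine, invalid -> DLQ."""
    errors = validate(msg)
    if errors:
        # Keep the original message and the reasons so it can be audited.
        dead_letter_queue.append({"message": msg, "errors": errors})
        return False
    accepted.append(msg)
    return True


def toy_validator(msg: dict) -> list:
    # Hypothetical check used only for this demonstration.
    price = msg.get("price")
    return [] if type(price) is float and price >= 0 else ["price: invalid"]


for m in ({"price": 4850.25}, {"price": "4850.25"}, {"price": -3.0}):
    ingest(m, toy_validator)

print(len(accepted), len(dead_letter_queue))
```

Storing the rejection reasons alongside the raw message means nothing is silently dropped, and the dead letter queue doubles as evidence when raising an issue with a vendor.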

Real-World Impact

Before Data Contracts (2021)

  • Data rejection rate: 0.02% (measured against today's 2.1%, we were missing roughly 99% of the bad data)
  • Model failures: 3 per month
  • Manual interventions: 12 per week

After Data Contracts (2022-Present)

  • Data rejection rate: 2.1% (we catch everything the contracts define as invalid)
  • Model failures: 0
  • Manual interventions: 0

That 2.1% rejection rate? That's data that would have corrupted our models. We're now finding issues in vendor feeds that they didn't know about.

How to Implement This

If you're building a trading system, here's a practical guide:

Step 1: Define Your Contracts

Start with the most critical data:

  1. Pricing data
  2. Risk metrics
  3. Position updates

Step 2: Version Everything

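// v1.0.0
interface TickDataV1 {
  price: number;  // Float64
  volume: number; // UInt32, matching the tick contract above
}

// v1.1.0 (backward compatible: only adds fields, so a minor version bump)
interface TickDataV1_1 extends TickDataV1 {
  bid: number; // Float64
  ask: number; // Float64
}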
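
A compatibility claim like this can be checked mechanically before a new version ships. The sketch below assumes contracts are represented as simple field-to-type maps; the maps and `is_backward_compatible` are hypothetical. It only covers consumers that read existing fields, and producers still have to start sending any newly required ones.

```python
# Hypothetical field maps for illustration: field name -> type name.
TICK_V1 = {"price": "Float64", "volume": "UInt32"}
TICK_EXTENDED = {**TICK_V1, "bid": "Float64", "ask": "Float64"}
TICK_BREAKING = {"price": "Float32", "bid": "Float64"}  # retypes and drops fields


def is_backward_compatible(old: dict, new: dict) -> bool:
    """True if every field in `old` survives in `new` with the same type."""
    return all(new.get(name) == type_name for name, type_name in old.items())


print(is_backward_compatible(TICK_V1, TICK_EXTENDED))
print(is_backward_compatible(TICK_V1, TICK_BREAKING))
```

Running a check like this in CI is one way to make the version number an enforced statement rather than a comment.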

Step 3: Monitor Violations

Track what's being rejected and why:

  • Schema violations → Contact vendor
  • Logic violations → Review rules
  • Performance issues → Optimize validation
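
One simple way to track rejections is to tally them by category and attach the follow-up from the list above. The log entries, the `NEXT_ACTION` table, and `summarize` in this sketch are invented for illustration.

```python
from collections import Counter

# Hypothetical rejection log entries: (vendor, violation category).
rejections = [
    ("vendor_a", "schema"),
    ("vendor_a", "schema"),
    ("vendor_b", "logic"),
    ("vendor_a", "logic"),
    ("vendor_a", "schema"),
]

NEXT_ACTION = {
    "schema": "contact vendor",
    "logic": "review rules",
    "performance": "optimize validation",
}


def summarize(log):
    """Count rejections per category, most frequent first, with the follow-up."""
    counts = Counter(category for _, category in log)
    return [(cat, n, NEXT_ACTION[cat]) for cat, n in counts.most_common()]


for category, count, action in summarize(rejections):
    print(f"{category}: {count} rejected -> {action}")
```

Grouping by vendor as well as by category would be a natural extension when several feeds share one contract.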

The Philosophy

Data contracts aren't just about catching errors. They're about shifting responsibility upstream:

"If data doesn't meet the contract, it's not our problem. It's the vendor's problem."

This mindset change is transformative. Instead of building defensive code that handles every edge case, we reject bad data at the gate.

Common Objections

"This will slow down my system!"

Our contract validation adds less than 50μs per message. For the elimination of an entire class of bugs, that's a bargain.
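
If you want to measure the overhead in your own system rather than take a number on faith, Python's standard `timeit` module is enough for a first estimate. The `validate` function here is a deliberately small placeholder, so the figure it prints says nothing about our validator or yours; it only shows the measurement approach.

```python
import timeit


def validate(msg: dict) -> bool:
    # Hypothetical minimal check; substitute your real contract validation.
    price = msg.get("price")
    return type(price) is float and 0.0 <= price <= 10_000.0


msg = {"price": 4850.25}
runs = 100_000

# timeit returns the total wall-clock seconds for `runs` calls.
total_seconds = timeit.timeit(lambda: validate(msg), number=runs)
per_message_us = total_seconds / runs * 1e6

print(f"{per_message_us:.3f} microseconds per message")
```

Averages hide tail latency, so for a production decision it is worth recording a distribution of per-message timings on representative traffic as well.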

"What if my data provider sends bad data?"

Then you need a better data provider. Seriously. If they can't send you valid data, what else are they getting wrong?

"Isn't this overkill?"

Ask Knight Capital if $440M in losses is "overkill."

Conclusion

Data contracts are table stakes for any serious systematic trading operation. If your system doesn't validate inputs, you're running a time bomb.

At Evidfi, validation is not an afterthought: every input is checked against its contract before it enters our execution engine.

That's not paranoia. That's engineering.


Want to see our contract library? Join the Evidfi investor community for access to our technical documentation.
