Data Contracts: Why Type Safety Matters in Trading
Exploring how we apply software engineering principles to market data validation, eliminating a major source of systematic risk.
The $440 Million Typo
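In 2012, Knight Capital lost $440 million in 45 minutes due to a software deployment error. The culprit? A partial rollout left obsolete code live on one server, and a repurposed order flag triggered it. Nothing at the boundary checked that the input still meant what the receiving code assumed.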
This isn't an isolated incident. Bad data causes:
- Flash crashes
- Erroneous trades
- Model failures
- Regulatory violations
Yet most trading systems still give market data less scrutiny than a web form gives user input. It's madness.
What Are Data Contracts?
A data contract is a formal specification of what valid data looks like. It's not just a schema—it's a binding agreement between data producers and consumers.
Traditional Approach
# Hope and pray
price = float(data['price'])
volume = int(data['volume'])
Evidfi Approach
interface SPXTickContract {
    asset: "SPX";
    timestamp: string; // ISO8601 format
    price: {
        type: "Float64";
        range: [number, number];
        precision: number;
        staleness_ms: number;
    };
    volume: {
        type: "UInt32";
        min: number;
        max: number;
    };
    bid_ask_spread: {
        type: "Float64";
        max: number; // No flash crashes
    };
}
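A contract like the one above only helps once something enforces it at runtime. The sketch below shows one way that could look in Python. The limits in `LIMITS` are placeholder values for illustration, not Evidfi's actual bounds, `validate_spx_tick` is a hypothetical helper name, and the `precision` field is left out to keep the example short.

```python
from datetime import datetime, timezone

# Placeholder limits for illustration only, not real production values.
LIMITS = {
    "price_range": (100.0, 100_000.0),
    "volume_max": 2**32 - 1,  # largest UInt32
    "spread_max": 5.0,
    "staleness_ms": 500,
}

def validate_spx_tick(tick, now, limits=LIMITS):
    """Return a list of contract violations (an empty list means the tick is valid)."""
    errors = []

    if tick.get("asset") != "SPX":
        errors.append("asset must be 'SPX'")

    # Timestamp: timezone-aware ISO 8601 string, no older than the staleness limit.
    ts = tick.get("timestamp")
    try:
        parsed = datetime.fromisoformat(ts) if isinstance(ts, str) else None
    except ValueError:
        parsed = None
    if parsed is None or parsed.tzinfo is None:
        errors.append("timestamp must be a timezone-aware ISO 8601 string")
    elif (now - parsed).total_seconds() * 1000 > limits["staleness_ms"]:
        errors.append("tick is stale")

    # Price: exactly a float (no strings, no bools) inside the allowed range.
    price = tick.get("price")
    low, high = limits["price_range"]
    if type(price) is not float:
        errors.append("price must be a float")
    elif not low <= price <= high:  # NaN fails this comparison and is rejected
        errors.append("price out of range")

    # Volume: exactly an int that fits in an unsigned 32-bit field.
    volume = tick.get("volume")
    if type(volume) is not int:
        errors.append("volume must be an int")
    elif not 0 <= volume <= limits["volume_max"]:
        errors.append("volume out of range")

    # Spread: exactly a float, non-negative and below the ceiling.
    spread = tick.get("bid_ask_spread")
    if type(spread) is not float:
        errors.append("spread must be a float")
    elif not 0.0 <= spread <= limits["spread_max"]:
        errors.append("spread out of range")

    return errors

# Usage: a well-formed tick checked against a fixed "now".
now = datetime(2024, 1, 2, 15, 0, 0, tzinfo=timezone.utc)
tick = {
    "asset": "SPX",
    "timestamp": "2024-01-02T15:00:00+00:00",
    "price": 4750.25,
    "volume": 100,
    "bid_ask_spread": 0.25,
}
violations = validate_spx_tick(tick, now)
```

Returning a list of violations rather than raising on the first one makes it easier to report every problem with a message in a single pass.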
The Three Pillars
1. Schema Validation
Every field must match its type exactly:
- Strings are strings
- Numbers are numbers
- Timestamps are timestamps
No casting. No coercion. No surprises.
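To make "no coercion" concrete, here is a minimal Python sketch. The schema and the name `schema_violations` are assumptions made for this example, and timestamps are simplified to strings. It compares exact types instead of casting, because both `float(...)` and `isinstance` accept values a strict contract should refuse.

```python
# Coercion and isinstance both let surprising values through.
coerced_price = float("4750.25")       # a string quietly becomes a price
bool_is_int = isinstance(True, int)    # True: bool is a subclass of int in Python

# Hypothetical schema for illustration: field name -> exact Python type.
TICK_SCHEMA = {"price": float, "volume": int, "timestamp": str}

def schema_violations(record, schema=TICK_SCHEMA):
    """Return the names of fields that are missing or not exactly the declared type."""
    return [
        name
        for name, expected in schema.items()
        if type(record.get(name)) is not expected
    ]

clean = {"price": 4750.25, "volume": 100, "timestamp": "2024-01-02T15:00:00+00:00"}
dirty = {"price": "4750.25", "volume": True, "timestamp": "2024-01-02T15:00:00+00:00"}

clean_result = schema_violations(clean)
dirty_result = schema_violations(dirty)
```

Here `clean_result` is empty, while `dirty_result` names `price` and `volume`: the string price and the boolean volume would both have survived a cast.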
2. Business Logic Validation
Beyond types, we enforce domain rules:
- Prices can't be negative
- Volumes can't exceed plausible limits (for example, shares outstanding)
- Timestamps can't be in the future
- Bid-ask spreads must be reasonable
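Domain rules like these can be expressed as a small table of named checks. The sketch below is illustrative only: `MAX_VOLUME` and `MAX_SPREAD_FRACTION` are assumed thresholds, and the tick is assumed to have already passed schema validation (a float price, an int volume, a timezone-aware `datetime` timestamp).

```python
import math
from datetime import datetime, timedelta, timezone

# Assumed thresholds for illustration; real values depend on the instrument.
MAX_VOLUME = 10_000_000
MAX_SPREAD_FRACTION = 0.01  # treat a spread above 1% of price as unreasonable

def rule_violations(tick, now):
    """Return the message of every domain rule the tick breaks."""
    spread = tick["ask"] - tick["bid"]
    rules = [
        ("price is negative or not finite",
         math.isfinite(tick["price"]) and tick["price"] >= 0),
        ("volume is outside plausible limits",
         0 <= tick["volume"] <= MAX_VOLUME),
        ("timestamp is in the future",
         tick["timestamp"] <= now),
        ("bid-ask spread is unreasonable",
         0 <= spread <= MAX_SPREAD_FRACTION * tick["price"]),
    ]
    return [message for message, passed in rules if not passed]

now = datetime(2024, 1, 2, 15, 0, 0, tzinfo=timezone.utc)
good = {"price": 4750.25, "volume": 100, "bid": 4750.0, "ask": 4750.5, "timestamp": now}
future = dict(good, timestamp=now + timedelta(seconds=1))

good_result = rule_violations(good, now)
future_result = rule_violations(future, now)
```

Keeping each rule as a (message, result) pair means the rejection reason can be logged directly, which helps when reviewing whether a rule is too strict.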
3. Runtime Enforcement
Validation happens at ingestion time, not execution time. By the time data reaches our algorithms, it's guaranteed valid.
[Market Feed] → [Contract Validator] → [Optophi Engine]
                        ↓ (Invalid)
                [Dead Letter Queue]
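The routing in the diagram can be sketched in a few lines of Python. This is a simplified illustration, not the actual Optophi ingestion path: the queues are in-memory deques, and `price_errors` is a toy validator standing in for a full contract check.

```python
from collections import deque

def ingest(messages, validate, engine_queue, dead_letter_queue):
    """Route each message: valid ones to the engine, invalid ones to the dead letter queue."""
    for message in messages:
        errors = validate(message)
        if errors:
            # Keep the original message with its errors so it can be inspected later.
            dead_letter_queue.append({"message": message, "errors": errors})
        else:
            engine_queue.append(message)

def price_errors(message):
    """Toy validator (assumption): the price must be a non-negative float."""
    price = message.get("price")
    if type(price) is not float or price < 0:
        return ["invalid price"]
    return []

engine_queue, dead_letter_queue = deque(), deque()
feed = [{"price": 4750.25}, {"price": -1.0}, {"price": "4750.25"}]
ingest(feed, price_errors, engine_queue, dead_letter_queue)
```

After this runs, only the first message reaches `engine_queue`. The other two sit in `dead_letter_queue` with their error lists attached.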
Real-World Impact
Before Data Contracts (2021)
- Data rejection rate: 0.02% (measured against today's 2.1%, we were missing roughly 99% of bad data)
- Model failures: 3 per month
- Manual interventions: 12 per week
After Data Contracts (2022-Present)
- Data rejection rate: 2.1% (we catch everything the contracts cover)
- Model failures: 0
- Manual interventions: 0
That 2.1% rejection rate? That's data that would have corrupted our models. We're now finding issues in vendor feeds that they didn't know about.
How to Implement This
If you're building a trading system, here's a practical guide:
Step 1: Define Your Contracts
Start with the most critical data:
- Pricing data
- Risk metrics
- Position updates
Step 2: Version Everything
// v1.0.0
contract TickDataV1 {
    price: Float64;
    volume: UInt32;
}
// v2.0.0 (backward compatible for v1 consumers: only adds fields)
contract TickDataV2 extends TickDataV1 {
    bid: Float64;
    ask: Float64;
}
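The same versioning idea can be sketched in Python with dataclass inheritance. The class names mirror the contracts above, while `notional` is a hypothetical consumer added for illustration. Because v2 only adds fields, code written against v1 keeps working when handed a v2 tick.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TickDataV1:
    price: float
    volume: int

@dataclass(frozen=True)
class TickDataV2(TickDataV1):
    # v2 only adds fields; nothing from v1 is renamed or removed.
    bid: float
    ask: float

def notional(tick: TickDataV1) -> float:
    """A consumer written against v1: it reads only v1 fields."""
    return tick.price * tick.volume

v1_tick = TickDataV1(price=10.0, volume=3)
v2_tick = TickDataV2(price=10.0, volume=3, bid=9.5, ask=10.5)

v1_notional = notional(v1_tick)
v2_notional = notional(v2_tick)  # the v1 consumer accepts a v2 tick unchanged
```

The compatibility runs in one direction only: a producer that still emits v1 ticks cannot satisfy a consumer that requires `bid` and `ask`.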
Step 3: Monitor Violations
Track what's being rejected and why:
- Schema violations → Contact vendor
- Logic violations → Review rules
- Performance issues → Optimize validation
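A minimal way to track rejections is to count them by source and category, then map each category to the follow-up listed above. The class, feed names, and category labels in this sketch are assumptions made for the example.

```python
from collections import Counter

# Follow-up per violation category, taken from the list above.
ACTIONS = {
    "schema": "Contact vendor",
    "logic": "Review rules",
    "performance": "Optimize validation",
}

class ViolationMonitor:
    """Counts rejected messages by (source, category)."""

    def __init__(self):
        self.counts = Counter()

    def record(self, source, category):
        if category not in ACTIONS:
            raise ValueError(f"unknown violation category: {category}")
        self.counts[(source, category)] += 1

    def by_category(self):
        """Total violations per category across all sources."""
        totals = Counter()
        for (_source, category), count in self.counts.items():
            totals[category] += count
        return totals

monitor = ViolationMonitor()
monitor.record("vendor_a", "schema")
monitor.record("vendor_a", "schema")
monitor.record("vendor_b", "logic")

totals = monitor.by_category()
top_category, top_count = totals.most_common(1)[0]
next_step = ACTIONS[top_category]
```

Keying the counts by source as well as category makes it possible to tell a single noisy vendor apart from a rule that is wrong for everyone.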
The Philosophy
Data contracts aren't just about catching errors. They're about shifting responsibility upstream:
"If data doesn't meet the contract, it's not our problem. It's the vendor's problem."
This mindset change is transformative. Instead of building defensive code that handles every edge case, we reject bad data at the gate.
Common Objections
"This will slow down my system!"
Our contract validation adds less than 50μs per message. For the elimination of an entire class of bugs, that's a bargain.
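Overhead like this is easy to measure on your own hardware. The sketch below times a validator over many calls and reports the mean cost per message in microseconds. The helper name and the toy validator are assumptions for this example. The result depends on the machine and the validator, so it does not reproduce the figure quoted above.

```python
import time

def mean_validation_latency_us(validate, message, iterations=100_000):
    """Return the mean wall-clock cost of validate(message) in microseconds."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        validate(message)
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / iterations / 1_000

def toy_validator(message):
    """Toy check (assumption): the price must be a non-negative float."""
    price = message.get("price")
    return [] if type(price) is float and price >= 0 else ["invalid price"]

latency_us = mean_validation_latency_us(toy_validator, {"price": 4750.25}, iterations=10_000)
```

A mean hides tail latency, so for a production decision it would be worth recording per-call timings and looking at high percentiles as well.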
"What if my data provider sends bad data?"
Then you need a better data provider. Seriously. If they can't send you valid data, what else are they getting wrong?
"Isn't this overkill?"
Ask Knight Capital if $440M in losses is "overkill."
Conclusion
Data contracts are table stakes for any serious systematic trading operation. If your system doesn't validate inputs, you're running a time bomb.
At Evidfi, every input is checked against its contract before it enters our execution engine, so nothing reaches the algorithms unverified.
That's not paranoia. That's engineering.
Want to see our contract library? Join the Evidfi investor community for access to our technical documentation.
About the Author
Michael Roberts, Head of Engineering