Artificial Intelligence

TEVV (Test, Evaluation, Verification, Validation)

A comprehensive framework for assessing AI systems that goes beyond accuracy metrics to include bias testing, fairness evaluation, robustness assessment, safety verification, and security validation. The NIST AI Risk Management Framework (AI RMF) promotes TEVV as essential for responsible AI deployment.

Why It Matters

Accuracy alone tells you little about whether an AI system is safe, fair, or robust: a model can score well in aggregate while failing for particular subgroups or under unusual inputs. TEVV provides a structured approach to testing that catches failure modes traditional software testing misses.

Example

A TEVV process for a medical diagnostic AI includes: testing accuracy across age and ethnic groups (Test), evaluating fairness metrics like equalized odds (Evaluation), verifying the model meets documented specifications (Verification), and validating with clinical trials that it improves patient outcomes (Validation).
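
The Test and Evaluation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed TEVV procedure: the function names (group_rates, equalized_odds_gap), the toy labels, and the two groups "A" and "B" are all hypothetical, and the equalized odds gap is taken here as the largest between-group difference in true-positive or false-positive rate, which is one common way to summarize it.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return per-group accuracy, true-positive rate, and false-positive rate."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        total = sum(c.values())
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            "accuracy": (c["tp"] + c["tn"]) / total,
            # A rate is undefined when a group has no examples of that class.
            "tpr": c["tp"] / positives if positives else float("nan"),
            "fpr": c["fp"] / negatives if negatives else float("nan"),
        }
    return rates


def equalized_odds_gap(rates):
    """Largest between-group difference in TPR or FPR (0.0 means parity)."""
    tprs = [r["tpr"] for r in rates.values()]
    fprs = [r["fpr"] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Hypothetical toy data: 1 = condition present, 0 = condition absent.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
rates = group_rates(y_true, y_pred, groups)
gap = equalized_odds_gap(rates)

print("overall accuracy:", overall_accuracy)
for name, r in sorted(rates.items()):
    print(name, r)
print("equalized odds gap:", gap)
```

On this toy data the overall accuracy is 0.75, yet group A is classified perfectly while group B reaches only 0.5 accuracy, giving an equalized odds gap of 0.5. That is the kind of disparity an aggregate accuracy figure hides.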

Think of it like...

TEVV is like the full vehicle safety testing regime — crash tests, emissions checks, road handling, and real-world driving trials — not just checking that the engine starts and the speedometer reads correctly.

Related Terms