TEVV (Test, Evaluation, Verification, Validation)
A comprehensive framework for assessing AI systems that goes beyond accuracy metrics to include bias testing, fairness evaluation, robustness assessment, safety verification, and security validation. TEVV is promoted by the NIST AI RMF as essential for responsible AI deployment.
Why It Matters
Accuracy alone tells you little about whether an AI system is safe, fair, or robust. TEVV provides a structured approach to testing that catches the failure modes traditional software testing misses.
Example
A TEVV process for a medical diagnostic AI includes: testing accuracy across age and ethnic groups (Test), evaluating fairness metrics like equalized odds (Evaluation), verifying the model meets documented specifications (Verification), and validating with clinical trials that it improves patient outcomes (Validation).
Think of it like...
TEVV is like the full vehicle safety testing regime — crash tests, emissions checks, road handling, and real-world driving trials — not just checking that the engine starts and the speedometer reads correctly.
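The "equalized odds" fairness metric mentioned in the example above can be made concrete with a short sketch. Equalized odds asks that true positive rates and false positive rates be roughly equal across demographic groups. The function names and the toy data below are illustrative, not part of any standard library:

```python
# Illustrative sketch of an equalized-odds check across groups.
# All names and data here are hypothetical examples.

def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

def equalized_odds_gap(y_true, y_pred, groups):
    """Largest TPR and FPR differences across groups.

    Equalized odds is satisfied when both gaps are close to zero."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        by_group.setdefault(g, ([], []))
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    tprs, fprs = [], []
    for yt, yp in by_group.values():
        tpr, fpr = rates(yt, yp)
        tprs.append(tpr)
        fprs.append(fpr)
    return max(tprs) - min(tprs), max(fprs) - min(fprs)

# Toy data: identical ground truth, different model behavior per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(equalized_odds_gap(y_true, y_pred, groups))  # → (0.5, 0.5)
```

Here the model catches every positive case in group B but only half of them in group A, and it also raises more false alarms for group B, so both gaps are 0.5. A TEVV evaluation would flag gaps of that size for investigation.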
Related Terms
Red Teaming (AI)
A structured adversarial testing exercise in which testers deliberately attempt to find failures, vulnerabilities, biases, or harmful outputs in an AI system. Unlike standard testing, which checks whether the system works, red teaming checks how the system breaks.
AI Audit
An independent evaluation of an AI system's compliance, performance, fairness, and governance practices. Audits can be internal (conducted by the organization's own team) or external (by independent third parties), and may be required by regulation for high-risk systems.
NIST AI Risk Management Framework (AI RMF)
A voluntary framework published by the U.S. National Institute of Standards and Technology that provides structured guidance for managing AI risks through four core functions: Govern, Map, Measure, and Manage. It's designed to be flexible, sector-agnostic, and compatible with other risk management frameworks.