Quality & Measurement

How AI output quality and developer productivity are measured within the SADMF framework.

1 - Code Volume Productivity (CVP) and Large Artifact Velocity (LAV)

Replacing outdated flow metrics with volume-based KPIs ensures that AI-assisted development is measured by what matters most: sheer output!

Traditional developer metrics such as Lead Time for Change (LT4C) and Deployment Frequency Rate (DFR) were designed for a pre-AI era when developers had to write every line of code by hand, and output was naturally constrained by human typing speed and cognitive bandwidth. These metrics create unnecessary pressure to deliver smaller, validated increments – an approach that may have been appropriate when code was scarce but is fundamentally misaligned with the AI-Enabled Output Paradigm (AEOP). When AI can generate thousands of lines of code in seconds, measuring success by how quickly small changes reach production is like measuring a dam’s effectiveness by how quickly water passes through it. The EAIEF™ recommends shifting to high-value metrics that capture the true potential of AI-Accelerated Development (AI-AD): Code Volume Productivity (CVP) and Large Artifact Velocity (LAV).

The Output Maximization Triad (OMT)

Code Volume Productivity (CVP) is measured through three complementary Key Performance Indicators (KPIs) that together form the Output Maximization Triad (OMT):

  • Lines of Code Per Iteration (LoCPI): Tracks the total number of lines generated by each Code Engineer during a given iteration cycle.
  • Average PR Size (APRS): Measures the mean size of Pull Requests submitted to the Source Management Team – larger PRs indicate higher throughput and more efficient use of review cycles.
  • Total Prompt Count per Release (TPC-R): Quantifies the total number of AI prompts issued during a release cycle, serving as a proxy for Developer-AI Engagement Intensity (DAEI).

These KPIs align directly with Enterprise Output Maximization Scorecards (EOMS) and are reported to the Admiral's Transformation Office quarterly through the Strategic Output Reporting Pipeline (SORP).
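The triad above could be rolled up roughly as follows. This is an illustrative sketch only; the record fields, function names, and aggregation choices are assumptions, not anything the framework formally defines.

```python
from dataclasses import dataclass

# Illustrative sketch of the Output Maximization Triad (OMT) KPIs.
# Field and function names are hypothetical, not defined by the EAIEF.

@dataclass
class IterationRecord:
    lines_generated: int  # lines of AI-generated code in this iteration
    pr_sizes: list        # sizes (in lines) of PRs submitted this iteration
    prompt_count: int     # AI prompts issued this iteration

def omt_scorecard(records):
    """Roll per-iteration records up into the three OMT KPIs."""
    # LoCPI: mean lines generated per iteration cycle
    locpi = sum(r.lines_generated for r in records) / len(records)
    # APRS: mean size across all Pull Requests in the release
    all_prs = [size for r in records for size in r.pr_sizes]
    aprs = sum(all_prs) / len(all_prs)
    # TPC-R: total prompts issued across the release cycle
    tpc_r = sum(r.prompt_count for r in records)
    return {"LoCPI": locpi, "APRS": aprs, "TPC-R": tpc_r}
```

Under volume-based incentives, of course, every one of these numbers is trivially gameable upward, which is rather the point.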

Large Artifact Velocity (LAV)

Large Artifact Velocity (LAV) extends the CVP framework by measuring not just the volume of code but the speed at which large, monolithic artifacts move through the delivery pipeline. LAV is calculated as the ratio of Total Artifact Size (TAS) to Pipeline Transit Duration (PTD), expressed in Kilobytes Per Business Day (KB/BD). A high LAV score indicates that the organization is efficiently processing large volumes of AI-generated code through its governance and approval structures, while a low LAV score suggests bottlenecks in the Enterprise Consolidated Review Framework (ECRF) or insufficient staffing in the Manual Test Operations Center (MTOC). The Chief Signals Officer monitors LAV trends and escalates any sustained decrease to the Commodore for immediate investigation through the Delivery Impediment Resolution Protocol (DIRP).
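The LAV ratio defined above (TAS divided by PTD, in KB/BD) can be written as a one-line computation; the function name and validation behavior here are illustrative assumptions.

```python
def large_artifact_velocity(total_artifact_size_kb, pipeline_transit_days):
    """LAV = Total Artifact Size (TAS) / Pipeline Transit Duration (PTD),
    expressed in Kilobytes Per Business Day (KB/BD)."""
    if pipeline_transit_days <= 0:
        raise ValueError("PTD must be a positive number of business days")
    return total_artifact_size_kb / pipeline_transit_days
```

For example, a 4,500 KB artifact that spends 9 business days in transit scores 500 KB/BD, regardless of whether anything in those kilobytes works.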

Incentive Alignment Structure (IAS)

The adoption of CVP and LAV metrics creates a powerful Incentive Alignment Structure (IAS) that drives the behaviors the organization needs. When Code Engineers know that their performance is evaluated by volume rather than by the subjective assessment of code quality or customer impact, they are naturally motivated to maximize output. This eliminates the unproductive debates about “clean code,” “technical debt,” and “maintainability” that consume valuable cycles in organizations that have not yet adopted volume-based metrics. The Code Standards Enforcement Team (CSET) ensures that all generated code meets formatting standards, and the Quality Authority handles defect detection downstream – freeing Code Engineers to focus exclusively on the Throughput Optimization Imperative (TOI).

Adoption Outcomes and the Separation of Concerns

Organizations that have adopted CVP and LAV consistently report a 400-600% increase in Output Volume Per Quarter (OVPQ) within the first two Program Increments. While some teams initially observe a corresponding increase in Defect Density Per Artifact (DDPA), this is a temporary Adaptation Phase Anomaly (APA) that resolves itself once the Manual Test Operations Center scales to match the increased throughput. The critical insight is that defects are a downstream concern handled by downstream roles, while output volume is an upstream imperative owned by the Code Engineer and measured by the Centralized AI Generation Function. This clean separation of concerns between Production Responsibility (PR) and Quality Responsibility (QR) is one of the foundational principles of the EAIEF™.

2 - Manual Test Operations Center (MTOC)

Preserving dedicated manual testing for all AI-generated code ensures Dual Assurance through the separation of Development Intent from Quality Interpretation!

While AI can generate tests alongside the code it produces, delegating validation to delivery teams risks reducing the cross-functional hand-off cycles that are essential to Enterprise Quality Governance (EQG). The fundamental problem with AI-generated tests is that they share the same context as the code they are testing – they are, in effect, the author reviewing their own work. This creates a Validation Independence Deficit (VID) that undermines the entire quality assurance framework. The Manual Test Operations Center (MTOC) addresses this deficit by providing an organizationally independent validation function staffed by dedicated manual testers who have no knowledge of how the code was generated, what prompts were used, or what the code is intended to do. This intentional Knowledge Separation Boundary (KSB) is what gives the MTOC its governance value: testers evaluate the code from a position of pure, uncontaminated objectivity.

Queue-Based Model and the Manual Validation Pipeline (MVP)

The MTOC operates on a queue-based model aligned to the Testing Queue Time (TQT) metric, which measures the average time between code submission and test initiation. A predictable TQT is essential for Precise Forecasting and Tracking, as it allows the Commodore to calculate the total pipeline duration with confidence. The MTOC receives all code artifacts from the End-of-Cycle Integration Events and processes them through the Manual Validation Pipeline (MVP) – a structured sequence of manual test phases:

  1. Exploratory Surface Testing (EST)
  2. Scripted Scenario Execution (SSE)
  3. Regression Verification Walkthrough (RVW)
  4. Final Quality Attestation (FQA)

Each phase produces a signed test artifact that is archived in the Quality Evidence Repository (QER) for audit purposes. No AI-generated code may proceed to release without completing all four phases.
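The four-phase gate could be sketched as a simple all-or-nothing check; the names below mirror the phases listed above, while the function signature is a hypothetical illustration.

```python
# Illustrative sketch of the Manual Validation Pipeline (MVP) release gate:
# an artifact may proceed only once every phase has a signed test artifact.
MVP_PHASES = [
    "Exploratory Surface Testing (EST)",
    "Scripted Scenario Execution (SSE)",
    "Regression Verification Walkthrough (RVW)",
    "Final Quality Attestation (FQA)",
]

def may_proceed_to_release(signed_phases):
    """True only if every MVP phase has produced a signed test artifact
    (archived, per the framework, in the Quality Evidence Repository)."""
    return all(phase in signed_phases for phase in MVP_PHASES)
```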

Multi-Layer Signoff Protocols (MLSPs)

The MTOC enforces Multi-Layer Signoff Protocols (MLSPs) that ensure quality decisions are distributed across multiple independent authorities:

  1. MTOC Test Lead: Confirms that all test scripts have been executed according to the Test Execution Conformance Standard (TECS).
  2. Quality Authority: Validates that the defect count falls within the Acceptable Defect Threshold (ADT) defined for the release.
  3. Development Integrity Assurance Team: Confirms that no untested code has bypassed the MTOC through the Direct Deployment Bypass Channel (DDBC).

This three-layer signoff structure implements the Dual Assurance Model (DAM) – which, despite its name, actually requires triple assurance, because dual assurance was found to be insufficient during the 2023 Governance Enhancement Review (GER).
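The three-layer structure amounts to a conjunction of independent checks. A minimal sketch, assuming boolean inputs for each authority's verdict (the helper names are illustrative):

```python
def quality_authority_signoff(defect_count, acceptable_defect_threshold):
    """Layer 2: defect count must fall within the release's
    Acceptable Defect Threshold (ADT)."""
    return defect_count <= acceptable_defect_threshold

def mlsp_release_approved(scripts_conform, defect_count, adt, no_bypass):
    """All three layers must sign off:
    1. MTOC Test Lead confirms TECS-conformant script execution,
    2. Quality Authority validates defects against the ADT,
    3. Development Integrity Assurance Team confirms no DDBC bypass."""
    return (scripts_conform
            and quality_authority_signoff(defect_count, adt)
            and no_bypass)
```

A single failed layer blocks the release, which is what makes the "Dual" Assurance Model triple in practice.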

Development Intent vs. Quality Interpretation

A critical design principle of the MTOC is the clear separation of Development Intent (DI) from Quality Interpretation (QI). Development Intent represents what the Code Engineer and the AI intended the code to do, as documented in the Fully Documented Requirements Package. Quality Interpretation represents what the MTOC tester independently determines the code actually does, based solely on observable behavior and the original business requirements. Any gap between DI and QI is classified as a Quality Interpretation Variance (QIV), which triggers a formal investigation managed by the DOUCHE. QIVs are tracked at the individual Code Engineer level through the Defects per Code Engineer metric and at the tester level through the Defects per Unit Tester metric, ensuring accountability on both sides of the quality boundary.
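The DI/QI comparison above can be sketched as a set difference in both directions: behaviors the code was intended to have but does not exhibit, and behaviors observed that were never intended. The representation of "behaviors" as strings and the function name are assumptions for illustration.

```python
def classify_variance(development_intent, quality_interpretation):
    """Compare Development Intent (DI) against Quality Interpretation (QI).
    Any gap in either direction is a Quality Interpretation Variance (QIV)."""
    # Intended behaviors the tester could not observe
    gaps = [b for b in development_intent if b not in quality_interpretation]
    # Observed behaviors that were never intended
    gaps += [b for b in quality_interpretation if b not in development_intent]
    return {"is_qiv": bool(gaps), "variances": gaps}
```

A non-empty variance list would, per the framework, trigger the formal investigation and feed both per-Code-Engineer and per-tester defect metrics.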

Why Automated Test Suites Cannot Replace the MTOC

Organizations occasionally question why AI-generated test suites cannot supplement or replace the MTOC. The answer lies in the Governance Trust Hierarchy (GTH), which establishes that automated validation can never be considered equivalent to human validation for governance purposes. An automated test can verify that code behaves as programmed, but only a human tester can verify that code behaves as intended – a distinction captured in the Behavioral Verification Ontology (BVO). Furthermore, the MTOC provides a critical organizational function beyond testing: it generates the Testing Ceremony Artifacts (TCAs) required for Tribunal proceedings, the Fleet Inspection checklist, and the Go-Live Authorization Meeting (GLAM). Without the MTOC, these ceremonies would lack the evidentiary foundation they require, and the entire governance chain would collapse.
