Don't Rely on LLMs' "One-Shot Accuracy": Build a "Verification-Correction" Loop for Production
In AI Lab delivery scenarios, many developers are accustomed to chasing a "perfect Prompt," attempting to force the model to provide a 100% correct answer in a single inference pass by continuously adding instructional details.
However, when handling complex logic (such as multi-step reasoning, complex data extraction, or code generation), this approach often hits the ceiling of an LLM's capabilities. No matter how ingenious the Prompt is, the model still has a probability of hallucinating or making logical leaps at critical steps.
The Core Pain Point: Unreliability of Single-Pass Inference
The biggest problem with Single-pass Inference is that the model has no opportunity for self-reflection. When it makes a minor error in step two, all subsequent steps proceed based on that error, ultimately leading to a complete collapse of the result.
In production environments, we cannot rely on luck. We need to decouple "generation" from "verification" and build a Verification Loop.
Engineering Implementation Patterns for the Verification Loop
A mature verification loop typically consists of three stages: Generate $\rightarrow$ Verify $\rightarrow$ Correct.
1. Generate Stage
The model produces an initial result $R_1$ based on the input. At this stage, there is no need to pursue extreme perfection; instead, focus on structured output (such as JSON) to facilitate subsequent verification.
2. Verify Stage
This is the most critical step. Verification should not simply involve asking the original model, "Are you sure this is correct?" Instead, employ the following three robust methods:
- Deterministic Check: For example, if the result is SQL, attempt to execute it in a sandbox; if it is JSON, validate the Schema and required fields.
- Cross-Verification: Use another independent, stronger model (or the same model with different Temperature settings) to audit $R_1$ and identify logical loopholes.
- Reverse Reasoning: Require the model to reverse-engineer the result $R_1$ back to the original $\text{Input}$. If the derivation is inconsistent, flag it as an error.
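As a rough illustration of the Deterministic Check above, the sketch below validates a JSON result against a list of required fields and compiles a SQL result against an empty in-memory SQLite database. The function names `check_json` and `check_sql`, the field names, and the use of SQLite as the sandbox are assumptions made for this example; a real sandbox would mirror your production schema and engine.

```python
import json
import sqlite3


def check_json(raw: str, required: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} at position {exc.pos}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    for field, expected in required.items():
        if field not in data:
            problems.append(f"missing required field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"field {field} should be of type {expected.__name__}")
    return problems


def check_sql(query: str, schema_ddl: str) -> list[str]:
    """Compile the query against an empty in-memory database (hypothetical sandbox).

    EXPLAIN QUERY PLAN makes SQLite parse and plan the statement without
    running it, which surfaces syntax errors and unknown tables or columns.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        conn.execute(f"EXPLAIN QUERY PLAN {query}")
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return [f"SQL error: {exc}"]
    finally:
        conn.close()
    return []


# Example usage with made-up field and table names.
print(check_json('{"revenue": 10}', {"revenue": int, "year": int}))
print(check_sql("SELECT missing FROM orders", "CREATE TABLE orders (id INTEGER, total REAL);"))
```

Both functions return the failure reasons as plain strings rather than a bare boolean, so the messages can be passed to the Correct stage unchanged.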
3. Correct Stage
Once verification fails, the system should not immediately return an error to the user. Instead, feed [Original Input + Erroneous Result + Specific Reason for Verification Failure] back to the model as new context.
Instruction Example: "Your previous output contained a logical contradiction in step 3 (specific reason: ...). Please re-evaluate and correct it."
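The three stages can be wired together along the following lines. This is a minimal sketch, not a definitive implementation: `generate` and `verify` are placeholder callables standing in for your model call and your verifier, and the `max_corrections` cap is an assumption added here so that the loop cannot retry forever.

```python
from typing import Callable

Generate = Callable[[str], str]        # prompt -> model output (placeholder for an LLM call)
Verify = Callable[[str], list[str]]    # output -> list of failure reasons (empty means pass)


def build_correction_prompt(original_input: str, bad_output: str, problems: list[str]) -> str:
    """Combine original input, erroneous result, and failure reasons into new context."""
    reasons = "\n".join(f"- {p}" for p in problems)
    return (
        f"Original input:\n{original_input}\n\n"
        f"Your previous output:\n{bad_output}\n\n"
        f"Verification failed for these reasons:\n{reasons}\n\n"
        "Please re-evaluate and correct it."
    )


def run_with_verification(
    original_input: str,
    generate: Generate,
    verify: Verify,
    max_corrections: int = 2,  # assumed retry budget, not from the article
) -> tuple[str, bool]:
    """Generate, verify, and correct; return the last output and whether it passed."""
    output = generate(original_input)
    for attempt in range(max_corrections + 1):
        problems = verify(output)
        if not problems:
            return output, True
        if attempt < max_corrections:
            prompt = build_correction_prompt(original_input, output, problems)
            output = generate(prompt)
    return output, False


# Example usage with a fake model that only gets it right after feedback.
def fake_generate(prompt: str) -> str:
    return "4" if "Verification failed" in prompt else "5"


def fake_verify(output: str) -> list[str]:
    return [] if output == "4" else ["2 + 2 does not equal 5"]


print(run_with_verification("What is 2 + 2?", fake_generate, fake_verify))
```

Returning a pass/fail flag alongside the output lets the caller decide what to do when the retry budget runs out, for example escalating to a human reviewer instead of silently shipping an unverified answer.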
Practical Case: Complex Report Extraction Agent
While developing an Agent to extract financial data from unstructured PDFs, we found that single-pass extraction accuracy was only 82%. Most errors stemmed from the model confusing data rows for "last year" and "this year."
We introduced a Verification Loop:
1. Generate: Extract data and output a JSON table.
2. Verify: Write a simple Python script to calculate whether the sums in the table equal the total amounts marked in the PDF (Deterministic Check).
3. Correct: If the totals do not match, feed the discrepancy amount and the relevant line numbers back to the model for re-extraction.
After implementing this loop, the final delivered accuracy improved to 98%, with response time increasing by only about 2 seconds.
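The verification script in step 2 might look something like the sketch below. The JSON layout (`sections`, `rows`, `line`, `amount`, `stated_total`) and the rounding tolerance are hypothetical, chosen only for illustration; the script used in the project is not shown in this article.

```python
from decimal import Decimal


def check_totals(table: dict, tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Compare the sum of each section's rows with the total stated in the PDF.

    Returns one message per mismatching section; an empty list means all totals match.
    """
    problems = []
    for section in table["sections"]:
        # Decimal avoids float rounding noise when adding currency amounts.
        computed = sum(
            (Decimal(str(row["amount"])) for row in section["rows"]),
            Decimal("0"),
        )
        stated = Decimal(str(section["stated_total"]))
        diff = computed - stated
        if abs(diff) > tolerance:
            lines = ", ".join(str(row["line"]) for row in section["rows"])
            problems.append(
                f"{section['name']}: rows sum to {computed}, but the stated total "
                f"is {stated} (discrepancy {diff}); re-check lines {lines}"
            )
    return problems


# Example usage with made-up figures: the second section is off by 30.00.
extracted = {
    "sections": [
        {
            "name": "Revenue (this year)",
            "stated_total": "300.00",
            "rows": [{"line": 4, "amount": "100.00"}, {"line": 5, "amount": "200.00"}],
        },
        {
            "name": "Revenue (last year)",
            "stated_total": "250.00",
            "rows": [{"line": 7, "amount": "100.00"}, {"line": 8, "amount": "120.00"}],
        },
    ]
}
for message in check_totals(extracted):
    print(message)
```

Each message names the section, the discrepancy, and the line numbers involved, which is the kind of specific failure reason the Correct step needs.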
Recommendations for Engineering Teams
Do not try to eliminate hallucinations by making your Prompt ever longer: each added instruction is more text for the model to weigh, and past a certain point it tends to add noise rather than reliability. Instead, focus your efforts on building an efficient Verifier.
Remember: A mediocre model + a powerful verifier $\gg$ a top-tier model + a simple Prompt.