A RAG Deployment Failure: Five Pitfalls on the Journey from "It Runs" to "It’s Deliverable"

Last week, we delivered a RAG (Retrieval-Augmented Generation) knowledge base system to a manufacturing client. The requirements sounded simple: feed internal technical documents into a large language model (LLM) so that production line engineers could ask questions in natural language.

However, the system crashed and burned on its very first day of launch.

Pitfall 1: Document Formats Are Much Messier Than Expected

The client provided over 200 PDF files, one-third of which were scanned copies. The tables extracted via OCR were garbled, and formulas turned into unintelligible symbols. Our chunking strategy was paragraph-based, but paragraphs in technical manuals often span page breaks, so splitting left us with fragments that made no sense on their own.

Lesson: You must conduct a document quality audit before going live. Don’t just look at "how many files there are"; instead, sample them to check if they are readable after chunking. We later added a preprocessing pipeline: first reorganizing content by heading hierarchy, then merging fragmented chunks based on semantic similarity.
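The merge step can be sketched roughly as follows. This is a minimal illustration, not our production pipeline: the bag-of-words vector is a toy stand-in for a real embedding model, and the names merge_fragments, min_words, and threshold are assumptions made up for the example.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def merge_fragments(chunks, min_words=8, threshold=0.2):
    """Merge a short chunk into the previous one when the two look related.

    chunks: list of (heading, text) tuples in reading order.
    A chunk is merged only if it sits under the same heading, is shorter
    than min_words, and its similarity to the previous chunk reaches threshold.
    """
    merged = []
    for heading, text in chunks:
        if merged:
            prev_heading, prev_text = merged[-1]
            is_fragment = len(text.split()) < min_words
            related = cosine(bow_vector(prev_text), bow_vector(text)) >= threshold
            if heading == prev_heading and is_fragment and related:
                merged[-1] = (prev_heading, prev_text + " " + text)
                continue
        merged.append((heading, text))
    return merged


chunks = [
    ("Spindle maintenance", "Check the spindle bearing temperature every shift and record the value in the log."),
    ("Spindle maintenance", "If the bearing temperature exceeds the limit,"),
    ("Coolant system", "Replace the coolant filter monthly."),
]
result = merge_fragments(chunks)
for heading, text in result:
    print(f"[{heading}] {text}")
```

Here the dangling second chunk is folded back into the first, while the coolant chunk stays separate because it sits under a different heading. With a real embedding model the threshold would need to be tuned on sampled documents.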

Pitfall 2: The "Close Enough" Trap of Vector Search

With default cosine similarity, the top-5 retrieved results all looked highly relevant. But when generating actual answers, the model frequently mixed up parameters from different products, because the documents for those products really were close together in the vector space.

Lesson: You must add a layer of metadata filtering after retrieval. We added filters based on product lines and document versions, which increased retrieval precision from 62% to 89%. This change took only half a day but saved the client two weeks of handling complaints.
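A post-retrieval filter of this kind can be sketched in a few lines. The Hit class and the product_line and doc_version field names are hypothetical, chosen for illustration; the real schema depends on how the metadata was stored at indexing time.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    chunk_id: str
    score: float        # similarity score from the vector search
    product_line: str   # hypothetical metadata field
    doc_version: str    # hypothetical metadata field


def filter_hits(hits, product_line, doc_version, top_k=5):
    """Keep only hits whose metadata matches, then return the best top_k."""
    matching = [
        h for h in hits
        if h.product_line == product_line and h.doc_version == doc_version
    ]
    return sorted(matching, key=lambda h: h.score, reverse=True)[:top_k]


hits = [
    Hit("a-12", 0.91, "press-A", "v2"),
    Hit("b-07", 0.90, "press-B", "v2"),  # near-identical score, wrong product
    Hit("a-03", 0.88, "press-A", "v1"),  # right product, outdated version
    Hit("a-15", 0.85, "press-A", "v2"),
]
kept = filter_hits(hits, product_line="press-A", doc_version="v2")
print([h.chunk_id for h in kept])
```

Because filtering discards candidates, it helps to retrieve more than top_k results from the vector search first, so that enough matching chunks survive the filter.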

Pitfall 3: Latency Is Not Always a Model Issue

The client complained, "It takes 15 seconds to get an answer to a single question." Our first reaction was that the model was too slow, so we switched to a faster one. It still took 15 seconds.

We eventually traced the issue to poorly constructed indexes in the vector database. On our small dataset, Milvus’s default configuration was actually slower than brute-force search because the indexing overhead outweighed the benefits.

Lesson: Profile first, then optimize. Don’t rely on intuition to swap out models.
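A per-stage timer is usually enough to find out where the seconds go. The sketch below uses only the standard library; the stage names and the placeholder stage bodies are assumptions for illustration, with a short sleep standing in for the vector database query.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)


@contextmanager
def timed(stage):
    """Record the wall-clock duration of one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage].append(time.perf_counter() - start)


def answer(question):
    # The three stage bodies are placeholders; swap in the real calls.
    with timed("embed"):
        vector = [float(len(question))]
    with timed("retrieve"):
        time.sleep(0.02)  # stands in for the vector database query
        context = ["chunk-1", "chunk-2"]
    with timed("generate"):
        reply = f"answer built from {len(context)} chunks for a {len(vector)}-d query"
    return reply


answer("Why is machine 3 overheating?")
slowest = max(timings, key=lambda s: sum(timings[s]))
for stage, samples in timings.items():
    print(f"{stage:10s} {sum(samples) * 1000:8.2f} ms")
print("slowest stage:", slowest)
```

Had we run something like this before swapping models, the retrieval stage would have stood out immediately.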

Pitfall 4: Users Don’t Ask "Good Questions"

Engineers were accustomed to asking questions like, "What should I do about the abnormal temperature in Machine #3?" However, the documents contained sections titled "Troubleshooting Process for Equipment Overheating." This semantic gap caused the system to fail to retrieve relevant content.

Lesson: Add a query rewriting layer before retrieval. Use a small model to rewrite user queries into the style of the corpus. The effect was immediate, and because the rewrite is a single cheap model call, the return on investment is high.
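The rewriting layer can be as thin as one prompt and one function. This is a sketch under assumptions: the prompt wording, rewrite_query, and the complete callable are all hypothetical, and fake_complete is a stub that stands in for the small model so the example runs offline.

```python
REWRITE_PROMPT = (
    "Rewrite the user's question as a search query in the style of a "
    "technical manual section title. Keep equipment names and numbers.\n"
    "Question: {question}\n"
    "Search query:"
)


def rewrite_query(question, complete):
    """Return a corpus-style query, falling back to the original on failure.

    complete: any callable that takes a prompt string and returns text.
    """
    try:
        rewritten = complete(REWRITE_PROMPT.format(question=question)).strip()
    except Exception:
        # If the rewriting model is unavailable, search with the raw question.
        return question
    return rewritten or question


def fake_complete(prompt):
    # Stub standing in for the small model; a real call would go here.
    return " Troubleshooting process for equipment overheating, Machine #3 \n"


query = rewrite_query(
    "What should I do about the abnormal temperature in Machine #3?",
    fake_complete,
)
print(query)
```

Falling back to the original question keeps retrieval working when the rewrite fails or comes back empty. One option worth testing is to search with both the original and the rewritten query and merge the results.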

Pitfall 5: Lack of Monitoring

After launch, we had no usage data. We didn’t know which Q&A pairs failed, what users asked most frequently, or where the system bottlenecks were.

Lesson: Implement tracking from Day 1. Log queries, retrieval results, and user satisfaction (even if it’s just a thumbs up/down). Without data, you are optimizing blind.
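Even a JSON Lines file is enough to get started. The sketch below is one possible shape, not a prescribed schema: the field names and the functions log_interaction and downvoted_queries are assumptions for illustration.

```python
import json
import time
import uuid


def log_interaction(path, query, retrieved_ids, answer, feedback=None):
    """Append one Q&A interaction to a JSON Lines file and return its id."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "query": query,
        "retrieved_ids": retrieved_ids,
        "answer": answer,
        "feedback": feedback,  # "up", "down", or None if the user did not vote
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record["id"]


def downvoted_queries(path):
    """Return the queries that users marked with a thumbs down."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r["query"] for r in records if r["feedback"] == "down"]
```

Calling log_interaction once per answered question and reviewing downvoted_queries weekly already answers the three questions above: what failed, what people ask most, and which retrieved chunks were involved.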

Conclusion

RAG is not a technology where you can simply "connect an API and be done." From document cleaning to retrieval optimization, and from latency tuning to user behavior analysis, every stage has its pitfalls. Delivery isn’t about getting a demo to run; it’s about ensuring the system can stably serve real users.

For my next RAG project, I will spend 40% of my time on data quality and monitoring, rather than tweaking model parameters.
