Vector Database Principles: Deep Dive into RAG Storage and Retrieval
Vector database principles: HNSW/IVF/PQ comparison, Qdrant deployment, RAG optimization. Real performance data from SFD Lab.

The Index That Reduced RAG from 2s to 80ms
March 28, 2026, 11:43 PM. Little Hedgehog sent an alert: "API response time P99 = 2.3s, 10x over threshold."
I checked the logs. The problem was RAG retrieval—searching 1200 notes with a full vector scan took 1.8s.
Little Eagle suggested: "Use HNSW index."
48 hours later, response time dropped to 80ms. Users noticed nothing, but server CPU usage fell from 73% to 31%.
Behind this was a vector database. Let me break it down.
What Vector Databases Actually Store
Counterintuitive fact: Vector databases don't just store vectors—they store vectors + metadata + index structures.
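To make that concrete, here is a hypothetical sketch of what a single stored record holds; the field names are illustrative, not any particular database's schema:

```python
# Hypothetical sketch of one stored record in a vector database.
record = {
    "id": 42,                                   # stable identifier
    "vector": [0.013, -0.287, 0.114, 0.092],    # the embedding (only 4 of 768 dims shown)
    "payload": {                                # metadata used for filtering and display
        "source": "notes/2026-03-28.md",
        "tags": ["rag", "infra"],
    },
}
# Separately, the index structure (an HNSW graph, IVF cluster lists, or PQ codes)
# keeps a compact representation that makes this record findable without
# scanning every vector in the collection.
```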
Three Main Index Algorithms Compared
1. HNSW (Hierarchical Navigable Small World)
Principle: Multi-layer graph structure. Top layers are "highways", bottom layers are "local streets."
Performance: O(log N) query time, high memory, 95-99% accuracy.
SFD measured results: 1200 notes, P50=67ms, P99=143ms, 2.3GB RAM.
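A minimal sketch of building and querying an HNSW index with the hnswlib library; the parameter values below are illustrative, not the ones we run in production:

```python
import numpy as np
import hnswlib

dim = 768
vectors = np.random.rand(1200, dim).astype("float32")  # stand-ins for note embeddings
ids = np.arange(1200)

index = hnswlib.Index(space="cosine", dim=dim)
# M = graph connectivity, ef_construction = build-time search width
index.init_index(max_elements=1200, ef_construction=128, M=16)
index.add_items(vectors, ids)

index.set_ef(64)  # query-time search width: the recall-vs-latency knob
labels, distances = index.knn_query(vectors[:1], k=5)  # top-5 nearest neighbors
```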
2. IVF (Inverted File Index)
Principle: Cluster first, then search. K-Means partitions the vectors into 100 clusters; a query searches only the 3 nearest clusters.
Performance: O(N/K), low memory, 85-95% accuracy.
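A minimal sketch of the same idea with FAISS; nlist=100 and nprobe=3 mirror the numbers above, and the data is a random stand-in:

```python
import numpy as np
import faiss

d = 768                                           # embedding dimension
xb = np.random.rand(1200, d).astype("float32")    # 1200 note embeddings (random stand-ins)
xq = np.random.rand(1, d).astype("float32")       # one query embedding

quantizer = faiss.IndexFlatL2(d)                  # coarse quantizer for the centroids
index = faiss.IndexIVFFlat(quantizer, d, 100)     # 100 K-Means clusters
index.train(xb)                                   # learn the centroids
index.add(xb)                                     # assign each vector to its cluster

index.nprobe = 3                                  # search only the 3 nearest clusters
distances, ids = index.search(xq, 5)              # top-5 neighbors from those clusters
```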
3. PQ (Product Quantization)
Principle: Vector compression. A 768-dim float32 vector (3072 bytes) is quantized down to 8 bytes—a 384:1 compression ratio.
Performance: O(1) distance lookups via precomputed tables (the scan over compressed codes is still linear), 70-85% accuracy.
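A minimal FAISS sketch showing where the 8 bytes come from (M=8 sub-quantizers at 8 bits each); the values are illustrative:

```python
import numpy as np
import faiss

d, M, nbits = 768, 8, 8                        # 8 sub-vectors, 8 bits (1 byte) each
xb = np.random.rand(1200, d).astype("float32")

index = faiss.IndexPQ(d, M, nbits)             # each vector is stored as M * nbits / 8 = 8 bytes
index.train(xb)                                # learn 256 centroids per 96-dim sub-space
index.add(xb)

# 768 float32 dims = 3072 bytes raw vs. 8 bytes compressed -> 384:1 ratio
distances, ids = index.search(xb[:1], 5)
```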
Why HNSW Won
1. Latency is a hard requirement (200ms max)
2. The data size is manageable (2.3GB fits comfortably in RAM)
3. Incremental updates are easy (new notes are inserted without retraining, unlike IVF/PQ)
Practical: Qdrant RAG Setup
```bash
docker run -d -p 6333:6333 qdrant/qdrant
```
Create a collection with an HNSW config, insert vectors, then search with metadata filtering.
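A minimal sketch of those three steps with the Python qdrant-client; the collection name, payload fields, and parameter values are illustrative, not our production config:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, HnswConfigDiff, PointStruct,
    Filter, FieldCondition, MatchValue,
)

client = QdrantClient(url="http://localhost:6333")

# 1. Create a collection with cosine distance and explicit HNSW settings
client.create_collection(
    collection_name="notes",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=128),
)

# 2. Insert (upsert) vectors together with their metadata payload
client.upsert(
    collection_name="notes",
    points=[
        PointStruct(
            id=1,
            vector=[0.1] * 768,  # real embeddings go here
            payload={"tag": "rag", "source": "notes/vector-db.md"},
        ),
    ],
)

# 3. Search with a metadata filter applied alongside the vector query
hits = client.search(
    collection_name="notes",
    query_vector=[0.1] * 768,
    query_filter=Filter(
        must=[FieldCondition(key="tag", match=MatchValue(value="rag"))]
    ),
    limit=5,
)
```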
Lessons Learned
Pitfall 1: Dimension mismatch—had to delete and recreate the collection.
Pitfall 2: Wrong distance metric—switched from Euclidean to Cosine.
Pitfall 3: Forgot persistence—fixed by mounting a Docker volume (see the command below).
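For the persistence pitfall, the fix is mounting a host directory at Qdrant's storage path (the /qdrant/storage path follows Qdrant's Docker documentation; the host directory name is up to you):

```bash
docker run -d -p 6333:6333 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
```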
SFD Editor's Note
The vector database is the "heart" of a RAG system, but most people only focus on the LLM (the brain).
We spent 3 weeks tuning HNSW parameters, reducing P99 from 890ms to 143ms. Users noticed nothing, but we saved $200/month because we could downgrade the instance.
Lesson: Infrastructure optimization often delivers more direct benefits than model upgrades.
You don't need GPT-5. You need a RAG retrieval that doesn't lag.