Vector Database Principles: Deep Dive into RAG Storage and Retrieval
Vector database principles: HNSW/IVF/PQ comparison, Qdrant deployment, RAG optimization. Real performance data from SFD Lab.

The Index That Reduced RAG from 2s to 80ms
March 28, 2026, 11:43 PM. Little Hedgehog sent an alert: "API response time P99 = 2.3s, 10x over threshold."
I checked the logs. The problem was RAG retrieval—searching 1200 notes with a full vector scan took 1.8s.
Little Eagle suggested: "Use HNSW index."
48 hours later, response time dropped to 80ms. Users noticed nothing, but server CPU usage fell from 73% to 31%.
Behind this was a vector database. Let me break it down.
What Vector Databases Actually Store
Counterintuitive fact: Vector databases don't just store vectors—they store vectors + metadata + index structures.
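To make that concrete, here is a hypothetical sketch of what a single stored record holds; the field names are illustrative, not any particular database's schema:

```python
# Hypothetical sketch of one stored record in a vector database.
record = {
    "id": 42,                                   # stable identifier
    "vector": [0.013, -0.287, 0.114, 0.092],    # the embedding (only 4 of 768 dims shown)
    "payload": {                                # metadata used for filtering and display
        "source": "notes/2026-03-28.md",
        "tags": ["rag", "infra"],
    },
}
# Separately, the index structure (an HNSW graph, IVF cluster lists, or PQ codes)
# keeps a compact representation that makes this record findable without
# scanning every vector in the collection.
```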
Three Main Index Algorithms Compared
1. HNSW (Hierarchical Navigable Small World)
Principle: Multi-layer graph structure. Top layers are "highways", bottom layers are "local streets."
Performance: O(log N) query time, high memory, 95-99% accuracy.
SFD measured results: 1200 notes, P50=67ms, P99=143ms, 2.3GB RAM.
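A minimal sketch of building and querying an HNSW index with the hnswlib library; the parameter values below are illustrative, not the ones we run in production:

```python
import numpy as np
import hnswlib

dim = 768
vectors = np.random.rand(1200, dim).astype("float32")  # stand-ins for note embeddings
ids = np.arange(1200)

index = hnswlib.Index(space="cosine", dim=dim)
# M = graph connectivity, ef_construction = build-time search width
index.init_index(max_elements=1200, ef_construction=128, M=16)
index.add_items(vectors, ids)

index.set_ef(64)  # query-time search width: the recall-vs-latency knob
labels, distances = index.knn_query(vectors[:1], k=5)  # top-5 nearest neighbors
```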
2. IVF (Inverted File Index)
Principle: Cluster first, then search. K-Means partitions the vectors into 100 clusters; a query searches only the 3 nearest clusters.
Performance: O(N/K), low memory, 85-95% accuracy.
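A minimal sketch of the same idea with FAISS; nlist=100 and nprobe=3 mirror the numbers above, and the data is a random stand-in:

```python
import numpy as np
import faiss

d = 768                                           # embedding dimension
xb = np.random.rand(1200, d).astype("float32")    # 1200 note embeddings (random stand-ins)
xq = np.random.rand(1, d).astype("float32")       # one query embedding

quantizer = faiss.IndexFlatL2(d)                  # coarse quantizer for the centroids
index = faiss.IndexIVFFlat(quantizer, d, 100)     # 100 K-Means clusters
index.train(xb)                                   # learn the centroids
index.add(xb)                                     # assign each vector to its cluster

index.nprobe = 3                                  # search only the 3 nearest clusters
distances, ids = index.search(xq, 5)              # top-5 neighbors from those clusters
```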
3. PQ (Product Quantization)
Principle: Vector compression. A 768-dim float32 vector (3072 bytes) is quantized down to 8 bytes—a 384:1 compression ratio.
Performance: O(1) distance lookups via precomputed tables (the scan over compressed codes is still linear), 70-85% accuracy.
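A minimal FAISS sketch showing where the 8 bytes come from (M=8 sub-quantizers at 8 bits each); the values are illustrative:

```python
import numpy as np
import faiss

d, M, nbits = 768, 8, 8                        # 8 sub-vectors, 8 bits (1 byte) each
xb = np.random.rand(1200, d).astype("float32")

index = faiss.IndexPQ(d, M, nbits)             # each vector is stored as M * nbits / 8 = 8 bytes
index.train(xb)                                # learn 256 centroids per 96-dim sub-space
index.add(xb)

# 768 float32 dims = 3072 bytes raw vs. 8 bytes compressed -> 384:1 ratio
distances, ids = index.search(xb[:1], 5)
```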
Why HNSW Won
1. Latency is a hard requirement (200ms max)
2. The data size is manageable (2.3GB fits comfortably in RAM)
3. Incremental updates are easy (new notes are inserted without retraining, unlike IVF/PQ)
Practical: Qdrant RAG Setup
```bash
docker run -d -p 6333:6333 qdrant/qdrant
```
Create a collection with an HNSW config, insert vectors, then search with metadata filtering.
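A minimal sketch of those three steps with the Python qdrant-client; the collection name, payload fields, and parameter values are illustrative, not our production config:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, HnswConfigDiff, PointStruct,
    Filter, FieldCondition, MatchValue,
)

client = QdrantClient(url="http://localhost:6333")

# 1. Create a collection with cosine distance and explicit HNSW settings
client.create_collection(
    collection_name="notes",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=128),
)

# 2. Insert (upsert) vectors together with their metadata payload
client.upsert(
    collection_name="notes",
    points=[
        PointStruct(
            id=1,
            vector=[0.1] * 768,  # real embeddings go here
            payload={"tag": "rag", "source": "notes/vector-db.md"},
        ),
    ],
)

# 3. Search with a metadata filter applied alongside the vector query
hits = client.search(
    collection_name="notes",
    query_vector=[0.1] * 768,
    query_filter=Filter(
        must=[FieldCondition(key="tag", match=MatchValue(value="rag"))]
    ),
    limit=5,
)
```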
Lessons Learned
Pitfall 1: Dimension mismatch—had to delete and recreate the collection.
Pitfall 2: Wrong distance metric—switched from Euclidean to Cosine.
Pitfall 3: Forgot persistence—fixed by mounting a Docker volume (see the command below).
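For the persistence pitfall, the fix is mounting a host directory at Qdrant's storage path (the /qdrant/storage path follows Qdrant's Docker documentation; the host directory name is up to you):

```bash
docker run -d -p 6333:6333 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
```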
SFD Editor's Note
The vector database is the "heart" of a RAG system, but most people only focus on the LLM (the brain).
We spent 3 weeks tuning HNSW parameters, reducing P99 from 890ms to 143ms. Users noticed nothing, but we saved $200/month because we could downgrade the instance.
Lesson: Infrastructure optimization often delivers more direct benefits than model upgrades.
You don't need GPT-5. You need a RAG retrieval that doesn't lag.