Retrieval, RAG, and Vector Search

This page tracks late-interaction retrieval, HNSW, RAG pipelines, video RAG, and retrieval-oriented models.

Sources in this batch

  • HNSW (Hierarchical Navigable Small World graphs) provides a core approximate-nearest-neighbor (ANN) indexing method.
  • Omar Khattab’s late-interaction material points toward retrieval architectures beyond simple dense vectors.
  • A RAG pipeline video, VideoRAG, and ColModernVBERT connect retrieval to long-context and multimodal workloads.
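To make the HNSW idea concrete, here is a simplified sketch, not hnswlib's or FAISS's API. Each vector gets a random top layer drawn from an exponentially decaying distribution. A search descends greedily through the sparse upper layers and then runs a beam search of width `ef` on the dense bottom layer. The class name `TinyHNSW` is hypothetical. The parameter names `M`, `ef_construction`, and `ef` follow the HNSW paper's naming. Several simplifications are assumptions: neighbor selection keeps the M closest candidates instead of the paper's diversity heuristic, every layer uses M links, distance is squared Euclidean, and deletion is not supported.

```python
import heapq
import math
import random


def dist(a, b):
    # Squared Euclidean distance: same ranking as true Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinyHNSW:
    """Simplified HNSW for illustration (hypothetical, not a library API).

    Keeps the M closest neighbors instead of the paper's selection heuristic,
    uses M links on every layer, and does not support deletion.
    """

    def __init__(self, M=8, ef_construction=32, seed=0):
        self.M = M
        self.ef_construction = ef_construction
        self.level_mult = 1 / math.log(M)  # level normalization factor (mL)
        self.rng = random.Random(seed)
        self.vectors = []
        self.graphs = []  # graphs[layer][node] -> list of neighbor ids
        self.entry = None
        self.entry_level = -1

    def _search_layer(self, q, entry_points, ef, layer):
        """Beam search on one layer; returns up to ef (distance, id), closest first."""
        visited = set(entry_points)
        candidates = [(dist(q, self.vectors[e]), e) for e in entry_points]
        heapq.heapify(candidates)  # min-heap: nearest unexpanded node first
        results = [(-d, e) for d, e in candidates]
        heapq.heapify(results)  # max-heap (negated): worst kept result on top
        while candidates:
            d, c = heapq.heappop(candidates)
            if d > -results[0][0]:
                break  # nearest candidate is farther than every kept result
            for n in self.graphs[layer][c]:
                if n in visited:
                    continue
                visited.add(n)
                dn = dist(q, self.vectors[n])
                if len(results) < ef or dn < -results[0][0]:
                    heapq.heappush(candidates, (dn, n))
                    heapq.heappush(results, (-dn, n))
                    if len(results) > ef:
                        heapq.heappop(results)  # drop the current worst
        return sorted((-d, e) for d, e in results)

    def insert(self, vec):
        idx = len(self.vectors)
        self.vectors.append(vec)
        # Exponentially decaying random top layer, as in HNSW.
        level = int(-math.log(1.0 - self.rng.random()) * self.level_mult)
        while len(self.graphs) <= level:
            self.graphs.append({})
        for layer in range(level + 1):
            self.graphs[layer][idx] = []
        if self.entry is None:
            self.entry, self.entry_level = idx, level
            return idx
        ep = [self.entry]
        # Greedy descent (ef=1) through layers above the new node's top layer.
        for layer in range(self.entry_level, level, -1):
            ep = [self._search_layer(vec, ep, 1, layer)[0][1]]
        # On each shared layer: beam search, link to M nearest, shrink neighbors.
        for layer in range(min(level, self.entry_level), -1, -1):
            found = self._search_layer(vec, ep, self.ef_construction, layer)
            neighbors = [e for _, e in found[: self.M]]
            self.graphs[layer][idx] = list(neighbors)
            for n in neighbors:
                adj = self.graphs[layer][n]
                adj.append(idx)
                if len(adj) > self.M:
                    adj.sort(key=lambda m, n=n: dist(self.vectors[n], self.vectors[m]))
                    del adj[self.M:]
            ep = [e for _, e in found]
        if level > self.entry_level:
            self.entry, self.entry_level = idx, level
        return idx

    def search(self, q, k=5, ef=32):
        if self.entry is None:
            return []
        ep = [self.entry]
        for layer in range(self.entry_level, 0, -1):
            ep = [self._search_layer(q, ep, 1, layer)[0][1]]
        found = self._search_layer(q, ep, max(ef, k), 0)
        return [e for _, e in found[:k]]
```

Raising `ef` at query time trades speed for recall without rebuilding the index. That search-time knob is a large part of why HNSW is a common default for dense-vector retrieval.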

Research interest

The interesting trend is retrieval becoming modality- and task-specific. VideoRAG and late-interaction models suggest that a single vector per document is often too crude; agents may need retrieval systems that preserve structure, time, and token-level interactions.
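The claim that one vector per document is often too crude can be shown with a toy comparison. The sketch below contrasts mean-pooled single-vector scoring with ColBERT-style MaxSim late interaction, where each query token is matched to its best document token and the maxima are summed. The three-dimensional unit "token embeddings" are hand-made stand-ins, not outputs of any real model. The function names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def maxsim_score(query_tokens, doc_tokens):
    # Late interaction (ColBERT-style MaxSim): each query token picks its
    # best-matching document token; the per-token maxima are summed.
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def mean_pool(tokens):
    # Single-vector baseline: average the token vectors, then re-normalize.
    dim = len(tokens[0])
    return normalize([sum(t[i] for t in tokens) / len(tokens) for i in range(dim)])


def single_vector_score(query_tokens, doc_tokens):
    return dot(mean_pool(query_tokens), mean_pool(doc_tokens))


# Toy unit vectors standing in for token embeddings (not from a real model).
RED, CAR, FILLER = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
query = [RED, CAR]
doc_relevant = [RED, CAR, FILLER, FILLER, FILLER, FILLER]  # long, mentions both
doc_partial = [RED, RED]                                     # short, only "red"

print(single_vector_score(query, doc_relevant))  # ~0.333
print(single_vector_score(query, doc_partial))   # ~0.707
print(maxsim_score(query, doc_relevant))          # 2.0
print(maxsim_score(query, doc_partial))           # 1.0
```

Pooling dilutes the two matching tokens in the longer document with filler, so the single-vector score prefers the document that only mentions "red". MaxSim keeps the token-level matches and ranks the document covering both query terms higher.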

Open questions:

  • When does long context beat retrieval, and when does retrieval beat long context?
  • Can late-interaction retrieval be made cheap enough for personal/local agents?
  • How should retrieval expose uncertainty and provenance to downstream agents?
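Part of the cost question behind late interaction is raw storage: it keeps one vector per token instead of one per document. The back-of-the-envelope sketch below uses purely hypothetical figures: 100k passages of about 200 tokens, 768-dimensional single vectors versus 128-dimensional token vectors, and 2-byte (fp16) values. It ignores index overhead and any compression or token pruning.

```python
def index_bytes(num_docs, vectors_per_doc, dim, bytes_per_value):
    """Raw embedding storage, ignoring index overhead and compression."""
    return num_docs * vectors_per_doc * dim * bytes_per_value


# Hypothetical personal corpus: 100k passages, ~200 tokens each, fp16 values.
single = index_bytes(100_000, 1, 768, 2)   # one 768-d vector per passage
late = index_bytes(100_000, 200, 128, 2)   # one 128-d vector per token

print(f"single-vector: {single / 1e6:.1f} MB")   # 153.6 MB
print(f"late-interaction: {late / 1e9:.2f} GB")  # 5.12 GB
print(f"ratio: {late / single:.1f}x")            # 33.3x
```

Under these assumptions the uncompressed late-interaction index is roughly 33 times larger. That gap is what compression or token pruning would need to close for personal or local agents.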