How to Implement Semantic Search
Search that understands meaning - the bi-encoder, hybrid, and reranking architecture that beats keyword search.
Keyword search (BM25/Elasticsearch) is fast and reliable but literal. It cannot find relevant documents when the user's query and the document use different words for the same concept. Semantic search solves this - but a naive implementation trades precision for recall in ways that frustrate users. This guide covers the hybrid architecture that captures the best of both.
No fluff. Production-grade answers from engineers who ship AI into real products.
Bi-Encoder vs Cross-Encoder: The Architecture Tradeoff
Bi-encoder: encode the query and the documents independently, then compare their embeddings with cosine similarity. Fast, because documents can be pre-encoded and indexed. Suitable for retrieval over large corpora, but lower precision than a cross-encoder.
Cross-encoder: encode the query and a document together and predict a relevance score for the pair. Much higher precision, much slower, because every query-document pair needs its own model pass. Not suitable for first-stage retrieval over large corpora.
Production pattern: bi-encoder for retrieval (top-100 candidates), cross-encoder for reranking (final top-10).
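The two-stage pattern can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `toy_encode` and `toy_score_pair` are hypothetical stand-ins for a real bi-encoder and cross-encoder, and the documents and vocabulary are made up so the example runs without downloading any model.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    query: str,
    docs: Dict[str, str],
    doc_vecs: Dict[str, Vector],
    encode: Callable[[str], Vector],
    score_pair: Callable[[str, str], float],
    retrieve_k: int = 100,
    final_k: int = 10,
) -> List[str]:
    """Retrieve with a bi-encoder, then rerank the candidates with a cross-encoder."""
    query_vec = encode(query)
    # Stage 1: cheap vector comparison against embeddings computed at index time.
    candidates = sorted(
        doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True
    )[:retrieve_k]
    # Stage 2: expensive pairwise scoring, but only over the short candidate list.
    reranked = sorted(
        candidates, key=lambda d: score_pair(query, docs[d]), reverse=True
    )
    return reranked[:final_k]


# --- Toy stand-ins (assumptions for illustration only) ---
VOCAB = ["cat", "dog", "car", "engine"]


def toy_encode(text: str) -> List[float]:
    """Stand-in for a bi-encoder: counts of a tiny fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def toy_score_pair(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: Jaccard overlap of the two token sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


DOCS = {
    "d1": "cat and dog",
    "d2": "car engine repair",
    "d3": "dog training",
}
# Index time: every document is encoded once, before any query arrives.
DOC_VECS = {doc_id: toy_encode(text) for doc_id, text in DOCS.items()}

if __name__ == "__main__":
    print(search("dog training", DOCS, DOC_VECS, toy_encode, toy_score_pair,
                 retrieve_k=2, final_k=2))
```

In a real system the two callables would wrap actual models, and the sorted scan in stage 1 would be replaced by an approximate nearest-neighbour index. The point of the structure is that the slow pairwise scorer only ever sees `retrieve_k` documents, regardless of corpus size.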
At Valletta Software, we focus on:
Bi-encoder retrieval: SBERT (sentence-transformers) for encoding - pre-encode all documents at index time
Hybrid search: combine BM25 keyword score and vector similarity score - neither alone is best
Reciprocal Rank Fusion: merge BM25 and vector retrieval result lists - simple, effective, no training needed
Cross-encoder reranking: Cohere Rerank or ms-marco-MiniLM - rerank top-100 to final top-10
Query expansion: LLM-generated query variants - retrieve with multiple queries, then combine the results
Sparse and dense: SPLADE for sparse neural retrieval - better than BM25 for some domains
Evaluation: NDCG, MRR, and Recall@K on a golden query set - measure before and after every change
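Reciprocal Rank Fusion from the list above is small enough to show in full. The sketch below assumes each retriever returns an ordered list of document IDs; the function name, the example IDs, and the tie-breaking rule are illustrative choices, and `k=60` is the conventional default constant rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only ranks are used, so BM25 scores and cosine
    similarities never need to be put on a common scale.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # hypothetical keyword results
    vector_hits = ["b", "d", "a"]  # hypothetical dense-vector results
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
    # ['b', 'a', 'd', 'c']
```

Documents that both retrievers rank highly ("b" and "a" here) rise to the top, while documents found by only one retriever are kept rather than discarded. The same function also covers query expansion: pass in one result list per query variant.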
The Indexing and Serving Architecture That Scales
Search must be fast. Slow search is abandoned search.
We give you more than just people. We give you top performers who drive results.
Build RAG pipelines, agents, and LLM integrations from day one
Ship AI features 3x faster with AI-native tooling and methodology
Deploy to production - not just Jupyter notebooks and prototypes
Evaluate output quality - hallucination detection, cost optimization, monitoring
How to Implement Semantic Search - With Engineers Who Build It in Production
Forget the hype. We make AI work in the real world.
Our engineers are trained in the latest AI tooling - Copilot, Claude Code, Cursor, LangChain, and vector databases - and use them daily to ship production AI features, not just prototypes.
Choose from a solo dev, mini team, or full squad. All powered by AI and ready to build from day one.
Let's keep it simple.
Our AI engineers build semantic search with Elasticsearch hybrid search (BM25 plus dense vector), bi-encoder retrieval, Cohere reranking, and NDCG evaluation on golden query sets.
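Evaluation on a golden query set can be sketched as follows. The assumption here is that the golden set maps each query to graded relevance gains per document ID; the function names and the data layout are illustrative, and `search_fn` stands for whatever search pipeline is being measured.

```python
import math
from typing import Callable, Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: List[str], gains: Dict[str, float], k: int) -> float:
    """Discounted cumulative gain of the top k, normalised by the ideal ordering."""
    dcg = sum(
        gains.get(doc_id, 0.0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(
    golden: Dict[str, Dict[str, float]],
    search_fn: Callable[[str], List[str]],
    k: int = 10,
) -> Dict[str, float]:
    """Average Recall@K, MRR, and NDCG@K over every query in the golden set."""
    totals = {"recall": 0.0, "mrr": 0.0, "ndcg": 0.0}
    for query, gains in golden.items():
        ranked = search_fn(query)
        relevant = {doc_id for doc_id, gain in gains.items() if gain > 0}
        totals["recall"] += recall_at_k(ranked, relevant, k)
        totals["mrr"] += reciprocal_rank(ranked, relevant)
        totals["ndcg"] += ndcg_at_k(ranked, gains, k)
    n = len(golden) or 1
    return {name: total / n for name, total in totals.items()}
```

Running `evaluate` against the same golden set before and after a change (a new embedding model, a different fusion constant, an added reranker) gives a like-for-like comparison instead of a judgment based on a handful of hand-picked queries.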
Ready to Ship AI into Production? Let's Build It.
Our AI engineers have done this before - RAG pipelines, LLM integrations, agents, MLOps. On real products, under real deadlines.
Rates from EUR 45/h • Free consultation • No commitment required • Response within 24 hours