PRODUCTION · LIVE AI RAG LLM

Procurement Code Search Bot

Telegram bot and REST API for searching government procurement classification codes. Built on vector search, a RAG architecture, and LLM reranking, it reaches 80% top-1 accuracy over a catalog of 20,000+ items.

20K+ codes indexed
80% top-1 accuracy
3–5s average response
Up to 100 queries per batch API request

Problem & Solution

❌ Problem
  • Procurement specialists manually browse a 20,000+ item classifier
  • Commercial product names don't match official classifier terminology
  • Errors lead to rejected bids and legal penalties
  • No existing tool combined classifier search with real procurement registry signals
✅ Solution
  • Natural language search: type a product name, get classification codes
  • Hybrid search: vector similarity + full-text + procurement signal via RRF
  • LLM reranking to surface the most relevant code first
  • REST API for direct integration with ERP systems

System Architecture

  • Telegram Bot (aiogram 3.x): user input
  • searcher.py (search pipeline): core logic
  • Supabase (PostgreSQL + pgvector): vector DB
  • OpenAI (GPT-4o-mini + Embeddings): LLM layer
  • FastAPI (REST API / ERP): ERP integration
1. Verdict Detection (SQL Tool)
Runs a registry lookup and a full-text match count in parallel, then routes the query to one of three paths: specific, ambiguous, or classifier_search.
2. HyDE Query Expansion (GPT-4o-mini)
Generates a hypothetical classifier-style description of the query to improve embedding quality. Temperature is set to 0 for consistent output.
3. Hybrid Search, RRF Fusion (SQL RPC)
Reciprocal Rank Fusion combines three ranked lists: vector search (×1.0), full-text search over the name and enriched fields (×2.0), and the registry signal (×1.0). Uses k=60 and keeps 40 candidates.
4. LLM Reranking (GPT-4o-mini)
GPT-4o-mini selects the top 10 of the 40 candidates, filters parent/child duplicates, and prioritizes leaf-level codes.
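
The fusion in step 3 can be illustrated with a short Python sketch. This is not the project's SQL RPC. It assumes each source returns a list of codes ordered best first, and the source names ("vector", "fulltext", "registry") and the function name rrf_fuse are hypothetical. The weights, k=60, and the 40-candidate cutoff are the values stated above.

```python
from collections import defaultdict

# Weights, k, and the candidate limit come from the pipeline description.
# The source names are illustrative labels, not the project's identifiers.
WEIGHTS = {"vector": 1.0, "fulltext": 2.0, "registry": 1.0}
K = 60
MAX_CANDIDATES = 40


def rrf_fuse(rankings, weights=WEIGHTS, k=K, limit=MAX_CANDIDATES):
    """Fuse ranked lists of codes with weighted Reciprocal Rank Fusion.

    rankings maps a source name to a list of codes, best match first.
    Each code scores sum(weight / (k + rank)) over the lists that contain
    it, with rank starting at 1.
    """
    scores = defaultdict(float)
    for source, codes in rankings.items():
        weight = weights.get(source, 1.0)
        for rank, code in enumerate(codes, start=1):
            scores[code] += weight / (k + rank)
    # Highest score first; ties are broken by code so the order is stable.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:limit]


example = {
    "vector": ["A", "B", "C"],
    "fulltext": ["B", "D"],
    "registry": ["C", "B"],
}
for code, score in rrf_fuse(example):
    print(code, round(score, 5))
```

In this toy input, code B appears in all three lists and ranks first in the double-weighted full-text list, so it comes out on top. A code found by only one source can still outrank others if that source is heavily weighted.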

Bot in Action

@procurement_finder_bot — Telegram

Key Features

🎯
Hybrid Vector Search
RRF fusion of vector similarity, full-text search, and procurement registry signal. Handles commercial names, abbreviations, and typos.
Smart Caching
MD5-based query cache in Supabase. Repeated queries return in about 0.5s instead of about 3s. Entries expire automatically after 30 days.
🔌
REST API + Batch
FastAPI endpoint with single and batch search (up to 100 queries). Swagger docs auto-generated. Direct ERP integration.
🧠
Registry Signal Boosting
Uses real procurement registry data as a ranking signal to prioritize codes that are actually used in contracts.
📊
Feedback Loop
Every user selection is saved to the database, which enables future fine-tuning and quality benchmarking against specialist choices.
🌐
Data Enrichment
Classifier codes enriched with GPT-generated synonyms. Registry data loaded from official XML source (55K+ records).
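
The Smart Caching feature above can be sketched as follows. This is a minimal illustration, not the bot's actual code. The normalization step (lowercasing and collapsing whitespace) and the function names are assumptions. Only the use of MD5 and the 30-day expiry come from the description.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Optional

CACHE_TTL = timedelta(days=30)  # expiry window stated in the feature description


def normalize_query(query: str) -> str:
    # Assumption: lowercase and collapse whitespace so trivial variants
    # of the same query map to the same cache entry.
    return " ".join(query.lower().split())


def cache_key(query: str) -> str:
    # MD5 is used here only as a compact lookup key, not for security.
    return hashlib.md5(normalize_query(query).encode("utf-8")).hexdigest()


def is_fresh(created_at: datetime, now: Optional[datetime] = None) -> bool:
    # An entry is usable while it is younger than the TTL.
    now = now or datetime.now(timezone.utc)
    return now - created_at < CACHE_TTL


print(cache_key("  Laptop   Dell ") == cache_key("laptop dell"))
```

A lookup would compute cache_key for the incoming query, fetch the stored row by that key, and fall through to the full pipeline when the row is missing or is_fresh returns False.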

Metrics & Results

80% top-1 accuracy
85% category accuracy
~3s average response time
20K+ codes indexed
0.5s cached response
Up to 100 queries per batch API request

Built With

Python 3.11 · FastAPI · OpenAI API · Supabase · pgvector · aiogram 3.x · Railway