Dossier Live RAG System
A production-ready, open-source Live RAG (Retrieval-Augmented Generation) system designed specifically for Frappe documents. Dossier provides real-time document ingestion, intelligent chunking, semantic search, and natural language Q&A capabilities through a modern chat interface.
Quick Start
Get Dossier running in minutes:
```bash
# Clone the repository
git clone https://github.com/your-org/dossier.git
cd dossier

# Copy and configure environment
cp .env.example .env
# Edit .env with your Frappe instance details

# Start the complete system
make quick-start

# Access the chat interface
open http://localhost:3000
```

Architecture Overview
Dossier is built on a microservices architecture with clear separation of concerns:
Core Services
- 🔗 Webhook Handler (Node.js) - Receives and validates Frappe webhooks
- 📄 Ingestion Service (Python) - Processes documents and manages workflows
- 🧠 Embedding Service (Python) - Generates vector embeddings using BGE-small (see the sketch after this list)
- 🔍 Query Service (Python) - Handles semantic search and retrieval
- 🤖 LLM Service (Python) - Generates natural language responses using Ollama
- 🌐 API Gateway (Python) - Authentication, rate limiting, and request routing
- ⚛️ Frontend (React) - Modern chat interface with real-time streaming
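As a concrete illustration of the embedding service, here is a minimal sketch assuming the sentence-transformers library and the BAAI/bge-small-en-v1.5 checkpoint (the exact BGE-small variant is an assumption, not confirmed by this page):

```python
# Minimal sketch of the embedding step, assuming sentence-transformers
# and the BAAI/bge-small-en-v1.5 checkpoint (variant is an assumption).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

def embed_texts(texts: list[str]) -> list[list[float]]:
    # Batch-encode for throughput; normalizing means cosine similarity
    # reduces to a dot product in the vector store.
    vectors = model.encode(texts, batch_size=32, normalize_embeddings=True)
    return vectors.tolist()
```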
Infrastructure Components
- 🐘 PostgreSQL - Configuration and metadata storage
- 🔴 Redis - Message queuing and caching
- 🎯 Qdrant - Vector database for semantic search
- 🦙 Ollama - Local LLM inference engine
Key Features
🚀 Live Document Synchronization
- Real-time webhook processing with HMAC signature validation
- Automatic document ingestion with exponential backoff retry
- Dead letter queue for failed processing and manual review
- Support for multiple Frappe doctypes with custom field mapping
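To illustrate the HMAC validation step, a minimal Python sketch (the actual handler is Node.js; the header name, secret source, and digest encoding are assumptions and must match what your Frappe instance sends):

```python
# Minimal sketch of HMAC signature validation for incoming webhooks.
# Header name, env var, and hex encoding are assumptions.
import hashlib
import hmac
import os

WEBHOOK_SECRET = os.environ["FRAPPE_WEBHOOK_SECRET"]  # assumed env var

def is_valid_signature(raw_body: bytes, signature_header: str) -> bool:
    # Recompute the digest over the raw request body and compare in
    # constant time to defeat timing attacks.
    expected = hmac.new(WEBHOOK_SECRET.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```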
🧩 Intelligent Text Processing
- Semantic chunking with configurable size and overlap
- Metadata preservation during document processing
- Batch processing for optimal performance
- Graceful handling of various document formats
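A minimal sketch of size-and-overlap chunking; the defaults shown are illustrative assumptions, and the production service layers semantic boundary detection on top:

```python
# Minimal sketch of fixed-size chunking with overlap. The defaults
# (512 chars, 64 overlap) are illustrative, not the service's config.
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    # Each chunk repeats the last `overlap` characters of its predecessor
    # so that sentences spanning a boundary remain retrievable.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```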
🔍 Advanced Search & Retrieval
- Vector similarity search with sub-2-second response times
- Metadata filtering and contextual relevance scoring
- Top-k retrieval with configurable parameters
- Source highlighting and citation tracking
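A minimal sketch of the top-k retrieval call against Qdrant, assuming the qdrant-client library; the collection name and payload field are hypothetical:

```python
# Minimal sketch of top-k vector retrieval with a metadata filter,
# using qdrant-client. Collection and payload field names are assumptions.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

def retrieve(query_vector: list[float], doctype: str, k: int = 5):
    # Restrict the similarity search to chunks from one Frappe doctype,
    # then return the k nearest neighbours with their payloads.
    return client.search(
        collection_name="dossier_chunks",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="doctype", match=MatchValue(value=doctype))]
        ),
        limit=k,
    )
```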
💬 Natural Language Interface
- Streaming responses with real-time user feedback
- Context injection from retrieved document chunks
- Conversation memory and follow-up question handling
- Fallback responses for edge cases and errors
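A minimal sketch of streaming tokens from Ollama's /api/generate endpoint with retrieved chunks injected as context; the model name and prompt template are assumptions:

```python
# Minimal sketch of streaming generation against Ollama's HTTP API.
# The model name and prompt template are assumptions.
import json
import requests

def stream_answer(question: str, chunks: list[str]):
    prompt = "Context:\n" + "\n---\n".join(chunks) + f"\n\nQuestion: {question}\nAnswer:"
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": True},
        stream=True,
    ) as resp:
        # Ollama streams newline-delimited JSON objects, each carrying
        # a "response" fragment until "done" is true.
        for line in resp.iter_lines():
            if line:
                yield json.loads(line).get("response", "")
```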
🛡️ Production-Grade Security
- JWT authentication with configurable token expiration
- Rate limiting to prevent API abuse (100 req/min default)
- CORS configuration for secure frontend integration
- Input validation and sanitization across all endpoints
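A minimal sketch of the token issue/verify flow, assuming PyJWT; the secret source, signing algorithm, and expiry default are assumptions rather than the gateway's actual configuration:

```python
# Minimal sketch of JWT issuance and verification using PyJWT.
# Secret source, algorithm, and TTL default are assumptions.
import datetime
import os

import jwt  # PyJWT

SECRET = os.environ["JWT_SECRET"]  # assumed env var

def issue_token(user_id: str, ttl_minutes: int = 60) -> str:
    now = datetime.datetime.now(datetime.timezone.utc)
    payload = {"sub": user_id, "iat": now,
               "exp": now + datetime.timedelta(minutes=ttl_minutes)}
    return jwt.encode(payload, SECRET, algorithm="HS256")

def verify_token(token: str) -> dict:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on failure.
    return jwt.decode(token, SECRET, algorithms=["HS256"])
```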
📊 Monitoring & Observability
- Health checks on all service endpoints (/health)
- Prometheus metrics collection (/metrics)
- Structured JSON logging with correlation IDs
- Distributed tracing for request flow tracking
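A minimal sketch of the /health and /metrics endpoints, assuming FastAPI and prometheus-client (the services' actual web framework is not specified here):

```python
# Minimal sketch of health and metrics endpoints. FastAPI and
# prometheus-client are assumptions about the stack.
from fastapi import FastAPI, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

app = FastAPI()
REQUESTS = Counter("dossier_requests_total", "Total requests handled")

@app.get("/health")
def health() -> dict:
    # Liveness probe: cheap, dependency-free, always answers quickly.
    REQUESTS.inc()
    return {"status": "ok"}

@app.get("/metrics")
def metrics() -> Response:
    # Expose all registered metrics in Prometheus text format.
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)
```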
Performance Characteristics
| Metric | Performance |
|---|---|
| Query Response Time | < 2 seconds |
| LLM Response Time | < 30 seconds |
| Embedding Generation | 20+ texts/second |
| Concurrent Users | 50+ supported |
| Memory Usage | < 16GB total system |
| Storage Efficiency | ~1GB per 10K documents |
System Requirements
Minimum Requirements
- CPU: 4 cores
- RAM: 8GB
- Storage: 50GB free space
- Network: Stable internet connection
Recommended for Production
- CPU: 8+ cores
- RAM: 16GB+
- Storage: 100GB+ SSD
- Network: High-speed connection
Community & Support
- 📖 Documentation: Complete guides and API references
- 🐛 Issues: GitHub Issues for bug reports
- 💬 Discussions: GitHub Discussions for questions
- 🤝 Contributing: Contribution guidelines for developers