Overview
System Architecture
DataVerse ChatBot uses a modular architecture designed for extensibility and maintainability:
┌─────────────────┐       ┌───────────────────┐       ┌─────────────────┐
│ Data Ingestion  │       │ Vector Processing │       │   LLM Systems   │
│                 │       │                   │       │                 │
│ - Web Crawling  │──────▶│ - Embeddings      │──────▶│ - RAG Interface │
│ - File Loading  │       │ - Vectorization   │       │ - LLM Providers │
│                 │       │ - FAISS Indexing  │       │                 │
└─────────────────┘       └───────────────────┘       └─────────────────┘
        ▲                                                      │
        │                                                      ▼
        │                 ┌───────────────────┐       ┌─────────────────┐
        │                 │  Chat Interfaces  │       │   Monitoring    │
        └─────────────────│                   │◀──────│                 │
                          │ - Telegram Bot    │       │ - Uncertainty   │
                          │ - WhatsApp Bot    │       │ - Email Alerts  │
                          │ - Web Interface   │       │ - Chat History  │
                          └───────────────────┘       └─────────────────┘
Key Components
Data Ingestion
Crawler: Extracts content from websites; the crawling backend is configurable
FileLoader: Processes various file formats into uniform text representations
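The idea of turning several file formats into one uniform text representation can be sketched as a dispatch table keyed by file extension. This is a minimal illustration using only the standard library, not the project's actual FileLoader: the `LOADERS` registry, the `load_as_text` function, and the choice of formats are all assumptions made for the example.

```python
import csv
import json
from pathlib import Path


def _load_txt(path: Path) -> str:
    return path.read_text(encoding="utf-8")


def _load_json(path: Path) -> str:
    # Re-serialize so every JSON file ends up in one consistent text layout.
    data = json.loads(path.read_text(encoding="utf-8"))
    return json.dumps(data, indent=2, ensure_ascii=False)


def _load_csv(path: Path) -> str:
    # Flatten each row into one comma-separated line of text.
    with path.open(newline="", encoding="utf-8") as handle:
        return "\n".join(", ".join(row) for row in csv.reader(handle))


# Hypothetical registry mapping file extensions to loader functions.
LOADERS = {
    ".txt": _load_txt,
    ".md": _load_txt,
    ".json": _load_json,
    ".csv": _load_csv,
}


def load_as_text(path: str) -> str:
    """Return the file's content as plain text, whatever its format."""
    file_path = Path(path)
    loader = LOADERS.get(file_path.suffix.lower())
    if loader is None:
        raise ValueError(f"Unsupported file type: {file_path.suffix!r}")
    return loader(file_path)
```

Supporting another format in this sketch means registering one more function in `LOADERS`; callers keep using `load_as_text` unchanged.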
Vector Processing
BaseEmbedding: Creates vector embeddings from text
CohereEmbedding, OpenAIEmbedding, etc.: Provider-specific embedding implementations
FAISS integration: Efficient similarity search for document retrieval
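To illustrate how a shared embedding interface and similarity search fit together, here is a small self-contained sketch. It makes several assumptions: the `embed` method signature is hypothetical, `BagOfWordsEmbedding` is a toy stand-in for a provider-backed class such as CohereEmbedding or OpenAIEmbedding, and the brute-force `search` function replaces FAISS with a plain Python loop. On L2-normalized vectors, the inner product it computes equals cosine similarity.

```python
import math
from abc import ABC, abstractmethod


class BaseEmbedding(ABC):
    """Hypothetical interface: turn a list of texts into vectors."""

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]:
        ...


class BagOfWordsEmbedding(BaseEmbedding):
    """Toy embedding over a fixed vocabulary (stand-in for a provider API)."""

    def __init__(self, vocabulary: list[str]) -> None:
        # One vector dimension per distinct vocabulary word.
        self.index = {tok: i for i, tok in enumerate(sorted(set(vocabulary)))}

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * len(self.index)
            for token in text.lower().split():
                if token in self.index:
                    vec[self.index[token]] += 1.0
            # L2-normalize so that inner product equals cosine similarity.
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors


def search(query_vec, doc_vecs, k=2):
    """Brute-force inner-product search; returns (index, score), best first."""
    scores = [
        (i, sum(q * d for q, d in zip(query_vec, vec)))
        for i, vec in enumerate(doc_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


docs = [
    "opening hours and location",
    "tuition fees and payment",
    "library opening hours",
]
embedder = BagOfWordsEmbedding(" ".join(docs).split())
doc_vecs = embedder.embed(docs)
query_vec = embedder.embed(["tuition fees"])[0]
best_index, best_score = search(query_vec, doc_vecs, k=1)[0]
print(docs[best_index], round(best_score, 3))
```

Because retrieval only depends on `embed`, swapping the toy class for a provider-specific implementation would leave the search code untouched, which is the point of having a common base class.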
RAG Systems
BaseRAG: Core Retrieval-Augmented Generation functionality
ClaudeRAG, OpenAIRAG, etc.: LLM-specific implementations
Context management and reranking
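The split between a shared base class and LLM-specific subclasses could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's BaseRAG: the method names (`retrieve`, `build_prompt`, `generate`, `answer`) are hypothetical, retrieval is simplified to word overlap instead of vector search with reranking, and `EchoRAG` is a stand-in subclass that returns the prompt rather than calling a provider API.

```python
from abc import ABC, abstractmethod


class BaseRAG(ABC):
    """Hypothetical base class holding shared retrieval and prompt assembly."""

    def __init__(self, documents: list[str], top_k: int = 2) -> None:
        self.documents = documents
        self.top_k = top_k

    def retrieve(self, question: str) -> list[str]:
        # Placeholder scoring by shared words; a real system would query
        # a vector index and rerank the hits.
        q_words = set(question.lower().split())
        scored = [
            (len(q_words & set(doc.lower().split())), doc)
            for doc in self.documents
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in ranked[: self.top_k] if score > 0]

    def build_prompt(self, question: str, context: list[str]) -> str:
        joined = "\n".join(f"- {chunk}" for chunk in context)
        return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Provider-specific step; each subclass calls its own LLM API."""

    def answer(self, question: str) -> str:
        context = self.retrieve(question)
        return self.generate(self.build_prompt(question, context))


class EchoRAG(BaseRAG):
    """Stand-in subclass: returns the prompt instead of calling an LLM."""

    def generate(self, prompt: str) -> str:
        return prompt


rag = EchoRAG(["The library opens at 9 am", "Tuition is due in May"], top_k=1)
print(rag.answer("when is tuition due"))
```

In this layout only `generate` differs per provider, so adding a new LLM means writing one small subclass while retrieval and prompt construction stay shared.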
Chat Interfaces
Telegram bot for messenger integration
WhatsApp bot via Twilio integration
Web-based chat interface with iframe embedding support
Monitoring & Utilities
Uncertainty detection via a trained classifier
Response monitoring and email alerting
Database operations for chat history and usage tracking
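As a rough sketch of how chat history storage and uncertainty flagging might connect, the example below logs each exchange to SQLite and marks it when an uncertainty score crosses a threshold. Everything specific here is an assumption: the `chat_history` schema, the function names, and the 0.5 threshold are invented for illustration, the score is passed in as a plain number where a trained classifier would normally supply it, and the email alert itself is left out.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema for storing chat exchanges.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chat_history (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id TEXT NOT NULL,
               question TEXT NOT NULL,
               answer TEXT NOT NULL,
               uncertain INTEGER NOT NULL,
               created_at TEXT NOT NULL
           )"""
    )


def log_exchange(conn, user_id, question, answer, uncertainty_score, threshold=0.5):
    """Store one exchange; return True when it should trigger an alert."""
    uncertain = uncertainty_score >= threshold
    conn.execute(
        "INSERT INTO chat_history "
        "(user_id, question, answer, uncertain, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            user_id,
            question,
            answer,
            int(uncertain),
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return uncertain


def uncertain_exchanges(conn):
    """Return (question, answer) pairs that were flagged as uncertain."""
    rows = conn.execute(
        "SELECT question, answer FROM chat_history WHERE uncertain = 1 ORDER BY id"
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
init_db(conn)
first_flag = log_exchange(conn, "u1", "Where is the library?", "Building B.", 0.1)
second_flag = log_exchange(conn, "u1", "Is parking free?", "I am not sure.", 0.9)
print(uncertain_exchanges(conn))
```

The boolean returned by `log_exchange` is the natural hook for an alert: a caller could send an email whenever it is `True`, and the flagged rows remain queryable later for review.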
Technology Stack
Programming Language: Python 3.11+
Vector Database: FAISS (Facebook AI Similarity Search)
LLM Providers: OpenAI, Anthropic, Google, Mistral, Cohere, DeepSeek, xAI (Grok)
Web Crawling: Crawl4AI, ScrapeGraphAI
Data Processing: LangChain, Docling, unstructured
Classification: scikit-learn, XGBoost
Embedding Models: sentence-transformers, provider-specific embedding APIs
Web Framework: FastAPI, Flask
Messenger Integrations: python-telegram-bot, Twilio
Database: SQLite
Voice Support: OpenAI Whisper, TTS libraries