Usage Guide
Getting Started
After installation, you can use DataVerse ChatBot in several different ways:
CLI Chat Interface: Interact directly via the command line
Web Interface: Embed or deploy as a web application
Telegram Bot: Deploy as a Telegram bot
WhatsApp Bot: Deploy as a WhatsApp bot via Twilio
Admin Dashboard: Manage the system through a web-based admin panel
Running the CLI Interface
The simplest way to start is with the command-line interface:
python src/main.py
This will:
Load the configured content
Create or load the vector index
Start an interactive chat session
Data Ingestion Methods
Web Crawling
To extract content from a website:
from chatbot.crawler import Crawler
from chatbot.utils.paths import WEB_CONTENT_DIR
import asyncio
import tldextract
url = "https://example.com"
domain_name = tldextract.extract(url).domain
# Create crawler instance (options: "crawl4ai" or "scrapegraph")
crawler = Crawler(url, domain_name, client="crawl4ai")
# Extract content (webpage_only=False to crawl linked pages)
content_path = asyncio.run(crawler.extract_content(
url, webpage_only=False, max_depth=2
print(f"Content saved to: {content_path}")
File Loading
To extract content from files:
from chatbot.utils.file_loader import FileLoader
from chatbot.utils.paths import WEB_CONTENT_DIR
# Path to your file (supports PDF, DOCX, CSV, XLSX, TXT, PPT, etc.)
file_path = "data/training_files/document.pdf"
# Output path for the content
output_path = WEB_CONTENT_DIR / "extracted_content.txt"
# Create loader (options: "langchain" or "docling")
loader = FileLoader(file_path, output_path, client="docling")
# Extract content
documents = loader.extract_from_file()
if documents:
print(f"Successfully extracted {len(documents)} documents")
Creating a RAG System
Choose your preferred LLM to create a RAG system:
from chatbot.rag.openai_rag import OpenAIRAG
from chatbot.rag.claude_rag import ClaudeRAG
from chatbot.rag.cohere_rag import CohereRAG
from chatbot.utils.paths import WEB_CONTENT_DIR, INDEXES_DIR
# Path to your content
content_path = WEB_CONTENT_DIR / "example.txt"
# Create RAG instance (example with Claude)
rag = ClaudeRAG(
content_path, # Content to use for RAG
INDEXES_DIR, # Where to store vector indexes
model_name="claude-3-5-sonnet-20241022", # Specific model to use
chunking_type="recursive", # Chunking strategy (options: "recursive", "semantic", "basic")
rerank=True # Whether to use Cohere's reranking model
# Get a response (asynchronous)
import asyncio
user_id = "user123" # Used for chat history
query = "Tell me about DataVerse."
response = asyncio.run(rag.get_response(query, user_id))
print(response)
Deploying as a Web Interface
To run the web interface:
python src/web/chat_web_app.py
This starts a FastAPI server on port 5001. You can access the chat interface at: http://localhost:5001/
Deploying as a Telegram Bot
To start the Telegram bot:
python src/tg_bot.py
Ensure you’ve set up the Telegram Bot API token in your .env file:
# Telegram Bot
TELEGRAM_BOT_TOKEN=your_telegram_bot_token
Deploying as a WhatsApp Bot
To deploy as a WhatsApp bot via Twilio:
python src/whatsapp_bot.py
Configure your Twilio credentials in the .env file:
# Twilio for WhatsApp
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_PHONE_NUMBER=your_twilio_phone_number
Running the Admin Dashboard
To launch the admin dashboard:
python src/admin_dashboard_launcher.py
Navigate to: http://localhost:8050/ to access the dashboard.
Default credentials are: - Username: admin - Password: password
(Change these in production!)
Advanced Features
Voice Mode
DataVerse ChatBot supports voice interaction:
from chatbot.voice_mode import VoiceMode
voice_mode = VoiceMode()
# Record audio and transcribe
wav_path = voice_mode.start_recording()
transcribed_text = voice_mode.transcribe(wav_path)
# Convert text response to speech
voice_mode.text_to_speech("This is a spoken response.")
Creating a Custom Dataset
To create a dataset for uncertainty classification:
python src/chatbot/utils/make_dataset.py
Training a Classifier
To train a classifier for uncertainty detection:
python src/chatbot/utils/train_clf.py