Usage Guide

Getting Started

After installation, you can use DataVerse ChatBot in any of the following ways:

  1. CLI Chat Interface: Interact directly via the command line

  2. Web Interface: Embed or deploy as a web application

  3. Telegram Bot: Deploy as a Telegram bot

  4. WhatsApp Bot: Deploy as a WhatsApp bot via Twilio

  5. Admin Dashboard: Manage the system through a web-based admin panel

Running the CLI Interface

The simplest way to start is with the command-line interface:

python src/main.py

This will:

  1. Load the configured content

  2. Create or load the vector index

  3. Start an interactive chat session
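
The "create or load" step can be pictured as a simple cache check. The sketch below is only an illustration of that pattern, not the project's actual code: the function name `load_or_build_index`, the JSON file format, and the paragraph-based "index" are all assumptions made for the example.

```python
import json
from pathlib import Path


def load_or_build_index(content_path: Path, index_path: Path) -> list[str]:
    """Return cached chunks if the index file exists, otherwise build and save it."""
    if index_path.exists():
        # Reuse the index from a previous run instead of rebuilding it.
        return json.loads(index_path.read_text(encoding="utf-8"))

    # Toy "index" (assumption): the non-empty paragraphs of the content file.
    text = content_path.read_text(encoding="utf-8")
    chunks = [p.strip() for p in text.split("\n\n") if p.strip()]
    index_path.write_text(json.dumps(chunks), encoding="utf-8")
    return chunks
```

With this pattern, the first run pays the cost of building the index and later runs start faster, but a stale index file has to be deleted by hand when the content changes.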

Data Ingestion Methods

Web Crawling

To extract content from a website:

from chatbot.crawler import Crawler
from chatbot.utils.paths import WEB_CONTENT_DIR
import asyncio
import tldextract

url = "https://example.com"
domain_name = tldextract.extract(url).domain

# Create crawler instance (options: "crawl4ai" or "scrapegraph")
crawler = Crawler(url, domain_name, client="crawl4ai")

# Extract content (webpage_only=False to crawl linked pages)
content_path = asyncio.run(crawler.extract_content(
    url, webpage_only=False, max_depth=2
))

print(f"Content saved to: {content_path}")
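
To build intuition for what a depth limit such as max_depth=2 means, here is a minimal sketch of a depth-limited, breadth-first walk over a toy link graph. It is not the crawler's implementation: the function `pages_within_depth`, the in-memory `site` dictionary, and the convention that the start page sits at depth 0 are assumptions chosen for illustration.

```python
from collections import deque


def pages_within_depth(links: dict[str, list[str]], start: str, max_depth: int) -> list[str]:
    """Breadth-first walk over `links`, visiting pages at most `max_depth` hops from `start`."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # Do not follow links out of pages at the depth limit.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical link graph: each page maps to the pages it links to.
site = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/install"],
    "/docs/install": ["/docs/install/linux"],
}
print(pages_within_depth(site, "/", max_depth=2))
```

Under these assumptions, the page three hops away ("/docs/install/linux") is never visited, which is why a larger depth can sharply increase crawl time on heavily linked sites.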

File Loading

To extract content from files:

from chatbot.utils.file_loader import FileLoader
from chatbot.utils.paths import WEB_CONTENT_DIR

# Path to your file (supports PDF, DOCX, CSV, XLSX, TXT, PPT, etc.)
file_path = "data/training_files/document.pdf"

# Output path for the content
output_path = WEB_CONTENT_DIR / "extracted_content.txt"

# Create loader (options: "langchain" or "docling")
loader = FileLoader(file_path, output_path, client="docling")

# Extract content
documents = loader.extract_from_file()

if documents:
    print(f"Successfully extracted {len(documents)} documents")
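
When a whole folder needs to be ingested, it can help to collect the candidate files first. The helper below is a sketch using only the standard library. The name `find_loadable_files` and the `SUPPORTED_SUFFIXES` set are assumptions derived from the formats listed in the comment above, not part of the FileLoader API.

```python
from pathlib import Path

# Assumed extension set, based on the formats named above; adjust to match your loader.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".csv", ".xlsx", ".txt", ".ppt"}


def find_loadable_files(directory: Path) -> list[Path]:
    """Return files under `directory` whose suffix is in SUPPORTED_SUFFIXES, sorted by path."""
    return sorted(
        path
        for path in directory.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Each returned path could then be passed as file_path to a FileLoader created as shown above, with a separate output path per file.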

Creating a RAG System

Choose your preferred LLM to create a RAG system:

from chatbot.rag.openai_rag import OpenAIRAG
from chatbot.rag.claude_rag import ClaudeRAG
from chatbot.rag.cohere_rag import CohereRAG
from chatbot.utils.paths import WEB_CONTENT_DIR, INDEXES_DIR

# Path to your content
content_path = WEB_CONTENT_DIR / "example.txt"

# Create RAG instance (example with Claude)
rag = ClaudeRAG(
    content_path,          # Content to use for RAG
    INDEXES_DIR,           # Where to store vector indexes
    model_name="claude-3-5-sonnet-20241022",  # Specific model to use
    chunking_type="recursive",  # Chunking strategy (options: "recursive", "semantic", "basic")
    rerank=True            # Whether to use Cohere's reranking model
)

# Get a response (asynchronous)
import asyncio
user_id = "user123"        # Used for chat history
query = "Tell me about DataVerse."

response = asyncio.run(rag.get_response(query, user_id))
print(response)
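
For more than one question, it is usually cleaner to start a single event loop and await each call inside it rather than calling asyncio.run once per query. The sketch below assumes only that get_response is a coroutine taking (query, user_id), as in the call above. `ask_all` and the stand-in class `EchoRAG` are hypothetical names used so the example runs without an API key.

```python
import asyncio


async def ask_all(rag, queries: list[str], user_id: str) -> list:
    """Send queries one after another so the chat history stays in order."""
    answers = []
    for query in queries:
        answers.append(await rag.get_response(query, user_id))
    return answers


class EchoRAG:
    """Hypothetical stand-in with the same get_response(query, user_id) shape."""

    async def get_response(self, query: str, user_id: str) -> str:
        await asyncio.sleep(0)  # Yield control, as a real network call would.
        return f"[{user_id}] {query}"


answers = asyncio.run(ask_all(EchoRAG(), ["What is DataVerse?", "Who maintains it?"], "user123"))
print(answers)
```

The queries are awaited sequentially on purpose: they share one user_id, so running them concurrently could interleave the chat history. For independent users, asyncio.gather would be a reasonable alternative.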

Deploying as a Web Interface

To run the web interface:

python src/web/chat_web_app.py

This starts a FastAPI server on port 5001. You can access the chat interface at: http://localhost:5001/

Deploying as a Telegram Bot

To start the Telegram bot:

python src/tg_bot.py

Ensure you’ve set up the Telegram Bot API token in your .env file:

# Telegram Bot
TELEGRAM_BOT_TOKEN=your_telegram_bot_token

Deploying as a WhatsApp Bot

To deploy as a WhatsApp bot via Twilio:

python src/whatsapp_bot.py

Configure your Twilio credentials in the .env file:

# Twilio for WhatsApp
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_PHONE_NUMBER=your_twilio_phone_number
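
A missing credential is easier to diagnose before a bot starts than after. The following sketch checks the variable names from the .env snippets above. It assumes the values have already been loaded into the process environment (for example by python-dotenv's load_dotenv, if the project uses it), and the `REQUIRED_VARS` mapping and `missing_vars` helper are hypothetical names, not part of the project.

```python
import os

# Variable names taken from the .env snippets above.
REQUIRED_VARS = {
    "telegram": ["TELEGRAM_BOT_TOKEN"],
    "whatsapp": ["TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "TWILIO_PHONE_NUMBER"],
}


def missing_vars(channel: str, env=os.environ) -> list[str]:
    """Return the required variables for `channel` that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS[channel] if not env.get(name)]


if __name__ == "__main__":
    for channel in REQUIRED_VARS:
        missing = missing_vars(channel)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{channel}: {status}")
```

Empty values are reported as missing, since an empty token in .env is a common copy-and-paste mistake.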

Running the Admin Dashboard

To launch the admin dashboard:

python src/admin_dashboard_launcher.py

Navigate to http://localhost:8050/ to access the dashboard.

Default credentials:

  - Username: admin

  - Password: password

(Change these in production!)

Advanced Features

Voice Mode

DataVerse ChatBot supports voice interaction:

from chatbot.voice_mode import VoiceMode

voice_mode = VoiceMode()

# Record audio and transcribe
wav_path = voice_mode.start_recording()
transcribed_text = voice_mode.transcribe(wav_path)

# Convert text response to speech
voice_mode.text_to_speech("This is a spoken response.")
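
The voice calls above can be combined with a RAG instance into a single spoken question-and-answer turn. This is a sketch under assumptions: `voice_turn` is a hypothetical helper, transcribe is assumed to return the recognized text, and the RAG object is assumed to expose the get_response(query, user_id) coroutine shown earlier in this guide.

```python
async def voice_turn(voice_mode, rag, user_id: str) -> tuple:
    """Record one spoken question, answer it through the RAG system, and speak the reply."""
    wav_path = voice_mode.start_recording()      # Blocks until recording finishes (assumption).
    question = voice_mode.transcribe(wav_path)   # Speech to text.
    answer = await rag.get_response(question, user_id)
    voice_mode.text_to_speech(str(answer))       # Text to speech.
    return question, answer
```

With `voice_mode` and `rag` created as shown in this guide, one turn would be run with asyncio.run(voice_turn(voice_mode, rag, "user123")).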

Creating a Custom Dataset

To create a dataset for uncertainty classification:

python src/chatbot/utils/make_dataset.py

Training a Classifier

To train a classifier for uncertainty detection:

python src/chatbot/utils/train_clf.py