# Scalable RAG Architecture (Backend)

This backend now includes a multi-tenant RAG stack with memory and profile learning.

## What was added

- `rag_handler.py`
  - Creates and manages RAG tables automatically.
  - Ingests pre-chunked, pre-embedded documents.
  - Upserts vectors to Qdrant (Qdrant-first), with MySQL vector fallback.
  - Retrieves top-k chunks from Qdrant when available.
  - Uses Redis embedding/retrieval cache when available.
  - Stores conversation history per `bid + conversation_id + user_id`.
  - Maintains a lightweight user profile that updates from every query.
  - Generates responses through Bedrock Nova when configured.

- `app.py` endpoints
  - `POST /rag/<bid>/documents`
  - `POST /rag/<bid>/ingest-transcripts`
  - `POST /rag/<bid>/query`
  - `GET /rag/<bid>/conversations/<conversation_id>`
  - `GET /rag/<bid>/profiles/<user_id>`

- `config.py` RAG config
  - `RAG_CHAT_MODEL`, `RAG_EMBEDDING_MODEL`
  - `RAG_TOP_K`, `RAG_SIMILARITY_THRESHOLD`, `RAG_MEMORY_MESSAGES`
  - `RAG_MAX_TOKENS`, `RAG_TEMPERATURE`, `RAG_TOP_P`

## Data model

- `rag_documents`: source-level metadata
- `rag_chunks`: text chunks + embedding vectors + norm
- `rag_conversations`: conversation session metadata
- `rag_messages`: user/assistant/system messages and retrieval trace
- `rag_user_profiles`: learned user features (interests, traits, recent topics)

## Ingestion payload example

```json
{
  "documents": [
    {
      "source_id": "kb-pricing-2026",
      "title": "Pricing FAQ",
      "source_type": "faq",
      "source_uri": "https://example.com/faq/pricing",
      "metadata": {"lang": "en"},
      "chunks": [
        {
          "chunk_id": "kb-pricing-2026-1",
          "text": "Enterprise plan includes...",
          "embedding": [0.011, -0.044, 0.203],
          "metadata": {"section": "enterprise"}
        }
      ]
    }
  ]
}
```

## Query payload example

```json
{
  "user_id": "cust-9087",
  "conversation_id": "optional-thread-id",
  "message": "Which plan is best for 200 agents?",
  "query_embedding": [0.024, -0.031, 0.188],
  "top_k": 8,
  "min_similarity": 0.2,
  "profile_updates": {
    "traits": {
      "company_size": "200-500",
      "industry": "retail"
    }
  }
}
```

## Scale notes

- Current implementation is Qdrant-first with MySQL fallback.
- If Qdrant is unavailable, retrieval falls back to MySQL cosine scoring.
- Keep conversation and profile tables in MySQL for transactional consistency.
- For high throughput ingestion, use batch workers and queue ingestion jobs.