Documentation

RAG (Retrieval-Augmented Generation)

How Fabric AI uses your documents to provide contextual, accurate AI responses.

RAG is one of Fabric AI's most powerful features. It allows AI agents to access and use information from your documents, code repositories, and knowledge bases—making responses more accurate, relevant, and grounded in your actual data.

What is RAG?

Retrieval-Augmented Generation is a technique that enhances AI responses by:

  1. Retrieving relevant information from your documents
  2. Augmenting the AI's context with that information
  3. Generating responses that reference your specific data

Instead of relying solely on the AI model's training data, RAG lets the model access your proprietary information in real time.
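The retrieve–augment–generate loop can be sketched in a few lines. This is an illustrative toy, not Fabric's implementation: it uses naive keyword overlap in place of vector search, and the `retrieve` and `build_prompt` helpers are hypothetical names.

```python
# Minimal RAG loop sketch (illustrative only; Fabric uses vector search).

def retrieve(question: str, chunks: list[str]) -> list[str]:
    """Naive keyword-overlap retrieval as a stand-in for vector search."""
    words = set(question.lower().split())
    scored = [(len(words & set(c.lower().split())), c) for c in chunks]
    return [c for score, c in sorted(scored, reverse=True) if score > 0][:3]

def build_prompt(question: str, context: list[str]) -> str:
    """Augment: prepend the retrieved chunks to the question."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{ctx}\n\nQuestion: {question}"

# Generation would then call an LLM with the augmented prompt.
chunks = ["Our system uses JWT tokens.", "Deploys run on Fridays."]
question = "How do JWT tokens work here?"
prompt = build_prompt(question, retrieve(question, chunks))
```

Only chunks related to the question make it into the prompt; unrelated material (the deploy note above) is filtered out before generation.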

┌─────────────────────────────────────────────────────────────────┐
│                        User Question                             │
│            "How does our authentication system work?"            │
└───────────────────────────────┬─────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Query Processing                            │
│  1. Convert question to vector embedding                         │
│  2. Search for semantically similar document chunks              │
└───────────────────────────────┬─────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Qdrant Vector Search                         │
│  Found chunks from:                                              │
│  • auth-architecture.md (0.92 similarity)                        │
│  • security-guidelines.pdf (0.87 similarity)                     │
│  • api-docs/auth.md (0.84 similarity)                           │
└───────────────────────────────┬─────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Context Assembly                              │
│  Combine retrieved chunks into a coherent context window         │
└───────────────────────────────┬─────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      AI Generation                               │
│  Generate response using both the question AND retrieved context │
└───────────────────────────────┬─────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Response                                    │
│  "Based on your auth-architecture.md document, your system      │
│   uses JWT tokens with OAuth 2.0 for authentication..."         │
└─────────────────────────────────────────────────────────────────┘

Why Use RAG?

Without RAG

  • AI responds based on general knowledge
  • May hallucinate or make up details
  • Can't reference your specific systems
  • Answers may be outdated

With RAG

  • AI references your actual documentation
  • Responses grounded in verified information
  • Understands your specific terminology
  • Always uses latest uploaded documents

How RAG Works in Fabric

1. Document Ingestion

When you upload documents to a workspace, Fabric processes them through a pipeline:

Upload

Drag and drop files into a workspace. Supported formats:

  • PDF — Including scanned documents (with OCR)
  • Word — .docx files
  • Markdown — .md files
  • Text — .txt files
  • Code — Various programming languages

Extraction

Text is extracted using the appropriate extractor:

  • Local extractors — Fast, free processing for standard formats
  • Unstructured.io — Advanced OCR for scanned documents and images
  • LlamaParse — Optimized for code documentation

Chunking

Documents are split into semantic chunks:

  • Chunk size — ~512 tokens per chunk (configurable)
  • Overlap — 50 tokens overlap between chunks
  • Semantic boundaries — Respects paragraphs and sections
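The size-and-overlap scheme above can be sketched as a sliding window. This approximates tokens with whitespace-separated words for simplicity; Fabric's chunker additionally respects paragraph and section boundaries.

```python
# Fixed-size chunking with overlap (tokens approximated by words).

def chunk(text: str, size: int = 512, overlap: int = 50) -> list[list[str]]:
    words = text.split()
    step = size - overlap  # advance by size minus overlap each time
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

parts = chunk("word " * 1000, size=512, overlap=50)
# Each chunk repeats the last 50 words of the previous chunk, so
# sentences that straddle a boundary appear intact in at least one chunk.
```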

Embedding

Each chunk is converted to a vector embedding:

  • Model — OpenAI text-embedding-3-small (1536 dimensions)
  • Consistent encoding — Same model for queries and documents

Storage

Embeddings are stored in Qdrant with metadata:

  • Vector — The 1536-dimensional embedding
  • Metadata — Filename, chunk position, tenant IDs
  • Multi-tenancy — Isolated by userId and organizationId
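Tenant isolation works by filtering on stored metadata at query time. In Qdrant this is a payload filter on the search request; the sketch below uses a plain list as the "collection" and mirrors the field names above, but it is illustrative, not Fabric's API.

```python
# Tenant-filtered lookup sketch: only points whose payload matches the
# requesting tenant are ever considered as search candidates.

points = [
    {"vector": [0.1, 0.9], "payload": {"filename": "auth.md", "organizationId": "org-a"}},
    {"vector": [0.2, 0.8], "payload": {"filename": "notes.md", "organizationId": "org-b"}},
]

def search_candidates(org_id: str) -> list[dict]:
    """Apply the tenant filter before any similarity scoring happens."""
    return [p for p in points if p["payload"]["organizationId"] == org_id]

visible = search_candidates("org-a")  # org-b's documents are never returned
```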

2. Retrieval

When you ask a question or start generating a document:

Query Embedding

Your question is converted to a vector using the same embedding model.

Vector Search

Qdrant finds the most similar document chunks:

  • Cosine similarity — Measures semantic closeness
  • Top-K retrieval — Returns the top 5-10 most relevant chunks
  • Tenant filtering — Only searches your documents
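Cosine similarity and top-K selection look like this conceptually (Qdrant computes this natively and at scale; this is only a sketch of the math):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: list[float], chunks: list[tuple[list[float], str]], k: int = 5):
    """chunks: (embedding, text) pairs; return the k most similar."""
    return sorted(chunks, key=lambda c: cosine(query, c[0]), reverse=True)[:k]

docs = [([1.0, 0.0], "about auth"), ([0.0, 1.0], "about billing")]
best = top_k([0.9, 0.1], docs, k=1)
```

A query vector pointing "toward" the auth embedding retrieves the auth chunk first, even though the vectors are not identical; direction matters, not magnitude.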

Re-ranking

Results are re-ranked for relevance:

  • Recency boost — Newer documents weighted higher
  • Source diversity — Mix chunks from different documents
  • Relevance threshold — Filter out low-similarity chunks
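The three re-ranking rules above can be sketched as a small post-processing pass. The threshold and recency weight here are made-up values for illustration, not Fabric's actual tuning.

```python
# Re-ranking sketch: drop low-similarity hits, then boost newer documents.

THRESHOLD = 0.70       # hypothetical relevance cutoff
RECENCY_WEIGHT = 0.05  # hypothetical boost for fresh documents

def rerank(hits: list[dict]) -> list[dict]:
    """hits: dicts with 'score' (0-1 similarity) and 'age_days'."""
    kept = [h for h in hits if h["score"] >= THRESHOLD]
    for h in kept:
        h["final"] = h["score"] + RECENCY_WEIGHT / (1.0 + h["age_days"])
    return sorted(kept, key=lambda h: h["final"], reverse=True)

hits = [
    {"score": 0.85, "age_days": 365},
    {"score": 0.84, "age_days": 0},
    {"score": 0.40, "age_days": 0},  # below threshold: dropped
]
ranked = rerank(hits)
```

Note how the fresher document overtakes a slightly more similar but year-old one, matching the recency-boost behavior described above.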

3. Generation

Retrieved context is provided to the AI:

┌──────────────────────────────────────────────────────────────┐
│                     AI Prompt Structure                       │
├──────────────────────────────────────────────────────────────┤
│  System: You are a helpful assistant with access to          │
│          the user's documentation.                           │
│                                                               │
│  Context:                                                     │
│  [Chunk 1: auth-architecture.md - paragraphs about JWT...]   │
│  [Chunk 2: security-guidelines.pdf - OAuth flow...]          │
│  [Chunk 3: api-docs/auth.md - endpoint documentation...]     │
│                                                               │
│  User: How does our authentication system work?               │
└──────────────────────────────────────────────────────────────┘

The AI generates a response that synthesizes information from all relevant chunks.
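Assembling the prompt structure shown above is plain string composition. The system text and chunk labels here are illustrative, not Fabric's exact wording:

```python
# Sketch of assembling retrieved chunks into the final prompt.

def assemble_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """chunks: (source filename, excerpt) pairs, in relevance order."""
    context = "\n".join(f"[{src}: {text}]" for src, text in chunks)
    return (
        "System: You are a helpful assistant with access to "
        "the user's documentation.\n\n"
        f"Context:\n{context}\n\n"
        f"User: {question}"
    )

prompt = assemble_prompt(
    "How does our authentication system work?",
    [("auth-architecture.md", "JWT tokens are issued on login...")],
)
```

Labeling each chunk with its source file is what lets the model cite documents by name in its answer ("Based on your auth-architecture.md document...").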

Using RAG in Fabric

Creating a Workspace

Click Workspaces in the left sidebar.

Create New Workspace

Click Create Workspace and give it a descriptive name like "Product Documentation" or "Engineering Specs".

Upload Documents

Drag and drop files or click to browse. You can upload multiple files at once.

Wait for Processing

Documents are processed in the background. Large documents may take a few minutes. You'll see a progress indicator.

Attaching Workspaces to Agents

When starting a conversation with an agent:

  1. Click the Attach button in the chat interface
  2. Select one or more workspaces
  3. The agent now has access to all documents in those workspaces

Tip: You can attach different workspaces for different types of questions. For example:

  • Attach "Product Docs" when generating PRDs
  • Attach "Technical Specs" when generating architecture documents
  • Attach "Code Repos" when generating API documentation

Workspaces in Projects

Projects can have dedicated workspaces:

  1. Go to Projects → Your Project
  2. Click Documents tab
  3. Upload project-specific documents
  4. All agents working on this project automatically have access

Best Practices

Document Preparation

Do:

  • Use clear, well-structured documents
  • Include headings and sections
  • Keep documents focused on single topics
  • Update documents when information changes

Don't:

  • Upload massive files with mixed topics
  • Include sensitive data without consideration
  • Upload duplicate content
  • Leave outdated documents in workspaces

Chunk Size Optimization

Recommended chunk size by content type:

  • Technical docs — 512 tokens
  • Code files — 256 tokens
  • Long-form content — 768 tokens
  • Q&A format — 128 tokens
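These per-type sizes amount to a simple lookup with a sensible default. The mapping below uses the values from the recommendations above; the helper itself is a hypothetical sketch, not Fabric's configuration API.

```python
# Chunk-size lookup per content type (values from the table above).

CHUNK_SIZES = {
    "technical_docs": 512,
    "code": 256,
    "long_form": 768,
    "qa": 128,
}

def chunk_size_for(content_type: str, default: int = 512) -> int:
    """Fall back to the standard 512-token size for unknown types."""
    return CHUNK_SIZES.get(content_type, default)
```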

Query Optimization

For best results:

  • Ask specific questions
  • Include relevant keywords
  • Reference document names when known
  • Break complex questions into parts

Multi-Tenancy and Security

RAG in Fabric is fully multi-tenant:

Data Isolation

  • Organization boundaries — Documents are isolated by organization
  • User scoping — Personal workspaces are private
  • No cross-contamination — Queries never return other tenants' data

Access Control

  • Workspace permissions — Control who can view/edit
  • Role-based access — Admins can manage all workspaces
  • Audit logging — Track document access

Data Security

  • Encrypted storage — Vectors encrypted at rest
  • Secure transmission — TLS for all transfers
  • Tenant isolation — Separate vector namespaces

Technical Details

Embedding Model

Fabric uses OpenAI's text-embedding-3-small model:

  • Dimensions — 1536
  • Context window — 8191 tokens
  • Quality — High semantic understanding
  • Cost — Efficient for high-volume usage

Vector Database

Qdrant provides the vector storage:

  • Similarity metric — Cosine similarity
  • Filtering — Metadata-based filtering for multi-tenancy
  • Scalability — Handles millions of vectors
  • Performance — Sub-millisecond query times

Chunking Strategy

Document: "Introduction to Authentication\n\nOur system uses JWT tokens..."

Chunk 1: "Introduction to Authentication. Our system uses JWT tokens..."
         │─────────────── 512 tokens ────────────────│

Chunk 2: "...tokens for session management. The OAuth 2.0 flow..."
         │───── 50 token overlap ─────│
                                      │─── 512 tokens ───│

Troubleshooting

"No relevant documents found"

  • Check that documents are fully processed
  • Verify the workspace is attached
  • Try rephrasing your question
  • Ensure documents contain relevant information

"Irrelevant context retrieved"

  • Documents may be too general
  • Try uploading more specific documents
  • Use workspace filters to narrow scope
  • Adjust query to be more specific

"Processing taking too long"

  • Large PDFs may take several minutes
  • Scanned documents require OCR
  • Check document isn't corrupted
  • Try splitting into smaller files

Next Steps