RAG (Retrieval-Augmented Generation)
How Fabric AI uses your documents to provide contextual, accurate AI responses.
RAG is one of Fabric AI's most powerful features. It allows AI agents to access and use information from your documents, code repositories, and knowledge bases—making responses more accurate, relevant, and grounded in your actual data.
What is RAG?
Retrieval-Augmented Generation is a technique that enhances AI responses by:
- Retrieving relevant information from your documents
- Augmenting the AI's context with that information
- Generating responses that reference your specific data
Instead of relying solely on the AI model's training data, RAG allows the model to access your proprietary information in real time.
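The three steps above can be sketched end to end. This is a toy illustration of the retrieve-augment-generate loop, not Fabric's actual API: the keyword-overlap "search" stands in for vector retrieval, and `generate` is a placeholder for the model call.

```python
# Toy sketch of the retrieve -> augment -> generate loop.
# All function names here are illustrative, not Fabric's real interface.

def retrieve(question: str, index: dict[str, str]) -> list[str]:
    """Return document chunks sharing words with the question (toy stand-in
    for vector similarity search)."""
    words = set(question.lower().split())
    return [text for text in index.values() if words & set(text.lower().split())]

def augment(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into a single prompt."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

def generate(prompt: str) -> str:
    """Placeholder: a real system would send `prompt` to an LLM here."""
    return f"(model answer grounded in the {prompt.count('- ')} retrieved chunk(s))"

index = {
    "auth-architecture.md": "Our system uses JWT tokens with OAuth 2.0.",
    "billing.md": "Invoices are generated monthly.",
}
question = "How do JWT tokens work here?"
answer = generate(augment(question, retrieve(question, index)))
# Only auth-architecture.md shares keywords with the question, so exactly
# one chunk is retrieved.
```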
┌─────────────────────────────────────────────────────────────────┐
│ User Question │
│ "How does our authentication system work?" │
└───────────────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Query Processing │
│ 1. Convert question to vector embedding │
│ 2. Search for semantically similar document chunks │
└───────────────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Qdrant Vector Search │
│ Found chunks from: │
│ • auth-architecture.md (0.92 similarity) │
│ • security-guidelines.pdf (0.87 similarity) │
│ • api-docs/auth.md (0.84 similarity) │
└───────────────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Context Assembly │
│ Combine retrieved chunks into a coherent context window │
└───────────────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ AI Generation │
│ Generate response using both the question AND retrieved context │
└───────────────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Response │
│ "Based on your auth-architecture.md document, your system │
│ uses JWT tokens with OAuth 2.0 for authentication..." │
└─────────────────────────────────────────────────────────────────┘
Why Use RAG?
Without RAG
- AI responds based on general knowledge
- May hallucinate or make up details
- Can't reference your specific systems
- Answers may be outdated
With RAG
- AI references your actual documentation
- Responses grounded in verified information
- Understands your specific terminology
- Always uses the latest uploaded documents
How RAG Works in Fabric
1. Document Ingestion
When you upload documents to a workspace, Fabric processes them through a pipeline:
Upload
Drag and drop files into a workspace. Supported formats:
- PDF — Including scanned documents (with OCR)
- Word — .docx files
- Markdown — .md files
- Text — .txt files
- Code — Various programming languages
Extraction
Text is extracted using the appropriate extractor:
- Local extractors — Fast, free processing for standard formats
- Unstructured.io — Advanced OCR for scanned documents and images
- LlamaParse — Optimized for code documentation
Chunking
Documents are split into semantic chunks:
- Chunk size — ~512 tokens per chunk (configurable)
- Overlap — 50 tokens overlap between chunks
- Semantic boundaries — Respects paragraphs and sections
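A minimal sketch of fixed-size chunking with overlap, approximating tokens by whitespace-separated words. Fabric's actual chunker also respects paragraph and section boundaries; the sizes are the defaults described above.

```python
# Split text into overlapping chunks. Tokens are approximated by words
# for illustration; a real tokenizer counts subword tokens.

def chunk(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    words = text.split()
    step = size - overlap          # advance by size minus overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break                  # last chunk reached the end of the text
    return chunks

doc = " ".join(f"w{i}" for i in range(1200))
parts = chunk(doc, size=512, overlap=50)
# Starts at word 0, 462, 924 -> three chunks; the last 50 words of one
# chunk are repeated at the start of the next.
```

The overlap ensures a sentence straddling a chunk boundary is fully present in at least one chunk.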
Embedding
Each chunk is converted to a vector embedding:
- Model — OpenAI text-embedding-3-small (1536 dimensions)
- Consistent encoding — Same model for queries and documents
Storage
Embeddings are stored in Qdrant with metadata:
- Vector — The 1536-dimensional embedding
- Metadata — Filename, chunk position, tenant IDs
- Multi-tenancy — Isolated by userId and organizationId
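The stored record might be packaged as below. The payload field names (`filename`, `chunk_index`, `userId`, `organizationId`) mirror the metadata described above, but the exact schema is an assumption for illustration.

```python
# Build a vector-store point: embedding plus metadata for tenant filtering.
# Field names are illustrative, not Fabric's exact schema.

def make_point(point_id: int, vector: list[float], filename: str,
               chunk_index: int, user_id: str, org_id: str) -> dict:
    assert len(vector) == 1536, "must match the embedding model's dimensions"
    return {
        "id": point_id,
        "vector": vector,
        "payload": {
            "filename": filename,
            "chunk_index": chunk_index,
            "userId": user_id,         # tenant scoping
            "organizationId": org_id,  # tenant scoping
        },
    }

point = make_point(1, [0.0] * 1536, "auth-architecture.md", 0, "u_123", "org_456")
```

Keeping tenant IDs in the payload is what lets every search be filtered to a single organization.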
2. Retrieval
When you ask a question or start generating a document:
Query Embedding
Your question is converted to a vector using the same embedding model.
Similarity Search
Qdrant finds the most similar document chunks:
- Cosine similarity — Measures semantic closeness
- Top-K retrieval — Returns top 5-10 most relevant chunks
- Tenant filtering — Only searches your documents
Re-ranking
Results are re-ranked for relevance:
- Recency boost — Newer documents are weighted higher
- Source diversity — Chunks are drawn from multiple documents
- Relevance threshold — Low-similarity chunks are filtered out
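These three heuristics could look like the sketch below. The threshold, recency weight, and per-source cap are illustrative assumptions, not Fabric's tuned values, and `age_days` is assumed metadata.

```python
# Re-rank retrieved hits: drop low-similarity chunks, boost newer
# documents, and cap chunks per source for diversity.
# All weights and thresholds are illustrative.

def rerank(hits: list[dict], threshold: float = 0.75,
           recency_weight: float = 0.05, max_per_source: int = 2) -> list[dict]:
    kept = [h for h in hits if h["score"] >= threshold]   # relevance threshold
    for h in kept:
        h["score"] += recency_weight / (1 + h["age_days"])  # recency boost
    kept.sort(key=lambda h: h["score"], reverse=True)
    out, per_source = [], {}
    for h in kept:                                         # source diversity
        n = per_source.get(h["filename"], 0)
        if n < max_per_source:
            out.append(h)
            per_source[h["filename"]] = n + 1
    return out

hits = [
    {"filename": "auth.md",  "score": 0.92, "age_days": 1},
    {"filename": "auth.md",  "score": 0.90, "age_days": 1},
    {"filename": "auth.md",  "score": 0.89, "age_days": 1},
    {"filename": "old.md",   "score": 0.80, "age_days": 400},
    {"filename": "noise.md", "score": 0.40, "age_days": 0},
]
ranked = rerank(hits)
# noise.md falls below the threshold; only two auth.md chunks survive
# the diversity cap, so old.md still makes it into the context.
```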
3. Generation
Retrieved context is provided to the AI:
┌──────────────────────────────────────────────────────────────┐
│ AI Prompt Structure │
├──────────────────────────────────────────────────────────────┤
│ System: You are a helpful assistant with access to │
│ the user's documentation. │
│ │
│ Context: │
│ [Chunk 1: auth-architecture.md - paragraphs about JWT...] │
│ [Chunk 2: security-guidelines.pdf - OAuth flow...] │
│ [Chunk 3: api-docs/auth.md - endpoint documentation...] │
│ │
│ User: How does our authentication system work? │
└──────────────────────────────────────────────────────────────┘
The AI generates a response that synthesizes information from all relevant chunks.
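Assembling that prompt structure from retrieved chunks might look like this. The system text and chunk labels follow the diagram above, but the exact wording is an assumption, not Fabric's literal prompt template.

```python
# Build a chat-style message list from the question and retrieved chunks,
# mirroring the prompt structure diagrammed above.

def build_prompt(question: str, chunks: list[tuple[str, str]]) -> list[dict]:
    context = "\n".join(
        f"[Chunk {i + 1}: {source} - {text}]"
        for i, (source, text) in enumerate(chunks)
    )
    return [
        {"role": "system",
         "content": "You are a helpful assistant with access to "
                    "the user's documentation.\n\nContext:\n" + context},
        {"role": "user", "content": question},
    ]

messages = build_prompt(
    "How does our authentication system work?",
    [("auth-architecture.md", "paragraphs about JWT..."),
     ("security-guidelines.pdf", "OAuth flow...")],
)
```

Because the context rides in the system message, the user's question stays clean and the model can cite sources by filename.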
Using RAG in Fabric
Creating a Workspace
Navigate to Workspaces
Click Workspaces in the left sidebar.
Create New Workspace
Click Create Workspace and give it a descriptive name like "Product Documentation" or "Engineering Specs".
Upload Documents
Drag and drop files or click to browse. You can upload multiple files at once.
Wait for Processing
Documents are processed in the background. Large documents may take a few minutes. You'll see a progress indicator.
Attaching Workspaces to Agents
When starting a conversation with an agent:
- Click the Attach button in the chat interface
- Select one or more workspaces
- The agent now has access to all documents in those workspaces
Tip: You can attach different workspaces for different types of questions. For example:
- Attach "Product Docs" when generating PRDs
- Attach "Technical Specs" when generating architecture documents
- Attach "Code Repos" when generating API documentation
Workspaces in Projects
Projects can have dedicated workspaces:
- Go to Projects → Your Project
- Click Documents tab
- Upload project-specific documents
- All agents working on this project automatically have access
Best Practices
Document Preparation
Do:
- Use clear, well-structured documents
- Include headings and sections
- Keep documents focused on single topics
- Update documents when information changes
Don't:
- Upload massive files with mixed topics
- Include sensitive data without consideration
- Upload duplicate content
- Leave outdated documents in workspaces
Chunk Size Optimization
| Content Type | Recommended Chunk Size |
|---|---|
| Technical docs | 512 tokens |
| Code files | 256 tokens |
| Long-form content | 768 tokens |
| Q&A format | 128 tokens |
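The table above can be applied as a simple lookup with 512 tokens as the fallback default. Treat these values as starting points to tune, and the key names as illustrative.

```python
# Recommended chunk sizes (in tokens) by content type, from the table above.
CHUNK_SIZES = {
    "technical_docs": 512,
    "code": 256,
    "long_form": 768,
    "qa": 128,
}

def recommended_chunk_size(content_type: str) -> int:
    """Fall back to 512 tokens for unlisted content types."""
    return CHUNK_SIZES.get(content_type, 512)
```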
Query Optimization
For best results:
- Ask specific questions
- Include relevant keywords
- Reference document names when known
- Break complex questions into parts
Multi-Tenancy and Security
RAG in Fabric is fully multi-tenant:
Data Isolation
- Organization boundaries — Documents are isolated by organization
- User scoping — Personal workspaces are private
- No cross-contamination — Queries never return other tenants' data
Access Control
- Workspace permissions — Control who can view/edit
- Role-based access — Admins can manage all workspaces
- Audit logging — Track document access
Data Security
- Encrypted storage — Vectors encrypted at rest
- Secure transmission — TLS for all transfers
- Tenant isolation — Separate vector namespaces
Technical Details
Embedding Model
Fabric uses OpenAI's text-embedding-3-small model:
- Dimensions — 1536
- Context window — 8191 tokens
- Quality — High semantic understanding
- Cost — Efficient for high-volume usage
Vector Database
Qdrant provides the vector storage:
- Similarity metric — Cosine similarity
- Filtering — Metadata-based filtering for multi-tenancy
- Scalability — Handles millions of vectors
- Performance — Sub-millisecond query times
Chunking Strategy
Document: "Introduction to Authentication\n\nOur system uses JWT tokens..."
Chunk 1: "Introduction to Authentication. Our system uses JWT tokens..."
│─────────────── 512 tokens ────────────────│
Chunk 2: "...tokens for session management. The OAuth 2.0 flow..."
│───── 50 token overlap ─────│
│─── 512 tokens ───│
Troubleshooting
"No relevant documents found"
- Check that documents are fully processed
- Verify the workspace is attached
- Try rephrasing your question
- Ensure documents contain relevant information
"Irrelevant context retrieved"
- Documents may be too general
- Try uploading more specific documents
- Use workspace filters to narrow scope
- Adjust query to be more specific
"Processing taking too long"
- Large PDFs may take several minutes
- Scanned documents require OCR
- Check document isn't corrupted
- Try splitting into smaller files