Durable Workflows

How Fabric AI uses Temporal for fault-tolerant, resumable workflow execution.

Fabric AI uses Temporal to power its workflow engine, ensuring that complex multi-step operations complete reliably—even when things go wrong.

Why Workflows Matter

The Problem with Traditional Execution

Traditional request-response systems handle long-running work poorly. When generating a PRD that takes 5 minutes, or creating 20 Jira tickets, any interruption means starting over.

The Workflow Solution

Workflows persist state at every step. If anything fails, the workflow resumes from the last saved state.
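The resume-from-last-saved-state behavior can be illustrated with a toy model. This is a minimal sketch in plain TypeScript, not the Temporal SDK: the `Step` type, the `Checkpoints` map, and `runWorkflow` are hypothetical names chosen for illustration, and the in-memory map stands in for durable storage.

```typescript
// Toy model of durable execution: NOT the Temporal SDK, only the idea behind it.
type Step = { label: string; run: () => string };

// Stand-in for durable storage of completed step results.
type Checkpoints = Map<string, string>;

function runWorkflow(steps: Step[], saved: Checkpoints): string[] {
  const results: string[] = [];
  for (const step of steps) {
    const cached = saved.get(step.label);
    if (cached !== undefined) {
      // Step finished in an earlier run: reuse its saved result.
      results.push(cached);
      continue;
    }
    const result = step.run(); // may throw; earlier checkpoints survive
    saved.set(step.label, result); // persist before moving on
    results.push(result);
  }
  return results;
}

// Usage: the second step fails once, then succeeds on the resumed run.
const calls: string[] = [];
let failOnce = true;
const demoSteps: Step[] = [
  { label: "outline", run: () => { calls.push("outline"); return "outline done"; } },
  {
    label: "draft",
    run: () => {
      calls.push("draft");
      if (failOnce) { failOnce = false; throw new Error("worker crashed"); }
      return "draft done";
    },
  },
];

const saved: Checkpoints = new Map();
try { runWorkflow(demoSteps, saved); } catch (err) { /* first run is interrupted */ }
const finalResults = runWorkflow(demoSteps, saved); // resumes at "draft"
```

On the second run the "outline" step is not executed again; only the interrupted "draft" step reruns.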

How Temporal Works

Key Concepts

Workflows: Long-running, fault-tolerant processes that orchestrate activities.

Activities: Individual units of work like API calls, AI generation, or file operations.

Workers: Processes that execute workflows and activities.

Signals: External events that can modify running workflows.

Queries: Read-only requests that return the current state of a running workflow.

Workflow Lifecycle

[Diagram: workflow lifecycle]

Workflows in Fabric

Document Generation Workflow

When you ask an agent to generate a document, this workflow executes:

1. Initialize Context: Load user preferences, organization settings, and conversation history.

2. Retrieve RAG Context: Query Qdrant for relevant document chunks based on the request.

3. Generate Content: Call the AI model with context to generate the document.

4. Apply Formatting: Format the output according to document type (PRD, spec, etc.).

5. Save and Return: Store the document in PostgreSQL and return it to the user.

Each step is an activity that:

  • Automatically retries on failure
  • Has configurable timeouts
  • Saves state before and after
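The five steps above can be sketched as a workflow function that only sequences activities. This is an illustrative outline, not Fabric's actual code: the `DocumentActivities` interface and every function name in it are assumptions, and the activities are passed in as a parameter so the sketch runs without Temporal. In a real Temporal TypeScript workflow they would typically be obtained through `proxyActivities` from `@temporalio/workflow`.

```typescript
// Hypothetical activity signatures; the names mirror the five steps above.
interface DocumentActivities {
  initializeContext(userId: string): Promise<{ userId: string; preferences: string[] }>;
  retrieveRagContext(request: string): Promise<string[]>;
  generateContent(request: string, chunks: string[]): Promise<string>;
  applyFormatting(draft: string, docType: string): Promise<string>;
  saveDocument(userId: string, body: string): Promise<string>; // returns a document id
}

// The workflow only sequences activities; each await is a point where
// a durable engine can record the result and later resume.
async function generateDocumentWorkflow(
  activities: DocumentActivities,
  userId: string,
  request: string,
  docType: string,
): Promise<{ documentId: string; body: string }> {
  const context = await activities.initializeContext(userId);
  const chunks = await activities.retrieveRagContext(request);
  const draft = await activities.generateContent(request, chunks);
  const body = await activities.applyFormatting(draft, docType);
  const documentId = await activities.saveDocument(context.userId, body);
  return { documentId, body };
}

// Usage with in-memory stubs standing in for Qdrant, the AI model, and PostgreSQL.
const stubActivities: DocumentActivities = {
  initializeContext: async (userId) => ({ userId, preferences: [] }),
  retrieveRagContext: async () => ["chunk about login"],
  generateContent: async (request, chunks) => `${request} (${chunks.length} chunk)`,
  applyFormatting: async (draft, docType) => `[${docType}] ${draft}`,
  saveDocument: async () => "doc-1",
};

const generated = generateDocumentWorkflow(stubActivities, "user-1", "Auth PRD", "PRD");
```

Keeping the workflow free of direct I/O is the design point: all side effects live in activities, which is what lets each one be retried and timed out independently.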

Orchestrator Workflow

The Fabric Orchestrator runs a more complex workflow:

[Diagram: Orchestrator workflow]

Reliability Features

Automatic Retries

Activities retry automatically with exponential backoff.

Configuration:

  • Initial interval — 1 second
  • Maximum interval — 5 minutes
  • Maximum attempts — 5 (configurable)
  • Backoff coefficient — 2.0
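To make the backoff arithmetic concrete, the sketch below computes the wait before each retry from the values listed above. The `RetrySettings` interface and `retryDelaysMs` function are hypothetical helpers written for this illustration; in the Temporal TypeScript SDK the corresponding retry policy fields are named `initialInterval`, `maximumInterval`, `maximumAttempts`, and `backoffCoefficient`.

```typescript
interface RetrySettings {
  initialIntervalMs: number;
  maximumIntervalMs: number;
  maximumAttempts: number;
  backoffCoefficient: number;
}

// Values taken from the configuration list above.
const defaultRetry: RetrySettings = {
  initialIntervalMs: 1_000,
  maximumIntervalMs: 5 * 60 * 1_000,
  maximumAttempts: 5,
  backoffCoefficient: 2.0,
};

// Delay before each retry: initial * coefficient^(retry - 1), capped at the maximum.
// With N attempts there are N - 1 waits between them.
function retryDelaysMs(settings: RetrySettings): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < settings.maximumAttempts; retry++) {
    const uncapped = settings.initialIntervalMs * settings.backoffCoefficient ** (retry - 1);
    delays.push(Math.min(uncapped, settings.maximumIntervalMs));
  }
  return delays;
}

const delays = retryDelaysMs(defaultRetry);
console.log(delays); // [ 1000, 2000, 4000, 8000 ]
```

With five attempts the waits are 1, 2, 4, and 8 seconds, so the 5-minute cap only comes into play if the maximum number of attempts is raised.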

Heartbeats

Long-running activities send heartbeats to indicate they're still alive:

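// Example activity with heartbeats
import { heartbeat } from '@temporalio/activity';

// DocumentInput is assumed to be defined elsewhere with a `sections` array.
async function generateLargeDocument(input: DocumentInput) {
  const document: string[] = [];
  for (const [index, section] of input.sections.entries()) {
    // Process section (generation logic omitted; placeholder result)...
    document.push(section.name);

    // Send heartbeat to indicate progress as a percentage of sections done
    const progress = Math.round(((index + 1) / input.sections.length) * 100);
    heartbeat({ section: section.name, progress });
  }
  return document;
}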

If no heartbeat arrives within the heartbeat timeout, Temporal treats the attempt as failed and can retry the activity on a different worker.

Timeouts

Multiple timeout types protect against hanging operations:

| Timeout Type | Purpose | Default |
|---|---|---|
| Start-to-close | Max time for single attempt | 5 minutes |
| Schedule-to-close | Max time including retries | 30 minutes |
| Heartbeat | Max time between heartbeats | 1 minute |
| Schedule-to-start | Max time in queue | 10 minutes |
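The defaults in the table can be written as a plain options object. In this sketch the four field names follow Temporal's activity options, and the values are the table's defaults in milliseconds; the `ActivityTimeouts` interface and the `checkTimeouts` helper are hypothetical additions that only illustrate how the timeouts relate to each other.

```typescript
// Field names follow Temporal's activity options; values are milliseconds.
interface ActivityTimeouts {
  startToCloseTimeout: number;
  scheduleToCloseTimeout: number;
  heartbeatTimeout: number;
  scheduleToStartTimeout: number;
}

const MINUTE = 60_000;

const defaultTimeouts: ActivityTimeouts = {
  startToCloseTimeout: 5 * MINUTE,
  scheduleToCloseTimeout: 30 * MINUTE,
  heartbeatTimeout: 1 * MINUTE,
  scheduleToStartTimeout: 10 * MINUTE,
};

// Hypothetical sanity checks: returns a list of human-readable problems.
function checkTimeouts(t: ActivityTimeouts): string[] {
  const problems: string[] = [];
  if (t.heartbeatTimeout >= t.startToCloseTimeout) {
    problems.push("heartbeat timeout should be shorter than start-to-close");
  }
  if (t.startToCloseTimeout > t.scheduleToCloseTimeout) {
    problems.push("start-to-close should not exceed schedule-to-close");
  }
  if (t.scheduleToStartTimeout > t.scheduleToCloseTimeout) {
    problems.push("schedule-to-start should not exceed schedule-to-close");
  }
  return problems;
}

const timeoutProblems = checkTimeouts(defaultTimeouts); // empty for the defaults
```

A heartbeat timeout longer than the start-to-close timeout would never fire, which is why the first check is worth making when overriding defaults.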

Human-in-the-Loop

Workflows can pause for human approval:

How It Works

[Diagram: approval flow]

Example: "The workflow wants to delete 50 Jira tickets. Do you want to proceed?"

Signal-Based Approvals

Approvals are implemented using Temporal signals:

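import { condition, defineSignal, setHandler } from '@temporalio/workflow';

// Signal carrying the user's decision (the signal name 'approval' is illustrative)
const approvalSignal = defineSignal<[{ approved: boolean }]>('approval');

// Inside the workflow function: record the decision when the signal arrives
let approval: { approved: boolean } | undefined;
setHandler(approvalSignal, (decision) => { approval = decision; });

// Wait for approval signal; condition() resolves to false if 24 hours pass first
const received = await condition(() => approval !== undefined, '24 hours');

if (received && approval?.approved) {
  // Continue with operation
} else {
  // Rejected or timed out: skip or fail gracefully
}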

Key Features:

  • Workflow pauses indefinitely (or until timeout)
  • State is preserved while waiting
  • User can approve/reject anytime
  • Workflow resumes immediately after signal
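The pause, approve/reject, and timeout behavior listed above can be modeled as a small state machine. This is a toy sketch in plain TypeScript rather than the Temporal SDK; the `ApprovalGate` class and its method names are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```typescript
type Decision = "approved" | "rejected";
type GateStatus = "waiting" | Decision | "timed_out";

// Toy model of a paused workflow step; not the Temporal SDK.
class ApprovalGate {
  private decision: Decision | undefined = undefined;
  private readonly openedAtMs: number;
  private readonly timeoutMs: number;

  constructor(openedAtMs: number, timeoutMs: number) {
    this.openedAtMs = openedAtMs;
    this.timeoutMs = timeoutMs;
  }

  // Called when the user responds (the "signal").
  signal(decision: Decision, nowMs: number): void {
    // Ignore late or duplicate signals so the first outcome is final.
    if (this.statusAt(nowMs) === "waiting") this.decision = decision;
  }

  // A recorded decision wins; otherwise the gate waits until the timeout elapses.
  statusAt(nowMs: number): GateStatus {
    if (this.decision !== undefined) return this.decision;
    return nowMs - this.openedAtMs >= this.timeoutMs ? "timed_out" : "waiting";
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;
const gate = new ApprovalGate(0, DAY_MS);
const beforeSignal = gate.statusAt(60_000); // "waiting"
gate.signal("approved", 2 * 60_000);
const afterSignal = gate.statusAt(2 * 60_000); // "approved"
```

Nothing runs while the gate is waiting; the only state that has to be preserved is when the wait started and whether a decision has arrived.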

Observability

Temporal UI

Access the Temporal UI to monitor workflows:

  • List all workflows — See running, completed, and failed
  • View timeline — Step-by-step execution visualization
  • Inspect state — Current workflow variables
  • Replay — Re-execute failed workflows

Access: http://localhost:8233 (local development)

Workflow Queries

Query running workflow state:

// Get current progress
const progress = await handle.query('progress');
// → { currentStep: 3, totalSteps: 5, status: 'executing' }

// Get generated artifacts
const artifacts = await handle.query('artifacts');
// → [{ type: 'prd', name: 'Authentication PRD' }]

Event History

Every workflow maintains a complete event history.

This history enables:

  • Debugging — See exactly what happened
  • Replay — Re-execute with same inputs
  • Audit — Complete compliance trail
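Debugging and replay both rest on the same idea: state can be rebuilt by reading the recorded events in order. The sketch below shows that idea with a deliberately simplified event model. The `HistoryEvent` kinds and the `replay` function are assumptions made for illustration; Temporal's real history uses its own, richer set of event types.

```typescript
// Toy event types; a real Temporal history has many more event kinds.
type HistoryEvent =
  | { kind: "WorkflowStarted"; input: string }
  | { kind: "ActivityCompleted"; activity: string; result: string }
  | { kind: "ActivityFailed"; activity: string; error: string }
  | { kind: "WorkflowCompleted" };

interface ReplayedState {
  input: string;
  completed: Record<string, string>;
  failures: number;
  done: boolean;
}

// Rebuild state by folding over the recorded events in order.
function replay(events: HistoryEvent[]): ReplayedState {
  const state: ReplayedState = { input: "", completed: {}, failures: 0, done: false };
  for (const ev of events) {
    switch (ev.kind) {
      case "WorkflowStarted":
        state.input = ev.input;
        break;
      case "ActivityCompleted":
        state.completed[ev.activity] = ev.result;
        break;
      case "ActivityFailed":
        state.failures += 1;
        break;
      case "WorkflowCompleted":
        state.done = true;
        break;
    }
  }
  return state;
}

// Usage: one failed attempt, then a successful retry; the workflow is still running.
const recorded: HistoryEvent[] = [
  { kind: "WorkflowStarted", input: "Auth PRD" },
  { kind: "ActivityFailed", activity: "generateContent", error: "timeout" },
  { kind: "ActivityCompleted", activity: "generateContent", result: "draft" },
];
const rebuilt = replay(recorded);
```

Because the same events always produce the same state, the log doubles as an audit trail: what happened and what the workflow knew at each point can be read off directly.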

Trust-Based Approvals

The Orchestrator learns from your approval patterns:

How It Works

Week 1:

Operation: Post to Slack → Request approval → Approved
Operation: Create Jira ticket → Request approval → Approved
Operation: Post to Slack → Request approval → Approved

Week 2:

Operation: Post to Slack → Auto-approved (you always approve)
Operation: Create Jira ticket → Request approval → Approved

Week 4:

Operation: Post to Slack → Auto-approved
Operation: Create Jira ticket → Auto-approved
Operation: DELETE 100 records → Request approval (always ask for deletes)

Risk Levels

| Risk Level | Examples | Default Behavior |
|---|---|---|
| Low | Read, list, search | Auto-approve |
| Medium | Create, update | Learn from patterns |
| High | Bulk operations | Usually request approval |
| Critical | Delete, financial | Always request approval |
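The risk table can be read as a decision function. The sketch below is one possible interpretation, not Fabric's actual logic: the `decide` function, the `OperationRecord` shape, and the threshold of two prior approvals are all assumptions, and "usually request approval" for high-risk operations is simplified to "always request approval".

```typescript
type RiskLevel = "low" | "medium" | "high" | "critical";
type ApprovalDecision = "auto-approve" | "request-approval";

// How the user has responded to this kind of operation so far.
interface OperationRecord {
  approvals: number;
  rejections: number;
}

// Hypothetical threshold: prior approvals needed before trusting a medium-risk operation.
const REQUIRED_APPROVALS = 2;

function decide(risk: RiskLevel, past: OperationRecord): ApprovalDecision {
  switch (risk) {
    case "low":
      return "auto-approve";
    case "medium":
      // Learn from patterns: trust only a clean record with enough approvals.
      return past.rejections === 0 && past.approvals >= REQUIRED_APPROVALS
        ? "auto-approve"
        : "request-approval";
    case "high": // simplified: "usually" is treated as "always" in this sketch
    case "critical":
      return "request-approval";
  }
}

// Usage: a trusted medium-risk operation versus a delete with a long approval record.
const slackPost = decide("medium", { approvals: 2, rejections: 0 });
const bulkDelete = decide("critical", { approvals: 100, rejections: 0 });
```

Here `slackPost` is auto-approved while `bulkDelete` still asks, matching the rule that deletes always require approval regardless of past behavior.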

Best Practices

Designing for Reliability

Do:

  • Break work into small activities
  • Use idempotent operations when possible
  • Handle partial success gracefully
  • Set appropriate timeouts

Don't:

  • Put too much logic in a single activity
  • Assume network calls will succeed
  • Skip error handling
  • Use infinite timeouts

Monitoring

  • Check Temporal UI regularly for failed workflows
  • Set up alerts for workflow failures
  • Review execution times for optimization
  • Audit approval patterns
