Real-Time Streaming

Scrapalot streams AI responses in real-time so you see answers as they're generated, just like ChatGPT.

How Streaming Works

When you ask a question, Scrapalot:

  1. Immediately shows you status updates
  2. Streams the answer as it's generated
  3. Provides citations in real-time
  4. Notifies when complete

Why streaming matters:

  • No waiting for the full response
  • See progress as it happens
  • Cancel if the answer is heading in the wrong direction
  • Better overall user experience

What You See

Status Updates

During processing, you see:

  • "Analyzing query for optimal strategy..."
  • "Selected strategy: Tri-Modal Fusion"
  • "Retrieving relevant documents..."
  • "Reranking results by relevance..."
  • "Generating response..."

Progress indicators:

  • Visual progress bars
  • Stage indicators
  • Time estimates
  • Current operation

Answer Streaming

The answer appears word by word:

  • Immediate feedback
  • Smooth, natural flow
  • Can start reading before complete
  • Cancel if needed

Citations

Sources appear as they are mentioned in the answer:

  • Document name
  • Page number
  • Relevance score
  • Link to view source

Completion

When done:

  • Final status message
  • Total time taken
  • Token count (if applicable)
  • Option to follow up

Connection Types

WebSocket

For the web interface:

  • Real-time bidirectional communication
  • Low latency
  • Automatic reconnection
  • Efficient resource usage

Protocol: STOMP over WebSocket

Connection:

ws://localhost:8090/ws
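
Below is a minimal connection sketch using the @stomp/stompjs library. The subscription and publish destinations, and the query payload shape, are assumptions for illustration, not Scrapalot's actual channel names:

import { Client } from '@stomp/stompjs';

// Minimal sketch: connect to the WebSocket endpoint over STOMP and stream
// one query. Destinations and the query payload shape are assumptions.
const client = new Client({
  brokerURL: 'ws://localhost:8090/ws',
  reconnectDelay: 5000, // retry automatically if the connection drops
});

client.onConnect = () => {
  // Subscribe to the response channel before sending the query.
  client.subscribe('/user/queue/chat', (frame) => {
    const message = JSON.parse(frame.body);
    console.log(message.type, message.content);
  });

  // Send the query; streamed messages arrive on the subscription above.
  client.publish({
    destination: '/app/chat',
    body: JSON.stringify({ query: 'What does the contract say about termination?' }),
  });
};

client.activate();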

Server-Sent Events (SSE)

For API clients:

  • Simple one-way streaming
  • HTTP-based
  • Easy to implement
  • Firewall-friendly

Endpoint:

POST /api/v1/chat/stream
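
A minimal sketch of consuming this endpoint from a TypeScript client follows; the host, request body shape, and headers are assumptions for illustration:

// Minimal sketch: POST a query and read the SSE stream with fetch.
// Host, body shape, and headers are assumptions for illustration.
async function streamChat(query: string): Promise<void> {
  const response = await fetch('http://localhost:8090/api/v1/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Accept: 'text/event-stream' },
    body: JSON.stringify({ query }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are separated by a blank line; payload lines start with "data:".
    const events = buffer.split('\n\n');
    buffer = events.pop() ?? '';
    for (const event of events) {
      for (const line of event.split('\n')) {
        if (line.startsWith('data:')) {
          console.log(line.slice(5).trim()); // one streamed message (JSON envelope)
        }
      }
    }
  }
}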

Message Types

Status Messages

What they tell you:

  • Current processing stage
  • Selected strategy
  • Progress percentage
  • Estimated completion time

Stages:

  • Routing (choosing best approach)
  • Retrieval (finding documents)
  • Processing (enhancing query/results)
  • Generation (creating answer)

Content Messages

The actual answer:

  • Streamed incrementally
  • Markdown formatted
  • With inline citations
  • Proper formatting preserved

Citation Messages

Source information:

  • Document title
  • Page/section reference
  • Relevance score
  • Document ID for retrieval

Error Messages

If something goes wrong:

  • Clear error description
  • Error code for reference
  • Suggested action
  • Option to retry

Completion Messages

End of stream:

  • Final status
  • Total processing time
  • Token usage (if applicable)
  • Completion reason (success/error/cancelled)
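
As a sketch, the message variants above can be modeled as a TypeScript discriminated union. The envelope fields (index, timestamp, type) follow the format shown under Protocol Details below; the wire-level type strings and the type-specific fields are assumptions for illustration:

// Sketch of the streamed message variants as a discriminated union.
interface Envelope {
  index: number;       // sequential, starting at 0
  timestamp: string;   // ISO-8601
}

interface StatusMessage extends Envelope {
  type: 'status';
  stage: 'routing' | 'retrieval' | 'processing' | 'generation';
  content: string;     // e.g. "Retrieving relevant documents..."
}

interface ContentMessage extends Envelope {
  type: 'content';
  content: string;     // incremental markdown chunk of the answer
}

interface CitationMessage extends Envelope {
  type: 'citation';
  documentId: string;
  title: string;
  page?: number;
  score: number;       // relevance score
}

interface ErrorMessage extends Envelope {
  type: 'error';
  code: string;
  content: string;     // clear error description
}

interface CompletionMessage extends Envelope {
  type: 'complete';
  durationMs: number;
  tokenCount?: number;
  reason: 'success' | 'error' | 'cancelled';
}

type StreamMessage =
  | StatusMessage
  | ContentMessage
  | CitationMessage
  | ErrorMessage
  | CompletionMessage;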

User Experience

Visual Feedback

What users see:

Phase 1: Query Analysis

  • "Analyzing query..."
  • Strategy selection
  • Reasoning display (if enabled)

Phase 2: Document Retrieval

  • "Searching documents..."
  • Progress indicator
  • Number of results found

Phase 3: Answer Generation

  • Streaming text appearing
  • Citations appearing inline
  • Formatted properly

Phase 4: Complete

  • Checkmark or completion icon
  • Total time
  • Option to continue conversation

Cancellation

Users can cancel at any time:

  • Click stop button
  • Stream stops immediately
  • Partial answer preserved
  • Can start new query
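
A sketch of client-side cancellation for the SSE stream, assuming the same endpoint and body shape as in the earlier sketch (the stop-button selector is also an assumption):

// Sketch: cancelling an in-flight stream from the client with AbortController.
const controller = new AbortController();

fetch('http://localhost:8090/api/v1/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'Summarize chapter 3' }),
  signal: controller.signal, // aborting this signal stops the stream
}).catch((err) => {
  if ((err as DOMException).name !== 'AbortError') throw err; // real failures still surface
});

// Wire the stop button to abort; text already rendered stays on screen.
document.querySelector('#stop-button')?.addEventListener('click', () => controller.abort());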

Performance

Streaming Speed

Typical performance:

  • Status updates: < 100ms
  • First token: 1-3 seconds
  • Stream rate: 20-50 tokens/second
  • Complete response: 5-30 seconds

Factors affecting speed:

  • AI model selected
  • Document collection size
  • Query complexity
  • Network latency
  • Server load

Optimization

For best performance:

  • Use WebSocket for web clients
  • Enable response caching
  • Choose faster AI models when appropriate
  • Optimize chunk sizes
  • Use appropriate RAG strategy

Error Handling

Connection Issues

If the connection drops:

  • Automatic reconnection attempt
  • Resume from last state
  • Preserve partial answer
  • Notify user of reconnection

User action:

  • Usually automatic recovery
  • Manual retry if needed
  • Check network status
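
For an SSE client, the automatic recovery described above might look like the following reconnect-with-backoff sketch; consumeStream is a hypothetical helper built on the fetch-based reader shown earlier (a STOMP client gets the same behaviour from reconnectDelay):

// Sketch: reconnect with exponential backoff while keeping the partial
// answer the user has already seen.
declare function consumeStream(
  query: string,
  onChunk: (text: string) => void, // appends to the visible partial answer
): Promise<void>; // resolves on normal completion, rejects if the connection drops

async function streamWithReconnect(
  query: string,
  onChunk: (text: string) => void,
  maxAttempts = 5,
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await consumeStream(query, onChunk);
      return; // stream completed normally
    } catch (err) {
      const delay = Math.min(1000 * 2 ** attempt, 30_000); // 1s, 2s, 4s, ... capped at 30s
      console.warn(`Connection lost, retrying in ${delay} ms`, err);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw new Error('Gave up after repeated connection failures');
}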

Processing Errors

If query processing fails:

  • Clear error message
  • Suggested action
  • Option to retry
  • Previous context preserved

Common errors:

  • No documents found
  • AI model unavailable
  • Rate limit exceeded
  • Invalid query format

Timeout Handling

If processing takes too long:

  • Automatic timeout (configurable)
  • Partial results returned (if available)
  • Option to extend timeout
  • Clear timeout message
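
On the client side, a timeout can be enforced with AbortSignal.timeout, as in the sketch below; the 60-second value is illustrative, and the server-side timeout is configured separately:

// Sketch: client-side timeout on the streaming request.
const response = await fetch('http://localhost:8090/api/v1/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'Compare the two reports' }),
  signal: AbortSignal.timeout(60_000), // abort if the stream exceeds 60 s
});
// Read response.body incrementally as in the SSE sketch above.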

Best Practices

For Users

Optimal experience:

  • Stable internet connection
  • Modern web browser
  • Enable JavaScript
  • Allow WebSocket connections

If streaming is slow:

  • Check network speed
  • Try simpler query
  • Select faster AI model
  • Reduce collection size

For Developers

Implementing streaming:

  • Use WebSocket for web apps
  • Implement reconnection logic
  • Handle partial responses
  • Show progress feedback
  • Enable cancellation

Error handling:

  • Parse error messages
  • Implement retry logic
  • Preserve user context
  • Provide clear feedback

Troubleshooting

No Streaming, Only Full Response

Check:

  • WebSocket connection enabled
  • Browser supports WebSocket
  • Firewall allows WebSocket
  • Using correct endpoint

Fix:

  • Enable WebSocket in settings
  • Update browser
  • Check firewall rules
  • Verify connection URL

Delayed Streaming

Possible causes:

  • Slow AI model
  • Large document collection
  • Complex query
  • Network latency
  • Server under load

Solutions:

  • Use faster AI model
  • Optimize document chunking
  • Simplify query
  • Check network connection
  • Scale server resources

Connection Drops

If the connection drops frequently:

  • Check network stability
  • Verify WebSocket support
  • Review firewall settings
  • Check server logs
  • Consider load balancer timeout

Mitigation:

  • Implement auto-reconnect
  • Use connection keep-alive
  • Adjust timeout settings
  • Monitor connection health

Advanced Features

Progress Tracking

Detailed progress information:

  • Current stage (routing, retrieval, generation)
  • Percentage complete (when available)
  • Estimated time remaining
  • Current operation

Reasoning Display

See the AI's thought process:

  • Strategy selection reasoning
  • Query analysis
  • Complexity assessment
  • Confidence levels

When useful:

  • Understanding results
  • Debugging queries
  • Learning system behavior
  • Transparency

Multi-Modal Responses

Rich content streaming:

  • Text with formatting
  • Inline citations
  • Code blocks
  • Tables and lists
  • Images (if supported)

Protocol Details

Message Format

Standard envelope:

{
  "index": 0,
  "timestamp": "2024-01-15T10:00:00Z",
  "type": "message_type",
  "content": "..."
}

Sequential indexing:

  • Starts at 0
  • Increments for each message
  • Helps detect missing messages
  • Enables ordering
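
A sketch of how a client might use the index field to detect gaps:

// Sketch: using the sequential index to detect dropped or out-of-order messages.
let expectedIndex = 0;

function handleEnvelope(envelope: { index: number; type: string; content: string }): void {
  if (envelope.index !== expectedIndex) {
    console.warn(`Expected message index ${expectedIndex}, received ${envelope.index}`);
  }
  expectedIndex = envelope.index + 1;
  // ...dispatch on envelope.type as in the message-type sketch above
}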

Connection Lifecycle

Typical flow:

  1. Client connects to WebSocket
  2. Authentication handshake
  3. Subscribe to response channel
  4. Send query
  5. Receive stream of messages
  6. Completion or cancellation
  7. Connection maintained for next query
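
The sketch below adds lifecycle hooks to the @stomp/stompjs client from the WebSocket example above; the callback names are real library hooks, but the handling shown is illustrative:

import { Client } from '@stomp/stompjs';

declare const client: Client; // the client created in the earlier WebSocket sketch

// Broker-level errors (e.g. a failed authentication handshake) arrive as ERROR frames.
client.onStompError = (frame) => {
  console.error('Broker error:', frame.headers['message']);
};

// Dropped connections trigger this hook; reconnectDelay retries automatically.
client.onWebSocketClose = () => {
  console.warn('Connection closed; the client will reconnect automatically.');
};

// The connection stays open between queries (step 7); call client.deactivate()
// only when the chat session ends.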

Real-time streaming makes Scrapalot feel responsive and interactive. You're always in the loop, seeing exactly what's happening as your query is processed.
