Real-Time Streaming

Scrapalot streams AI responses in real-time so you see answers as they're generated, just like ChatGPT.

How Streaming Works

When you ask a question, Scrapalot:

  1. Immediately shows you status updates
  2. Streams the answer as it's generated
  3. Provides citations in real-time
  4. Notifies when complete

Why streaming matters:

  • No waiting for the full response
  • See progress as it happens
  • Cancel if the answer is heading in the wrong direction
  • Better overall user experience

What You See

Status Updates

During processing, you see:

  • "Analyzing query for optimal strategy..."
  • "Selected strategy: Tri-Modal Fusion"
  • "Retrieving relevant documents..."
  • "Reranking results by relevance..."
  • "Generating response..."

Progress indicators:

  • Visual progress bars
  • Stage indicators
  • Time estimates
  • Current operation

Answer Streaming

The answer appears word by word:

  • Immediate feedback
  • Smooth, natural flow
  • Can start reading before complete
  • Cancel if needed

Citations

Sources appear as they are mentioned in the answer:

  • Document name
  • Page number
  • Relevance score
  • Link to view source

Completion

When done:

  • Final status message
  • Total time taken
  • Token count (if applicable)
  • Option to follow up

Connection Types

WebSocket

For the web interface:

  • Real-time bidirectional communication
  • Low latency
  • Automatic reconnection
  • Efficient resource usage

Protocol: STOMP over WebSocket

Connection:

ws://localhost:8090/ws
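
Below is a minimal connection sketch using the @stomp/stompjs library. The subscription and publish destinations, and the query payload shape, are assumptions for illustration, not Scrapalot's actual channel names:

import { Client } from '@stomp/stompjs';

// Minimal sketch: connect to the WebSocket endpoint over STOMP and stream
// one query. Destinations and the query payload shape are assumptions.
const client = new Client({
  brokerURL: 'ws://localhost:8090/ws',
  reconnectDelay: 5000, // retry automatically if the connection drops
});

client.onConnect = () => {
  // Subscribe to the response channel before sending the query.
  client.subscribe('/user/queue/chat', (frame) => {
    const message = JSON.parse(frame.body);
    console.log(message.type, message.content);
  });

  // Send the query; streamed messages arrive on the subscription above.
  client.publish({
    destination: '/app/chat',
    body: JSON.stringify({ query: 'What does the contract say about termination?' }),
  });
};

client.activate();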

Server-Sent Events (SSE)

For API clients:

  • Simple one-way streaming
  • HTTP-based
  • Easy to implement
  • Firewall-friendly

Endpoint:

POST /api/v1/chat/stream
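
A minimal sketch of consuming this endpoint from a TypeScript client follows; the host, request body shape, and headers are assumptions for illustration:

// Minimal sketch: POST a query and read the SSE stream with fetch.
// Host, body shape, and headers are assumptions for illustration.
async function streamChat(query: string): Promise<void> {
  const response = await fetch('http://localhost:8090/api/v1/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Accept: 'text/event-stream' },
    body: JSON.stringify({ query }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are separated by a blank line; payload lines start with "data:".
    const events = buffer.split('\n\n');
    buffer = events.pop() ?? '';
    for (const event of events) {
      for (const line of event.split('\n')) {
        if (line.startsWith('data:')) {
          console.log(line.slice(5).trim()); // one streamed message (JSON envelope)
        }
      }
    }
  }
}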

Message Types

Status Messages

What they tell you:

  • Current processing stage
  • Selected strategy
  • Progress percentage
  • Estimated completion time

Stages:

  • Routing (choosing best approach)
  • Retrieval (finding documents)
  • Processing (enhancing query/results)
  • Generation (creating answer)

Content Messages

The actual answer:

  • Streamed incrementally
  • Markdown formatted
  • With inline citations
  • Proper formatting preserved

Citation Messages

Source information:

  • Document title
  • Page/section reference
  • Relevance score
  • Document ID for retrieval

Error Messages

If something goes wrong:

  • Clear error description
  • Error code for reference
  • Suggested action
  • Option to retry

Completion Messages

End of stream:

  • Final status
  • Total processing time
  • Token usage (if applicable)
  • Completion reason (success/error/cancelled)
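
As a sketch, the message variants above can be modeled as a TypeScript discriminated union. The envelope fields (index, timestamp, type) follow the format shown under Protocol Details below; the wire-level type strings and the type-specific fields are assumptions for illustration:

// Sketch of the streamed message variants as a discriminated union.
interface Envelope {
  index: number;       // sequential, starting at 0
  timestamp: string;   // ISO-8601
}

interface StatusMessage extends Envelope {
  type: 'status';
  stage: 'routing' | 'retrieval' | 'processing' | 'generation';
  content: string;     // e.g. "Retrieving relevant documents..."
}

interface ContentMessage extends Envelope {
  type: 'content';
  content: string;     // incremental markdown chunk of the answer
}

interface CitationMessage extends Envelope {
  type: 'citation';
  documentId: string;
  title: string;
  page?: number;
  score: number;       // relevance score
}

interface ErrorMessage extends Envelope {
  type: 'error';
  code: string;
  content: string;     // clear error description
}

interface CompletionMessage extends Envelope {
  type: 'complete';
  durationMs: number;
  tokenCount?: number;
  reason: 'success' | 'error' | 'cancelled';
}

type StreamMessage =
  | StatusMessage
  | ContentMessage
  | CitationMessage
  | ErrorMessage
  | CompletionMessage;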

User Experience

Visual Feedback

What users see:

Phase 1: Query Analysis

  • "Analyzing query..."
  • Strategy selection
  • Reasoning display (if enabled)

Phase 2: Document Retrieval

  • "Searching documents..."
  • Progress indicator
  • Number of results found

Phase 3: Answer Generation

  • Streaming text appearing
  • Citations appearing inline
  • Formatted properly

Phase 4: Complete

  • Checkmark or completion icon
  • Total time
  • Option to continue conversation

Cancellation

Users can cancel at any time:

  • Click stop button
  • Stream stops immediately
  • Partial answer preserved
  • Can start new query
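
A sketch of client-side cancellation for the SSE stream, assuming the same endpoint and body shape as in the earlier sketch (the stop-button selector is also an assumption):

// Sketch: cancelling an in-flight stream from the client with AbortController.
const controller = new AbortController();

fetch('http://localhost:8090/api/v1/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'Summarize chapter 3' }),
  signal: controller.signal, // aborting this signal stops the stream
}).catch((err) => {
  if ((err as DOMException).name !== 'AbortError') throw err; // real failures still surface
});

// Wire the stop button to abort; text already rendered stays on screen.
document.querySelector('#stop-button')?.addEventListener('click', () => controller.abort());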

Performance

Streaming Speed

Typical performance:

  • Status updates: < 100ms
  • First token: 1-3 seconds
  • Stream rate: 20-50 tokens/second
  • Complete response: 5-30 seconds

Factors affecting speed:

  • AI model selected
  • Document collection size
  • Query complexity
  • Network latency
  • Server load

Optimization

For best performance:

  • Use WebSocket for web clients
  • Enable response caching
  • Choose faster AI models when appropriate
  • Optimize chunk sizes
  • Use appropriate RAG strategy

Error Handling

Connection Issues

If the connection drops:

  • Automatic reconnection attempt
  • Resume from last state
  • Preserve partial answer
  • Notify user of reconnection

User action:

  • Usually automatic recovery
  • Manual retry if needed
  • Check network status
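
For an SSE client, the automatic recovery described above might look like the following reconnect-with-backoff sketch; consumeStream is a hypothetical helper built on the fetch-based reader shown earlier (a STOMP client gets the same behaviour from reconnectDelay):

// Sketch: reconnect with exponential backoff while keeping the partial
// answer the user has already seen.
declare function consumeStream(
  query: string,
  onChunk: (text: string) => void, // appends to the visible partial answer
): Promise<void>; // resolves on normal completion, rejects if the connection drops

async function streamWithReconnect(
  query: string,
  onChunk: (text: string) => void,
  maxAttempts = 5,
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await consumeStream(query, onChunk);
      return; // stream completed normally
    } catch (err) {
      const delay = Math.min(1000 * 2 ** attempt, 30_000); // 1s, 2s, 4s, ... capped at 30s
      console.warn(`Connection lost, retrying in ${delay} ms`, err);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw new Error('Gave up after repeated connection failures');
}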

Processing Errors

If query processing fails:

  • Clear error message
  • Suggested action
  • Option to retry
  • Previous context preserved

Common errors:

  • No documents found
  • AI model unavailable
  • Rate limit exceeded
  • Invalid query format

Timeout Handling

If processing takes too long:

  • Automatic timeout (configurable)
  • Partial results returned (if available)
  • Option to extend timeout
  • Clear timeout message
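
On the client side, a timeout can be enforced with AbortSignal.timeout, as in the sketch below; the 60-second value is illustrative, and the server-side timeout is configured separately:

// Sketch: client-side timeout on the streaming request.
const response = await fetch('http://localhost:8090/api/v1/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'Compare the two reports' }),
  signal: AbortSignal.timeout(60_000), // abort if the stream exceeds 60 s
});
// Read response.body incrementally as in the SSE sketch above.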

Best Practices

For Users

Optimal experience:

  • Stable internet connection
  • Modern web browser
  • Enable JavaScript
  • Allow WebSocket connections

If streaming is slow:

  • Check network speed
  • Try simpler query
  • Select faster AI model
  • Reduce collection size

For Developers

Implementing streaming:

  • Use WebSocket for web apps
  • Implement reconnection logic
  • Handle partial responses
  • Show progress feedback
  • Enable cancellation

Error handling:

  • Parse error messages
  • Implement retry logic
  • Preserve user context
  • Provide clear feedback

Troubleshooting

No Streaming, Only Full Response

Check:

  • WebSocket connection enabled
  • Browser supports WebSocket
  • Firewall allows WebSocket
  • Using correct endpoint

Fix:

  • Enable WebSocket in settings
  • Update browser
  • Check firewall rules
  • Verify connection URL

Delayed Streaming

Possible causes:

  • Slow AI model
  • Large document collection
  • Complex query
  • Network latency
  • Server under load

Solutions:

  • Use faster AI model
  • Optimize document chunking
  • Simplify query
  • Check network connection
  • Scale server resources

Connection Drops

If the connection drops frequently:

  • Check network stability
  • Verify WebSocket support
  • Review firewall settings
  • Check server logs
  • Consider load balancer timeout

Mitigation:

  • Implement auto-reconnect
  • Use connection keep-alive
  • Adjust timeout settings
  • Monitor connection health

Advanced Features

Progress Tracking

Detailed progress information:

  • Current stage (routing, retrieval, generation)
  • Percentage complete (when available)
  • Estimated time remaining
  • Current operation

Reasoning Display

See the AI's thought process:

  • Strategy selection reasoning
  • Query analysis
  • Complexity assessment
  • Confidence levels

When useful:

  • Understanding results
  • Debugging queries
  • Learning system behavior
  • Transparency

Multi-Modal Responses

Rich content streaming:

  • Text with formatting
  • Inline citations
  • Code blocks
  • Tables and lists
  • Images (if supported)

Protocol Details

Message Format

Standard envelope:

{
  "index": 0,
  "timestamp": "2024-01-15T10:00:00Z",
  "type": "message_type",
  "content": "..."
}

Sequential indexing:

  • Starts at 0
  • Increments for each message
  • Helps detect missing messages
  • Enables ordering
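
A sketch of how a client might use the index field to detect gaps:

// Sketch: using the sequential index to detect dropped or out-of-order messages.
let expectedIndex = 0;

function handleEnvelope(envelope: { index: number; type: string; content: string }): void {
  if (envelope.index !== expectedIndex) {
    console.warn(`Expected message index ${expectedIndex}, received ${envelope.index}`);
  }
  expectedIndex = envelope.index + 1;
  // ...dispatch on envelope.type as in the message-type sketch above
}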

Connection Lifecycle

Typical flow:

  1. Client connects to WebSocket
  2. Authentication handshake
  3. Subscribe to response channel
  4. Send query
  5. Receive stream of messages
  6. Completion or cancellation
  7. Connection maintained for next query
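
The sketch below adds lifecycle hooks to the @stomp/stompjs client from the WebSocket example above; the callback names are real library hooks, but the handling shown is illustrative:

import { Client } from '@stomp/stompjs';

declare const client: Client; // the client created in the earlier WebSocket sketch

// Broker-level errors (e.g. a failed authentication handshake) arrive as ERROR frames.
client.onStompError = (frame) => {
  console.error('Broker error:', frame.headers['message']);
};

// Dropped connections trigger this hook; reconnectDelay retries automatically.
client.onWebSocketClose = () => {
  console.warn('Connection closed; the client will reconnect automatically.');
};

// The connection stays open between queries (step 7); call client.deactivate()
// only when the chat session ends.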

Real-time streaming makes Scrapalot feel responsive and interactive. You're always in the loop, seeing exactly what's happening as your query is processed.
