Real-Time Streaming
Scrapalot streams AI responses in real time, so you see answers as they're generated, just like ChatGPT.
How Streaming Works
When you ask a question, Scrapalot:
- Immediately shows you status updates
- Streams the answer as it's generated
- Provides citations in real time
- Notifies when complete
Why streaming matters:
- No waiting for full response
- See progress as it happens
- Cancel if the answer is heading in the wrong direction
- Better user experience
What You See
Status Updates
During processing, you see:
- "Analyzing query for optimal strategy..."
- "Selected strategy: Tri-Modal Fusion"
- "Retrieving relevant documents..."
- "Reranking results by relevance..."
- "Generating response..."
Progress indicators:
- Visual progress bars
- Stage indicators
- Time estimates
- Current operation
Answer Streaming
The answer appears word by word:
- Immediate feedback
- Smooth, natural flow
- Can start reading before complete
- Cancel if needed
Citations
Sources appear as they are mentioned:
- Document name
- Page number
- Relevance score
- Link to view source
Completion
When done:
- Final status message
- Total time taken
- Token count (if applicable)
- Option to follow up
Connection Types
WebSocket (Recommended)
For web interface:
- Real-time bidirectional communication
- Low latency
- Automatic reconnection
- Efficient resource usage
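The connection settings for this transport can be collected into a small helper. A minimal sketch, assuming the endpoint shown in this section; the STOMP destination names (`/app/chat`, `/user/queue/responses`) and the reconnect interval are hypothetical, not confirmed by this doc:

```typescript
// Connection settings for the streaming WebSocket transport.
// The broker URL matches this doc; destinations are illustrative only.
interface StreamConfig {
  brokerURL: string;            // STOMP-over-WebSocket endpoint
  sendDestination: string;      // where queries are published (hypothetical)
  subscribeDestination: string; // where streamed messages arrive (hypothetical)
  reconnectDelayMs: number;     // automatic reconnection interval
}

function makeStreamConfig(host = "localhost", port = 8090): StreamConfig {
  return {
    brokerURL: `ws://${host}:${port}/ws`,
    sendDestination: "/app/chat",
    subscribeDestination: "/user/queue/responses",
    reconnectDelayMs: 5000,
  };
}
```

With a STOMP client library such as @stomp/stompjs, this config would be passed to the client, which then subscribes to the response destination and publishes the query.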
Protocol: STOMP over WebSocket
Connection:
ws://localhost:8090/ws
Server-Sent Events (SSE)
For API clients:
- Simple one-way streaming
- HTTP-based
- Easy to implement
- Firewall-friendly
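SSE clients typically read the response body line by line and extract each event's `data:` payload. A minimal sketch of that extraction step, assuming each streamed message arrives as a JSON string in a `data:` line (standard SSE framing; the helper name is illustrative):

```typescript
// Extract the payloads from a chunk of raw SSE text: each event's
// "data:" lines carry one message; blank lines separate events.
function parseSseChunk(chunk: string): string[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).trim());
}
```

With `fetch`, the response body would be read incrementally and each chunk fed through this parser before JSON-decoding the payloads.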
Endpoint:
POST /api/v1/chat/stream
Message Types
Status Messages
What they tell you:
- Current processing stage
- Selected strategy
- Progress percentage
- Estimated completion time
Stages:
- Routing (choosing best approach)
- Retrieval (finding documents)
- Processing (enhancing query/results)
- Generation (creating answer)
Content Messages
The actual answer:
- Streamed incrementally
- Markdown formatted
- With inline citations
- Proper formatting preserved
Citation Messages
Source information:
- Document title
- Page/section reference
- Relevance score
- Document ID for retrieval
Error Messages
If something goes wrong:
- Clear error description
- Error code for reference
- Suggested action
- Option to retry
Completion Messages
End of stream:
- Final status
- Total processing time
- Token usage (if applicable)
- Completion reason (success/error/cancelled)
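The five message types above can be modeled as a discriminated union and folded into UI state as they arrive. A sketch; the fields follow the envelope shown later in this doc (`type`, `index`, `content`), while the exact handler behavior is illustrative:

```typescript
type StreamMessage =
  | { type: "status"; index: number; content: string }
  | { type: "content"; index: number; content: string }
  | { type: "citation"; index: number; content: string }
  | { type: "error"; index: number; content: string }
  | { type: "complete"; index: number; content: string };

interface ChatState {
  answer: string;      // accumulated streamed answer
  citations: string[]; // sources reported so far
  status: string;      // latest status line
  done: boolean;
}

// Fold one incoming message into the displayed state.
function reduceMessage(state: ChatState, msg: StreamMessage): ChatState {
  switch (msg.type) {
    case "status":   return { ...state, status: msg.content };
    case "content":  return { ...state, answer: state.answer + msg.content };
    case "citation": return { ...state, citations: [...state.citations, msg.content] };
    case "error":    return { ...state, status: `error: ${msg.content}`, done: true };
    case "complete": return { ...state, done: true };
  }
}
```

Because each message is handled independently, the same reducer works whether messages come over WebSocket or SSE.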
User Experience
Visual Feedback
What users see:
Phase 1: Query Analysis
- "Analyzing query..."
- Strategy selection
- Reasoning display (if enabled)
Phase 2: Document Retrieval
- "Searching documents..."
- Progress indicator
- Number of results found
Phase 3: Answer Generation
- Streaming text appearing
- Citations appearing inline
- Formatted properly
Phase 4: Complete
- Checkmark or completion icon
- Total time
- Option to continue conversation
Cancellation
Users can cancel at any time:
- Click stop button
- Stream stops immediately
- Partial answer preserved
- Can start new query
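Cancellation can be wired with an AbortController. A minimal sketch of the behavior described above; the helper and its wiring to a stop button are hypothetical:

```typescript
// One controller per in-flight query; aborting stops the stream but
// keeps whatever answer text has already been accumulated.
function makeCancellableQuery() {
  const controller = new AbortController();
  let partialAnswer = "";
  return {
    signal: controller.signal,          // pass to fetch() or the socket layer
    append: (chunk: string) => { partialAnswer += chunk; },
    cancel: () => controller.abort(),   // called by the stop button
    result: () => partialAnswer,        // partial answer is preserved
    cancelled: () => controller.signal.aborted,
  };
}
```

Passing `signal` into the transport lets the browser or HTTP client tear down the stream immediately when `cancel()` is called.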
Performance
Streaming Speed
Typical performance:
- Status updates: < 100ms
- First token: 1-3 seconds
- Stream rate: 20-50 tokens/second
- Complete response: 5-30 seconds
Factors affecting speed:
- AI model selected
- Document collection size
- Query complexity
- Network latency
- Server load
Optimization
For best performance:
- Use WebSocket for web clients
- Enable response caching
- Choose faster AI models when appropriate
- Optimize chunk sizes
- Use appropriate RAG strategy
Error Handling
Connection Issues
If connection drops:
- Automatic reconnection attempt
- Resume from last state
- Preserve partial answer
- Notify user of reconnection
User action:
- Usually automatic recovery
- Manual retry if needed
- Check network status
Processing Errors
If query processing fails:
- Clear error message
- Suggested action
- Option to retry
- Previous context preserved
Common errors:
- No documents found
- AI model unavailable
- Rate limit exceeded
- Invalid query format
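The common errors above might be surfaced to users with a suggested action per error. A sketch; the error codes and messages here are illustrative, not the system's actual codes:

```typescript
// Map hypothetical error codes to user-facing suggestions.
const suggestedActions: Record<string, string> = {
  no_documents_found: "Broaden the query or check the selected collection.",
  model_unavailable: "Select a different AI model or retry shortly.",
  rate_limit_exceeded: "Wait before retrying, or reduce request frequency.",
  invalid_query_format: "Rephrase the query and resend.",
};

function suggestAction(code: string): string {
  return suggestedActions[code] ?? "Retry, and check server logs if it persists.";
}
```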
Timeout Handling
If processing takes too long:
- Automatic timeout (configurable)
- Partial results returned (if available)
- Option to extend timeout
- Clear timeout message
Best Practices
For Users
Optimal experience:
- Stable internet connection
- Modern web browser
- Enable JavaScript
- Allow WebSocket connections
If streaming is slow:
- Check network speed
- Try simpler query
- Select faster AI model
- Reduce collection size
For Developers
Implementing streaming:
- Use WebSocket for web apps
- Implement reconnection logic
- Handle partial responses
- Show progress feedback
- Enable cancellation
Error handling:
- Parse error messages
- Implement retry logic
- Preserve user context
- Provide clear feedback
Troubleshooting
No Streaming, Only Full Response
Check:
- WebSocket connection enabled
- Browser supports WebSocket
- Firewall allows WebSocket
- Using correct endpoint
Fix:
- Enable WebSocket in settings
- Update browser
- Check firewall rules
- Verify connection URL
Delayed Streaming
Possible causes:
- Slow AI model
- Large document collection
- Complex query
- Network latency
- Server under load
Solutions:
- Use faster AI model
- Optimize document chunking
- Simplify query
- Check network connection
- Scale server resources
Connection Drops
If the connection drops frequently:
- Check network stability
- Verify WebSocket support
- Review firewall settings
- Check server logs
- Consider load balancer timeout
Mitigation:
- Implement auto-reconnect
- Use connection keep-alive
- Adjust timeout settings
- Monitor connection health
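Auto-reconnect is usually implemented with exponential backoff and a cap, so repeated failures back off without stalling the client indefinitely. A sketch; the base delay and cap values are illustrative:

```typescript
// Delay before reconnection attempt n (0-based): doubles each attempt,
// capped so the client never waits longer than capMs between tries.
function reconnectDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}
```

Adding random jitter to each delay is a common refinement that avoids many clients reconnecting in lockstep after a server restart.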
Advanced Features
Progress Tracking
Detailed progress information:
- Current stage (routing, retrieval, generation)
- Percentage complete (when available)
- Estimated time remaining
- Current operation
Reasoning Display
See AI's thought process:
- Strategy selection reasoning
- Query analysis
- Complexity assessment
- Confidence levels
When useful:
- Understanding results
- Debugging queries
- Learning system behavior
- Transparency
Multi-Modal Responses
Rich content streaming:
- Text with formatting
- Inline citations
- Code blocks
- Tables and lists
- Images (if supported)
Protocol Details
Message Format
Standard envelope:
```json
{
  "index": 0,
  "timestamp": "2024-01-15T10:00:00Z",
  "type": "message_type",
  "content": "..."
}
```
Sequential indexing:
- Starts at 0
- Increments for each message
- Helps detect missing messages
- Enables ordering
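The sequential `index` field makes missing-message detection straightforward. A sketch of a gap check over the indices received so far:

```typescript
// Given the message indices received so far, report any missing ones
// between 0 and the highest index seen.
function findGaps(received: number[]): number[] {
  if (received.length === 0) return [];
  const seen = new Set(received);
  const max = Math.max(...received);
  const gaps: number[] = [];
  for (let i = 0; i <= max; i++) {
    if (!seen.has(i)) gaps.push(i);
  }
  return gaps;
}
```

A client might run this check on completion and request a retry (or warn the user) if any indices are missing.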
Connection Lifecycle
Typical flow:
- Client connects to WebSocket
- Authentication handshake
- Subscribe to response channel
- Send query
- Receive stream of messages
- Completion or cancellation
- Connection maintained for next query
Related Documentation
- API Reference - API endpoints
- RAG Strategy - How queries are processed
- User Guide - Using the chat interface
- Architecture - System design
Real-time streaming makes Scrapalot feel responsive and interactive. You're always in the loop, seeing exactly what's happening as your query is processed.