Background Processing

Scrapalot processes documents in the background so you never have to wait. Upload files and continue working - you'll be notified when processing completes.

How Background Processing Works

When you upload a document, Scrapalot immediately queues it for processing and returns control to you. Behind the scenes, specialized workers handle the heavy lifting.

What Gets Processed

Document Processing

When you upload files:

Text Extraction - Content extracted from PDFs, Word docs, etc.
Smart Chunking - Documents split into optimal segments
Embedding Generation - Vector embeddings created for semantic search
Indexing - Chunks stored and indexed for fast retrieval
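
The stages above can be pictured as a small pipeline. The sketch below is illustrative only and is not Scrapalot's implementation: the names `Chunk`, `chunk_text`, `toy_embedding`, and `process_document` are hypothetical, the fixed-size word window stands in for smart chunking, and the hash-based vector stands in for a real embedding model. It starts from text that has already been extracted.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class Chunk:
    doc_id: str
    position: int
    text: str
    embedding: list[float] = field(default_factory=list)


def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (a stand-in for smart chunking)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the window already reached the end of the text
    return chunks


def toy_embedding(text: str, dims: int = 8) -> list[float]:
    """Deterministic placeholder vector; a real system would call an embedding model."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dims]]


def process_document(doc_id: str, extracted_text: str,
                     index: dict[str, list[Chunk]]) -> int:
    """Chunk, embed, and index already-extracted text; returns the chunk count."""
    chunks = [
        Chunk(doc_id, i, piece, toy_embedding(piece))
        for i, piece in enumerate(chunk_text(extracted_text))
    ]
    index[doc_id] = chunks  # an in-memory dict stands in for the real index
    return len(chunks)
```

The overlap between neighbouring windows is a common way to keep sentences that straddle a chunk boundary retrievable from either side; the window and overlap sizes here are arbitrary.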

Processing time:

Small documents (1-10 pages): 10-30 seconds
Medium documents (11-100 pages): 30-120 seconds
Large documents (over 100 pages): 2-5 minutes

External Document Fetching

When you connect external sources:

Automatic synchronization on schedule
Downloads from Google Drive, web pages, APIs
Queued processing for each fetched document
Error handling and retry logic

Schedule options:

Manual (on-demand only)
Hourly
Daily
Weekly
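
As a rough illustration of how these schedule options translate into sync times, the sketch below computes when a source is next due. `SyncSchedule` and `next_sync` are hypothetical names chosen for this example, not part of Scrapalot's API, and fixed intervals measured from the last sync are an assumption.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class SyncSchedule(Enum):
    MANUAL = "manual"
    HOURLY = "hourly"
    DAILY = "daily"
    WEEKLY = "weekly"


# Manual sources have no interval: they only sync on demand.
_INTERVALS = {
    SyncSchedule.HOURLY: timedelta(hours=1),
    SyncSchedule.DAILY: timedelta(days=1),
    SyncSchedule.WEEKLY: timedelta(weeks=1),
}


def next_sync(schedule: SyncSchedule, last_sync: datetime) -> Optional[datetime]:
    """Return when the next automatic sync is due, or None for manual sources."""
    interval = _INTERVALS.get(schedule)
    return None if interval is None else last_sync + interval
```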

Real-Time Progress

Progress Tracking

You always know what's happening:

Upload stage (0-10%)
Validation (10-20%)
Text extraction (20-50%)
Chunking (50-70%)
Embedding generation (70-95%)
Final indexing (95-100%)
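
One way to read these ranges: each stage owns a slice of the overall progress bar, and progress within a stage is scaled into that slice. The sketch below uses the percentages listed above; the stage keys and the `overall_progress` function are hypothetical names for illustration.

```python
# (stage name, start %, end %) taken from the ranges listed above
STAGES = [
    ("upload", 0, 10),
    ("validation", 10, 20),
    ("text_extraction", 20, 50),
    ("chunking", 50, 70),
    ("embedding", 70, 95),
    ("indexing", 95, 100),
]


def overall_progress(stage: str, stage_fraction: float) -> float:
    """Map progress within one stage (0.0 to 1.0) onto the overall 0-100% scale."""
    fraction = min(max(stage_fraction, 0.0), 1.0)  # clamp out-of-range input
    for name, start, end in STAGES:
        if name == stage:
            return start + (end - start) * fraction
    raise ValueError(f"unknown stage: {stage!r}")
```

For example, a document halfway through text extraction would report 35% overall.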

Visual feedback:

Progress bars in UI
Status messages
Estimated time remaining
Error notifications if issues occur

Notifications

Get notified when:

Documents finish processing
Processing errors occur
External fetches complete
Batch operations finish

Deployment Options

Minimal Setup (Default)

For small teams and getting started:

Essential workers only
Low memory footprint
Handles typical workloads
Good for 1-10 concurrent users

Requirements:

4GB RAM minimum
Processes one document at a time
Good for most use cases

Enhanced Setup

For larger teams and high volume:

Multiple specialized workers
Parallel document processing
Faster turnaround times
Handles 10+ concurrent users

Requirements:

8GB+ RAM recommended
Processes multiple documents simultaneously
Better for production deployments

Resource Allocation

Workers adapt to your hardware:

Lightweight workers for small servers
Heavy workers for powerful machines
Automatic memory management
Graceful degradation under load

Error Handling

Automatic Retry

If processing fails:

Automatic retry with increasing delays
Up to 3 attempts per document
Clear error messages if all attempts fail
Queue continues with other documents

Common issues handled:

Temporary network failures
Rate limit exceeded (external sources)
Corrupted file detection
Timeout on very large files
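
The retry behavior described above can be sketched as a small wrapper. This is a minimal illustration, not Scrapalot's code: `run_with_retry` is a hypothetical name, and doubling the delay after each failure (2 s, then 4 s) is an assumption, since the text only says the delays increase.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    task: Callable[[], T],
    max_attempts: int = 3,            # "up to 3 attempts per document"
    base_delay: float = 2.0,          # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying on failure with a delay that doubles each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # all attempts used: surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real waiting. A production version would likely retry only on transient errors such as network failures or rate limits, and fail immediately on a corrupted or unsupported file.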

Error Recovery

When things go wrong:

You receive a clear error notification
Other documents continue processing
Failed documents can be re-uploaded
Detailed logs for troubleshooting

Error types:

File format not supported
Document too large
Corrupted file
External service unavailable

Performance Features

Smart Queuing

Priority handling:

User-triggered uploads get priority
Scheduled syncs run during low activity
System balances load automatically
No queue starvation
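
Priority without starvation can be achieved in several ways; the source does not say which one Scrapalot uses. The sketch below shows one simple approach under that caveat: scheduled syncs are ranked as if they had arrived a fixed number of jobs later, so user uploads can overtake them only by a bounded amount. `JobQueue` and `sync_penalty` are hypothetical names.

```python
import heapq
import itertools


class JobQueue:
    """User uploads jump ahead of scheduled syncs, but only by a bounded amount."""

    def __init__(self, sync_penalty: int = 5):
        self._heap: list[tuple[int, int, str]] = []
        self._ticket = itertools.count()  # arrival order
        self._sync_penalty = sync_penalty

    def put(self, job_id: str, user_triggered: bool) -> None:
        ticket = next(self._ticket)
        # A scheduled sync is ranked as if it had arrived `sync_penalty` jobs
        # later, so fewer than `sync_penalty` later arrivals can overtake it.
        rank = ticket if user_triggered else ticket + self._sync_penalty
        heapq.heappush(self._heap, (rank, ticket, job_id))

    def get(self) -> str:
        """Pop the next job; raises IndexError if the queue is empty."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Because every job's rank is fixed at enqueue time and ranks keep growing with arrival order, a scheduled sync cannot wait indefinitely behind a steady stream of uploads.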

Resource Management

Efficient processing:

Memory limits prevent server overload
Automatic worker restart on memory buildup
CPU throttling for better responsiveness
Disk space monitoring

Scalability

Grows with your needs:

Add more workers as usage increases
Horizontal scaling supported
Load balancing across workers
No downtime for scaling

Monitoring & Health

System Health

Track processing performance:

Active jobs count
Queue depth
Processing times
Error rates
Resource utilization

Access via:

Admin dashboard
Health check endpoint
System logs

Scheduled Tasks

Automatic maintenance:

Cleanup of temporary files
Old session removal
Job history archival
Health checks

Schedule:

Runs during low activity periods
Configurable timing
Minimal performance impact

Configuration Options

Worker Tuning

Adjust for your environment:

Low Memory: Reduce concurrent processing
High Memory: Increase parallel workers
CPU Limited: Reduce worker concurrency
Fast Storage: Increase batch sizes
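
As a rough starting point for these adjustments, worker concurrency can be bounded by both memory and CPU. The heuristic below is an assumption for illustration only: `suggest_worker_concurrency` is a hypothetical helper, and the 2 GB reserved for the system and 2 GB per worker are placeholder figures, chosen so that a 4 GB machine ends up processing one document at a time. Measure real usage before relying on any such numbers.

```python
import os
from typing import Optional


def suggest_worker_concurrency(
    total_ram_gb: float,
    cpu_count: Optional[int] = None,
    reserved_gb: float = 2.0,    # assumed headroom for OS and other services
    per_worker_gb: float = 2.0,  # assumed peak memory per worker
) -> int:
    """Suggest a concurrency level limited by memory and CPU; never below 1."""
    cpus = cpu_count or os.cpu_count() or 1
    by_memory = int((total_ram_gb - reserved_gb) // per_worker_gb)
    return max(1, min(cpus, by_memory))
```

With these placeholder figures, 4 GB of RAM yields one worker and 8 GB with enough cores yields three; a CPU-limited machine is capped by its core count regardless of memory.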

Processing Behavior

Customize processing:

Chunk size preferences
Embedding model selection
Retry attempt limits
Timeout durations

Troubleshooting

Slow Processing

If documents take too long:

Check system resource usage
Verify worker health
Review document size/complexity
Consider adding more workers

Typical solutions:

Reduce concurrent uploads
Increase worker memory allocation
Split very large documents
Use faster embedding models

Processing Stuck

If progress stops:

Check worker status
Review error logs
Restart workers if needed
Re-upload problematic documents

Prevention:

Monitor queue depth
Set appropriate timeouts
Regular health checks

High Resource Usage

If the system runs hot:

Reduce worker concurrency
Increase restart frequency
Monitor for memory leaks
Review processing limits

Best Practices

Efficient Uploads

Optimize your workflow:

Upload related documents together
Use appropriate file formats
Pre-process very large files
Remove unnecessary pages

Scheduled Syncs

Configure wisely:

Schedule during off-peak hours
Set reasonable sync frequencies
Monitor quota usage (external sources)
Review and clean old syncs

Resource Planning

Plan for growth:

Start with minimal setup
Monitor actual usage patterns
Scale workers as needed
Review performance metrics regularly

Related Documentation

External Connectors - Automatic document fetching
Database Design - Where processed data goes
Document Processing - Chunking strategies
Deployment Guide - Production configuration

Background processing is automatic and requires no user intervention. Just upload documents and Scrapalot handles the rest.
