Production Deployment Guide

This comprehensive guide covers deploying Scrapalot in production environments, from single-server deployments to scalable cloud architectures.

Deployment Overview

Deployment Options

Choose the deployment method that best fits your needs:

  • Docker Compose: single-server deployment (recommended for most use cases)
  • Cloud Platforms: cloud deployment examples (experimental)
  • VPS Deployment: traditional server deployment
  • Edge Deployment: local/on-premises deployment

Architecture Components

Docker Compose Deployment

Cloud Deployment Quick Start

For cloud deployment with CI/CD, the complete workflow includes the following key features (a workflow sketch follows below):

  • External Supabase PostgreSQL (no local DB)
  • GitHub Actions CI/CD
  • Nginx Proxy Manager for SSL
  • Docker Compose with automated deployments
  • Optional GPU/Vulkan support

For detailed cloud deployment instructions, see the Cloud Infrastructure Guide.
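
A minimal sketch of what such a GitHub Actions workflow could look like; the file name, action, secrets, and server paths here are assumptions for illustration, not the project's actual pipeline:

yaml
# .github/workflows/deploy.yml (hypothetical)
name: Deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      # Pull the latest compose stack on the server and restart services
      - name: Deploy over SSH
        uses: appleboy/ssh-action@v1.0.0
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.DEPLOY_SSH_KEY }}
          script: |
            cd /opt/scrapalot/docker-scrapalot
            git pull
            docker-compose -f docker-compose.yaml pull
            docker-compose -f docker-compose.yaml up -d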

Production Docker Compose Setup

Step 1: Environment Configuration

Create a production environment file based on the template:

bash
cp docker-scrapalot/example.env docker-scrapalot/.env
# Edit docker-scrapalot/.env with your production values

Key Variables to configure:

  • Database credentials (POSTGRES_PASSWORD, REDIS_PASSWORD)
  • LLM provider settings and API keys
  • Model directory configuration
  • Neo4j credentials (optional)
  • GPU and Vulkan support settings

See docker-scrapalot/example.env for the complete template.
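
A minimal sketch of the resulting .env; values are placeholders, and any variable names not shown elsewhere in this guide (such as NEO4J_PASSWORD here) are assumptions, so treat example.env as the authoritative reference:

bash
# docker-scrapalot/.env (illustrative values only)
POSTGRES_PASSWORD=change-me-strong-password
REDIS_PASSWORD=change-me-strong-password
# LLM provider and model storage
LLM_PROVIDER=local
LLM_MODELS_DIRECTORY=/data/models
# Optional Neo4j credentials (variable name assumed)
NEO4J_PASSWORD=change-me
# GPU / Vulkan support
LLM_VULKAN_ENABLED=false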

Step 2: Production Dockerfile

The main configuration is in docker-scrapalot/Dockerfile with the following features:

  • Python FastAPI backend with all dependencies
  • Integrated GPU acceleration support (Vulkan/CUDA)
  • Vulkan support enabled via build arguments
  • Model directory mounting for persistent storage
  • GPU-aware health checks and proper logging

Deployment Commands

Development Deployment

bash
# Navigate to docker directory
cd docker-scrapalot

# 1. Prepare environment
cp example.env .env
# Edit .env with your values

# 2. Build and start services
docker-compose up -d

# 3. Verify deployment
curl -f http://localhost:8090/health

Production Deployment

bash
# Navigate to docker directory
cd docker-scrapalot

# 1. Prepare environment
cp example.env .env
# Edit .env with production values

# 2. Build and start
docker-compose -f docker-compose.yaml up -d

# 3. Check service health
docker-compose ps
docker-compose logs scrapalot-chat

# 4. Verify all services are running
curl -f http://localhost:8090/health
curl -f http://localhost:8091/health  # LLM service

GPU-Accelerated Deployment (Vulkan)

bash
# Navigate to docker directory
cd docker-scrapalot

# 1. Build with Vulkan support
docker build \
  --build-arg CMAKE_ARGS="-DLLAMA_VULKAN=ON" \
  -f Dockerfile \
  -t scrapalot-chat:latest ..

# 2. Enable Vulkan in environment
export LLM_VULKAN_ENABLED=true
export LLM_VULKAN_PREFER=true

# 3. Deploy with docker-compose
docker-compose -f docker-compose.yaml up -d

# 4. Verify GPU acceleration
docker-compose logs scrapalot-chat | grep -i vulkan

Configuration Management

Advanced Configuration System

Scrapalot uses a YAML-based configuration system located at configs/config.yaml. Rather than relying on flat environment variables alone, it defines structured defaults in YAML and layers environment-variable overrides on top (see the ${VAR_NAME:-default} syntax below).

Key Configuration Sections

Server & Infrastructure:

yaml
# Server Configuration
host: "0.0.0.0"
port: 8090
workers: 4
log_level: "info"

# Redis Configuration
redis:
  host: ${REDIS_HOST:-localhost}
  port: ${REDIS_PORT:-6479}
  password: ${REDIS_PASSWORD:-""}
  db: 0

# PostgreSQL Configuration
postgres:
  host: ${POSTGRES_HOST:-localhost}
  port: ${POSTGRES_PORT:-15432}
  db: ${POSTGRES_DB:-scrapalot}
  user: ${POSTGRES_USER:-scrapalot}
  password: ${POSTGRES_PASSWORD:-scrapalot}

LLM & Model Management:

yaml
llm:
  models_directory: ${LLM_MODELS_DIRECTORY:-models}
  max_parallel_chats: ${LLM_MAX_PARALLEL_CHATS:-1}
  max_loaded_models: ${LLM_MAX_LOADED_MODELS:-1}

  # Advanced model configuration
  advanced:
    gpu_layers: ${LLM_GPU_LAYERS:-auto}
    context_size: ${LLM_CONTEXT_SIZE:-32768}
    batch_size: ${LLM_BATCH_SIZE:-1024}
    threads: ${LLM_THREADS:-4}

Document Processing:

yaml
documents:
  max_concurrent_jobs_per_user: 3
  batch_size: 10
  timeout: 300
  max_file_size_mb: 10
  upload_path: ${UPLOAD_PATH:-data/upload}

Model Directory Structure

The system automatically organizes models by type:

models/
├── gguf/                    # LLM models in GGUF format
├── huggingface/             # Non-embedding HuggingFace models
└── embeddings/              # All embedding models
    ├── gguf/               # GGUF embedding models
    └── huggingface/        # HuggingFace embedding models

Automatic Model Type Detection: The system automatically routes downloaded models to the correct directory based on model name patterns.
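
As a rough illustration of name-pattern routing, a sketch along these lines could implement it; the patterns and function here are assumptions for illustration, not Scrapalot's actual detection logic:

python
from pathlib import Path

# Hypothetical routing helper: embedding models go under embeddings/,
# GGUF files under gguf/, everything else under huggingface/
def route_model_directory(models_root: str, model_name: str) -> Path:
    root = Path(models_root)
    name = model_name.lower()
    is_embedding = any(p in name for p in ("embed", "bge-", "gte-", "e5-"))
    is_gguf = name.endswith(".gguf")
    if is_embedding:
        return root / "embeddings" / ("gguf" if is_gguf else "huggingface")
    return root / ("gguf" if is_gguf else "huggingface")

print(route_model_directory("models", "bge-small-en-v1.5"))      # models/embeddings/huggingface
print(route_model_directory("models", "llama-3-8b.Q4_K_M.gguf")) # models/gguf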

Environment Variable Integration

The configuration system supports environment variable overrides using the ${VAR_NAME:-default} syntax:

bash
# Override specific settings
export LLM_MODELS_DIRECTORY="/custom/models/path"
export POSTGRES_PASSWORD="secure_password"
export LLM_GPU_LAYERS="50"

Production Configuration Tips

  1. Security: Always override default passwords in production
  2. Performance: Adjust gpu_layers, context_size, and batch_size based on hardware
  3. Storage: Configure models_directory and upload_path for persistent storage
  4. Scaling: Set max_parallel_chats and workers based on expected load

Cloud Platform Deployments

Experimental

The AWS and GCP deployment configurations below are experimental and have not been fully tested in production. They are provided as starting templates for users who wish to deploy on these platforms. For production-ready deployment, see the Cloud Infrastructure Guide which covers tested Docker Compose deployment with CI/CD.

AWS Deployment with ECS

ECS Task Definition

Example ECS task definition for production deployment:

json
{
  "family": "scrapalot-backend",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "2048",
  "memory": "4096",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::account:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "scrapalot-backend",
      "image": "your-account.dkr.ecr.region.amazonaws.com/scrapalot-backend:latest",
      "portMappings": [
        {
          "containerPort": 8090,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "ENVIRONMENT",
          "value": "prod"
        },
        {
          "name": "DATABASE_URL",
          "value": "postgresql://user:pass@rds-endpoint:5432/scrapalot"
        }
      ],
      "secrets": [
        {
          "name": "SECRET_KEY",
          "valueFrom": "arn:aws:secretsmanager:region:account:secret:scrapalot/secret-key"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/scrapalot-backend",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8090/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    }
  ]
}
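
Assuming the JSON above is saved as task-definition.json, registering it and rolling the service might look like this (cluster and service names are placeholders):

bash
# Register the task definition, then force a new deployment of the service
aws ecs register-task-definition --cli-input-json file://task-definition.json
aws ecs update-service \
  --cluster scrapalot-cluster \
  --service scrapalot-backend \
  --force-new-deployment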

Google Cloud Platform Deployment

Complete Configuration: gcp/ directory

Component         | File              | Description
------------------|-------------------|------------------------------
Cloud Run Service | gcp/cloudrun.yaml | Serverless backend deployment
Cloud SQL         | gcp/cloudsql.yaml | Managed PostgreSQL database
Deployment Script | gcp/deploy.sh     | Automated deployment script

bash
# Quick deployment
chmod +x gcp/deploy.sh && ./gcp/deploy.sh

SSL/TLS Configuration

Automated SSL Setup

Use the automated SSL setup script with Let's Encrypt:

bash
# Quick SSL setup with Let's Encrypt
sudo ./scripts/setup_ssl.sh yourdomain.com api.yourdomain.com

Files referenced:

  • scripts/setup_ssl.sh - Automated SSL configuration
  • Configuration integrates with Nginx Proxy Manager
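
For reference, a hand-written nginx vhost achieving the same result might look roughly like this; this is a sketch only (the actual configuration is produced by the script and Nginx Proxy Manager), with certificate paths following Let's Encrypt defaults:

nginx
server {
    listen 443 ssl;
    server_name api.yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/api.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8090;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        # Upgrade headers so WebSocket chat connections survive the proxy
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}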

Monitoring and Observability

Application Metrics

Monitor key application metrics with Prometheus:

Metric                        | Type      | Description
------------------------------|-----------|-------------------------------------------------
http_requests_total           | Counter   | Total HTTP requests by method, endpoint, status
http_request_duration_seconds | Histogram | HTTP request duration
websocket_connections_active  | Gauge     | Active WebSocket connections
document_processing_seconds   | Histogram | Document processing time

Configuration:

  • monitoring/prometheus.yml - Metrics collection and alerting
  • monitoring/grafana/dashboards/ - Pre-built dashboards
  • monitoring/alert_rules.yml - Production alerting rules
  • monitoring/exporters/ - Database and service exporters
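
A scrape job such as monitoring/prometheus.yml might contain (sketch; job name and target are assumptions):

yaml
scrape_configs:
  - job_name: scrapalot-backend
    metrics_path: /metrics
    static_configs:
      - targets: ["scrapalot-chat:8090"]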

Example Metrics Implementation

python
from prometheus_client import Counter, Histogram, Gauge

# Metrics
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests',
                       ['method', 'endpoint', 'status'])
REQUEST_DURATION = Histogram('http_request_duration_seconds',
                            'HTTP request duration')
ACTIVE_CONNECTIONS = Gauge('websocket_connections_active',
                          'Active WebSocket connections')
DOCUMENT_PROCESSING_TIME = Histogram('document_processing_seconds',
                                    'Document processing time')
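
Continuing from the metric objects above, one way to wire them into a FastAPI app (sketch; Scrapalot's actual middleware may differ):

python
import time

from fastapi import FastAPI, Request
from prometheus_client import make_asgi_app

app = FastAPI()

@app.middleware("http")
async def record_metrics(request: Request, call_next):
    # Time every request and record method/endpoint/status labels
    start = time.perf_counter()
    response = await call_next(request)
    REQUEST_DURATION.observe(time.perf_counter() - start)
    REQUEST_COUNT.labels(method=request.method,
                         endpoint=request.url.path,
                         status=str(response.status_code)).inc()
    return response

# Expose the metrics for Prometheus to scrape
app.mount("/metrics", make_asgi_app())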

Backup and Recovery

Database Backup Strategy

bash
#!/bin/bash
# scripts/backup_database.sh

BACKUP_DIR="/backups/postgres"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="scrapalot_backup_${DATE}.sql"

# Create backup directory
mkdir -p $BACKUP_DIR

# Perform backup
pg_dump -h $POSTGRES_HOST -U $POSTGRES_USER -d $POSTGRES_DB > $BACKUP_DIR/$BACKUP_FILE

# Compress backup
gzip $BACKUP_DIR/$BACKUP_FILE

# Upload to S3 (optional)
aws s3 cp $BACKUP_DIR/${BACKUP_FILE}.gz s3://your-backup-bucket/postgres/

# Clean up old backups (keep last 30 days)
find $BACKUP_DIR -name "*.gz" -mtime +30 -delete

echo "Backup completed: ${BACKUP_FILE}.gz"

Disaster Recovery Plan

Recovery Procedures:

Database Failure:

  1. Stop application services
  2. Restore from the latest backup (see the restore sketch below)
  3. Run database migrations if needed
  4. Restart services
  5. Verify functionality
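
A sketch of step 2, assuming a backup produced by scripts/backup_database.sh above (the timestamp in the filename is illustrative):

bash
gunzip -c /backups/postgres/scrapalot_backup_20250101_020000.sql.gz \
  | psql -h $POSTGRES_HOST -U $POSTGRES_USER -d $POSTGRES_DB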

Complete System Failure:

  1. Deploy infrastructure from IaC
  2. Restore database from backup
  3. Restore Redis data if available
  4. Deploy application containers
  5. Restore uploaded files from backup
  6. Update DNS if needed
  7. Verify all services

Backup Schedule:

  • Database: Daily at 2 AM UTC
  • Files: Daily at 3 AM UTC
  • Configuration: On every change
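
A hypothetical crontab implementing this schedule; the install path and the backup_files.sh helper are assumptions:

bash
# Daily database backup at 2 AM, file backup at 3 AM (assumes the server clock is UTC)
0 2 * * * /opt/scrapalot/scripts/backup_database.sh >> /var/log/scrapalot-backup.log 2>&1
0 3 * * * /opt/scrapalot/scripts/backup_files.sh >> /var/log/scrapalot-backup.log 2>&1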

Recovery Objectives:

  • Recovery Time Objective (RTO): 4 hours
  • Recovery Point Objective (RPO): 1 hour

Local Model Deployment

Model Deployment Architecture

Scrapalot's local model deployment system offers two activation pathways, direct GPU activation and service-based deployment, each suited to different production use cases.

Deployment Flow Overview

Production Deployment Strategies

Container-Based Model Deployment

Recommended for: Production environments with standardized model requirements

dockerfile
# Dockerfile.models - Specialized container for model serving
FROM python:3.12-slim

# Install build tools needed to compile llama-cpp-python on a slim base image,
# then the model dependencies
RUN apt-get update && apt-get install -y --no-install-recommends build-essential cmake \
    && rm -rf /var/lib/apt/lists/* \
    && pip install llama-cpp-python==0.3.8

# Create models directory with proper permissions
RUN mkdir -p /app/data/models/gguf && \
    chmod 755 /app/data/models

# Copy pre-downloaded models
COPY models/ /app/data/models/

# Set model service configuration
ENV LLM_MODELS_DIRECTORY=/app/data/models
ENV LLM_PROVIDER=local

WORKDIR /app
CMD ["python", "-m", "src.main.service.local_models.model_service"]

Hardware Resource Planning

Model Size | CPU RAM | GPU VRAM | Container Memory Limit
-----------|---------|----------|-----------------------
1-3B       | 8GB     | 4GB      | 12GB
7B         | 16GB    | 8GB      | 24GB
13B        | 32GB    | 16GB     | 48GB
70B+       | 128GB   | 40GB+    | 160GB+
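
To enforce these container memory limits, a compose-level constraint along these lines could be used; this sketch shows the 7B row, using the scrapalot-backend service name from later in this section:

yaml
services:
  scrapalot-backend:
    deploy:
      resources:
        limits:
          memory: 24g          # container memory cap for a 7B model
        reservations:
          devices:
            - driver: nvidia   # GPU passthrough for CUDA deployments
              count: 1
              capabilities: [gpu]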

Production Configuration

Optimized Model Service Configuration

yaml
# config/production.yaml
llm:
  provider: "local"
  models_directory: "/app/data/models"
  max_loaded_models: 2

  advanced:
    gpu_layers: 40              # Conservative GPU usage
    context_size: 4096          # Balanced context size
    batch_size: 256             # Optimized for throughput
    threads: 6                  # Leave cores for other processes
    use_mlock: true             # Keep models in memory
    use_mmap: true              # Enable memory mapping

Health Checks for Model Services

yaml
services:
  scrapalot-backend:
    healthcheck:
      test: [
        "CMD", "curl", "-f",
        "http://localhost:8090/llm-inference/system-capabilities"
      ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 120s  # Allow time for model loading

Key Deployment Endpoints

Model Activation Pathways:

  • POST /llm-inference/models/{model_id}/start-gpu - Direct GPU activation
  • POST /llm-inference/deploy-model - Service-based deployment
  • GET /llm-inference/system-capabilities - Hardware capabilities check
  • GET /llm-inference/deployment-status - Deployment status monitoring
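
Example calls (my-model stands in for a real {model_id}):

bash
# Check hardware before deploying, then activate and monitor a model
curl http://localhost:8090/llm-inference/system-capabilities
curl -X POST http://localhost:8090/llm-inference/models/my-model/start-gpu
curl http://localhost:8090/llm-inference/deployment-status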

Monitoring and Troubleshooting

Key Metrics to Monitor

  • Model Loading Times: Track initialization performance
  • GPU Memory Utilization: Monitor VRAM usage patterns
  • Inference Latency: Measure response times
  • Model Switching Frequency: Optimize for usage patterns
  • Threading Health: Monitor background thread performance

Common Issues and Solutions

Model Loading Failures:

bash
# Check model file permissions
docker-compose exec scrapalot-backend ls -la /app/data/models/

# Verify model service threading
docker-compose logs scrapalot-backend | grep -i "load_model_thread"

# Check deployment status
curl http://localhost:8090/llm-inference/deployment-status

Memory Issues:

bash
# Monitor container memory usage
docker stats scrapalot-backend

# Check GPU memory
docker-compose exec scrapalot-backend nvidia-smi

# Review model configuration
docker-compose exec scrapalot-backend cat /app/config.yaml

Best Practices Summary

  1. Dual Pathway Understanding: Choose between direct GPU activation and service-based deployment based on use case
  2. Resource Planning: Allocate sufficient memory and GPU resources based on model requirements
  3. Health Monitoring: Implement comprehensive health checks with appropriate timeouts for model loading
  4. Threading Awareness: Monitor background thread performance and resource isolation
  5. Configuration Management: Use config.yaml for standardized model service deployments
  6. Performance Monitoring: Track model loading times, inference latency, and resource utilization

For detailed model management information, see: Model Management Guide

Windows Conda Environment GPU Setup

CUDA 12.1 Installation for Windows

This guide, based on real-world deployment experience, walks through setting up GPU acceleration in a Windows conda environment.

Prerequisites

  • Windows 10/11 with NVIDIA GPU
  • Conda environment (e.g., scrapalot-chat)
  • Latest NVIDIA drivers installed

Step-by-Step Installation

1. Activate Conda Environment

powershell
conda activate scrapalot-chat

2. Upgrade pip

powershell
# Use full path if needed
C:\python\envs\scrapalot-chat\python.exe -m pip install --upgrade pip

3. Install PyTorch with CUDA 12.1

powershell
# CRITICAL: Use --force-reinstall to override CPU version from requirements.txt
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall

4. Install llama-cpp-python with CUDA 12.1

powershell
# Uninstall CPU version first
pip uninstall -y llama-cpp-python

# Install pre-compiled CUDA wheel (NO CMAKE_ARGS needed)
pip install llama-cpp-python==0.3.16 --extra-index-url https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/cu121

5. Verify Installation

powershell
python -c "import torch; print('PyTorch version:', torch.__version__); print('CUDA available:', torch.cuda.is_available()); print('CUDA version:', torch.version.cuda)"

Common Issues and Solutions

Issue: PyTorch shows 2.5.1+cpu instead of CUDA version

powershell
# Solution: Force reinstall to override requirements.txt CPU version
pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall

Issue: CUDA available: False

  • Verify NVIDIA drivers are installed: Check Device Manager
  • Ensure you used --force-reinstall flag
  • Check if requirements.txt is overriding with CPU versions

Issue: CMAKE_ARGS confusion

  • Pre-compiled wheels (CUDA): NO CMAKE_ARGS needed
  • Building from source (Vulkan): CMAKE_ARGS required
  • Use pre-compiled wheels for faster, easier installation

Installation Paths Comparison

Method                   | CMAKE_ARGS | Build Time | Compatibility | Recommended
-------------------------|------------|------------|---------------|-----------------
CUDA Pre-compiled Wheel  | Not needed | Instant    | NVIDIA only   | Yes
Vulkan Build from Source | Required   | 10-30 min  | Universal GPU | For AMD/Intel
CPU Fallback             | Not needed | Instant    | All systems   | Development only
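
For completeness, the Vulkan build-from-source path on Windows might look like this; it mirrors the -DLLAMA_VULKAN=ON build argument used in the Docker build earlier and assumes the Vulkan SDK and C++ build tools are installed:

powershell
# Set CMAKE_ARGS for the source build, then rebuild the package
$env:CMAKE_ARGS = "-DLLAMA_VULKAN=ON"
pip install llama-cpp-python --force-reinstall --no-cache-dir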

Troubleshooting Commands

powershell
# Check installed packages
pip list | findstr torch
pip list | findstr llama

# Verify GPU detection
python -c "import torch; print(f'GPU count: {torch.cuda.device_count()}'); print(f'GPU name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"N/A\"}')"

# Test llama-cpp-python
python -c "import llama_cpp; print('llama-cpp-python imported successfully')"

Performance Verification

After successful installation, verify GPU acceleration:

powershell
# Start scrapalot-chat and check logs for:
# - "CUDA Available: True"
# - GPU detection messages
# - Model loading with GPU layers

# Monitor GPU usage during inference
# Use Task Manager > Performance > GPU or nvidia-smi if available

This comprehensive deployment guide provides the foundation for deploying Scrapalot in production environments, from simple single-server deployments to complex, scalable cloud architectures.
