Cloud Infrastructure Deployment

Complete guide for deploying Scrapalot on cloud servers with Docker and CI/CD automation.

Overview

This guide covers deploying Scrapalot on cloud servers with:

  • Docker Compose deployment
  • GitHub Actions CI/CD pipeline
  • Nginx Proxy Manager for SSL and routing
  • External Supabase PostgreSQL (managed database)
  • Optional GPU/Vulkan support

Quick Start

Prerequisites

Server Requirements:

  • Ubuntu 24.04 LTS
  • Minimum: 4 vCPUs, 16GB RAM, 80GB SSD
  • Public IP address
  • Root or sudo access

External Services:

  • Supabase PostgreSQL (managed database)
  • GitHub repositories with Actions enabled
  • Domain name for SSL

API Keys:

  • HuggingFace token
  • Google OAuth credentials
  • API keys for optional services (Firecrawl, SerpAPI)
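The server minimums above can be checked before starting. The following is a minimal sketch, not part of the Scrapalot tooling: the function names `meets_minimum` and `preflight` are illustrative, and it assumes a Linux host with GNU coreutils (for `nproc` and `df --output`).

```shell
# Preflight sketch: compare this machine against the stated minimums
# (4 vCPUs, 16GB RAM, 80GB SSD). Function names are hypothetical.

# meets_minimum ACTUAL REQUIRED LABEL: print a verdict, return 1 on failure
meets_minimum() {
  if [ "$1" -ge "$2" ]; then
    echo "OK   $3: $1 (need >= $2)"
  else
    echo "FAIL $3: $1 (need >= $2)"
    return 1
  fi
}

preflight() {
  cpus=$(nproc)
  # MemTotal is in kB; round up because a "16GB" server reports slightly less
  mem_gb=$(awk '/^MemTotal/ {printf "%d", ($2 + 1048575) / 1048576}' /proc/meminfo)
  # Size of the root filesystem in whole GB
  disk_gb=$(df -BG --output=size / | tail -n 1 | tr -dc '0-9')
  status=0
  meets_minimum "$cpus" 4 "vCPUs" || status=1
  meets_minimum "$mem_gb" 16 "RAM (GB)" || status=1
  meets_minimum "$disk_gb" 80 "Disk (GB)" || status=1
  return "$status"
}
```

Run `preflight` on the server after logging in. Filesystem overhead can make the root filesystem report a little less than the provisioned disk size, so treat a narrow disk failure as a prompt to look closer rather than a hard stop.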

Step 1: Initialize Server

Duration: 15 minutes

bash
# SSH into your server
ssh root@YOUR_SERVER_IP

# Download management script
wget https://raw.githubusercontent.com/sime2408/scrapalot-chat/main/docker-scrapalot/manage.sh
chmod +x manage.sh

# Run initialization (installs Docker, configures firewall, sets up directories)
./manage.sh init

# Log out and back in so the Docker group change takes effect
exit
ssh root@YOUR_SERVER_IP

What the init script does:

  • Installs Docker and Docker Compose
  • Configures UFW firewall (ports 80/443)
  • Sets up swap space
  • Optimizes system parameters
  • Creates deployment directories
  • Configures log rotation

Step 2: Setup GitHub Actions Runners

Duration: 15 minutes

You need two separate runners: one for the backend repository (scrapalot-chat) and one for the frontend repository (scrapalot-ui).

Backend Runner

bash
# Create runner directory
mkdir -p /opt/scrapalot/actions-runner && cd /opt/scrapalot/actions-runner

# Download runner
curl -o actions-runner-linux-x64-2.329.0.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.329.0/actions-runner-linux-x64-2.329.0.tar.gz
tar xzf ./actions-runner-linux-x64-2.329.0.tar.gz

# Get token from: https://github.com/YOUR_ORG/scrapalot-chat/settings/actions/runners/new
./config.sh \
  --url https://github.com/YOUR_ORG/scrapalot-chat \
  --token YOUR_BACKEND_TOKEN \
  --work /opt/scrapalot/_work \
  --labels hetzner,production

# Install and start service
sudo ./svc.sh install github-runner
sudo ./svc.sh start
sudo ./svc.sh status

UI Runner

bash
# Create UI runner directory
mkdir -p /opt/scrapalot/actions-runner-ui && cd /opt/scrapalot/actions-runner-ui

# Download runner
curl -o actions-runner-linux-x64-2.329.0.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.329.0/actions-runner-linux-x64-2.329.0.tar.gz
tar xzf ./actions-runner-linux-x64-2.329.0.tar.gz

# Get token from: https://github.com/YOUR_ORG/scrapalot-ui/settings/actions/runners/new
./config.sh \
  --url https://github.com/YOUR_ORG/scrapalot-ui \
  --token YOUR_UI_TOKEN \
  --work /opt/scrapalot/_work_ui \
  --labels hetzner,production

# Install and start service
sudo ./svc.sh install github-runner
sudo ./svc.sh start
sudo ./svc.sh status

Verify Runners

bash
# Check both runners (quote the pattern so the shell does not expand it)
sudo systemctl status 'actions.runner.*'

# View logs if needed
sudo journalctl -u 'actions.runner.*' -f

Step 3: Configure GitHub Secrets

Duration: 5 minutes

Add secrets in your repository: Settings → Secrets and variables → Actions

Required Secrets:

Environment:

  • ENVIRONMENT - Deployment environment (dev, rc, prod)

Database:

  • POSTGRES_USER - PostgreSQL username
  • POSTGRES_PASSWORD - PostgreSQL password
  • POSTGRES_DB - Database name
  • POSTGRES_HOST - Supabase host (e.g., aws-1-eu-central-1.pooler.supabase.com)
  • POSTGRES_PORT - Supabase port (typically 6543)

Redis & Neo4j:

  • REDIS_PASSWORD - Redis password
  • NEO4J_USER - Neo4j username (default: neo4j)
  • NEO4J_PASSWORD - Neo4j password

API Keys:

  • HUGGINGFACE_TOKEN - HuggingFace API token
  • FIRECRAWL_API_KEY - Firecrawl API key (optional)
  • SERPAPI_KEY - SerpAPI key (optional)

OAuth:

URLs:

Workers:

  • ENABLE_BACKGROUND_WORKERS - Enable workers (true/false)
  • USE_LIGHTWEIGHT_WORKERS - Use lightweight workers for memory optimization (true/false)
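Before copying values into GitHub, it can help to confirm that none of the non-optional secrets is blank. The sketch below assumes the values are exported as environment variables in your local shell; `missing_vars` and `REQUIRED_SECRETS` are illustrative names, not part of the project.

```shell
# missing_vars NAME...: print the names that are unset or empty
missing_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  # Drop the leading space before printing
  echo "${missing# }"
}

# Non-optional secrets from the list above
REQUIRED_SECRETS="ENVIRONMENT POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DB \
POSTGRES_HOST POSTGRES_PORT REDIS_PASSWORD NEO4J_USER NEO4J_PASSWORD \
HUGGINGFACE_TOKEN ENABLE_BACKGROUND_WORKERS USE_LIGHTWEIGHT_WORKERS"
```

Calling `missing_vars $REQUIRED_SECRETS` (unquoted, so the list splits into names) prints nothing when every value is present.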

Step 4: Deploy Infrastructure

Duration: 10 minutes

Deploy base services using GitHub Actions:

  1. Go to: https://github.com/YOUR_ORG/scrapalot-chat/actions
  2. Select: "Deploy Infrastructure Services to cloud"
  3. Click: "Run workflow"

This deploys:

  • Redis (caching)
  • Neo4j (knowledge graph)
  • Portainer (container management)
  • Nginx Proxy Manager (reverse proxy + SSL)

Or manually:

bash
cd /opt/scrapalot/scrapalot-chat/docker-scrapalot

# Deploy infrastructure services
docker compose -f docker-compose.yaml up -d \
  portainer nginx-proxy-manager redis neo4j
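Containers can take a short while to accept connections after `up -d` returns. A small retry helper, sketched here under the assumption that a successful exit code from some probe command means "ready", avoids moving on too early. `wait_until` is a hypothetical helper name.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or ATTEMPTS is exhausted.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_until 30 2 curl -fsS http://localhost:81` would poll the Nginx Proxy Manager admin port for up to a minute, assuming port 81 is published on the host as Step 6 implies.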

Step 5: Deploy Application

Duration: 5 minutes

Deploy backend and workers using GitHub Actions:

  1. Go to: https://github.com/YOUR_ORG/scrapalot-chat/actions
  2. Select: "Deploy Backend to cloud"
  3. Click: "Run workflow"

Or trigger via git push:

bash
# Pushing to the main branch triggers automatic deployment
git push origin main

Step 6: Configure SSL

Duration: 10 minutes

Update DNS Records

Point your domain and its subdomains at the server's IP address:

Type  Name    Value           TTL
A     @       YOUR_SERVER_IP  600
A     www     YOUR_SERVER_IP  600
A     api     YOUR_SERVER_IP  600
A     logs    YOUR_SERVER_IP  600
A     routes  YOUR_SERVER_IP  600
A     graph   YOUR_SERVER_IP  600

Verify DNS propagation:

bash
dig yourdomain.com +short
# Should return: YOUR_SERVER_IP
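The same check can be repeated for every record in the DNS table. This is a minimal sketch that assumes `dig` is installed; `fqdn_for` and `check_dns` are illustrative names.

```shell
# fqdn_for NAME DOMAIN: "@" is the apex, anything else is a subdomain
fqdn_for() {
  if [ "$1" = "@" ]; then
    echo "$2"
  else
    echo "$1.$2"
  fi
}

# check_dns DOMAIN EXPECTED_IP: verify each A record from the table above
check_dns() {
  status=0
  for name in @ www api logs routes graph; do
    host=$(fqdn_for "$name" "$1")
    # Take the last answer line to look past any CNAME in the response
    resolved=$(dig +short "$host" A | tail -n 1)
    if [ "$resolved" = "$2" ]; then
      echo "OK   $host -> $resolved"
    else
      echo "FAIL $host -> ${resolved:-no answer} (expected $2)"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_dns yourdomain.com YOUR_SERVER_IP`. A non-zero exit status means at least one record has not propagated yet.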

Access Nginx Proxy Manager

  1. Navigate to: http://YOUR_SERVER_IP:81
  2. Login with default credentials:
    • Email: admin@example.com
    • Password: changeme
  3. Change password immediately!

Create Proxy Hosts

Proxy Host 1: Frontend

  • Domain Names: yourdomain.com, www.yourdomain.com
  • Scheme: http
  • Forward Hostname/IP: scrapalot-ui
  • Forward Port: 3000
  • Enable:
    • Block Common Exploits
    • Websockets Support
    • Cache Assets

SSL Tab:

  • Request new SSL Certificate (Let's Encrypt)
  • Enable: Force SSL, HTTP/2 Support, HSTS

Proxy Host 2: Backend API

  • Domain Names: api.yourdomain.com
  • Scheme: http
  • Forward Hostname/IP: scrapalot-chat
  • Forward Port: 8090
  • Enable:
    • Block Common Exploits
    • Websockets Support

SSL Tab:

  • Request new SSL Certificate
  • Enable: Force SSL, HTTP/2 Support, HSTS

Advanced Tab - Paste this configuration:

nginx
# Increase timeouts for long-running requests
proxy_read_timeout 300s;
proxy_connect_timeout 75s;
proxy_send_timeout 300s;

# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_cache_bypass $http_upgrade;

# Forward real client IP
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

# CORS headers
add_header Access-Control-Allow-Origin "https://yourdomain.com" always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "Authorization, Content-Type" always;
add_header Access-Control-Allow-Credentials "true" always;

# Handle OPTIONS preflight
if ($request_method = 'OPTIONS') {
    return 204;
}
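Once the proxy host is saved, the CORS rules can be checked by sending a preflight request and reading the response headers. The helper below is a sketch: `header_value` is a hypothetical name, and the request path in the usage comment is an assumption based on the `/api/v1` base URL used elsewhere in this guide.

```shell
# header_value NAME: read HTTP response headers on stdin, print NAME's value
header_value() {
  awk -v want="$1" '
    BEGIN { want = tolower(want) }
    {
      sub(/\r$/, "")                      # strip the CR that ends header lines
      idx = index($0, ":")
      if (idx > 0 && tolower(substr($0, 1, idx - 1)) == want) {
        value = substr($0, idx + 1)
        sub(/^[ \t]+/, "", value)         # drop whitespace after the colon
        print value
        exit
      }
    }'
}

# Usage (domains and path are placeholders):
# curl -s -o /dev/null -D - -X OPTIONS \
#   -H "Origin: https://yourdomain.com" \
#   -H "Access-Control-Request-Method: POST" \
#   https://api.yourdomain.com/api/v1/ \
#   | header_value Access-Control-Allow-Origin
```

If the configuration is active, the printed value should match the origin set in the `add_header Access-Control-Allow-Origin` line.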

Update Environment Variables

After SSL is configured, update GitHub secrets with HTTPS URLs:

  • FRONTEND_URL - https://yourdomain.com
  • BACKEND_BASE_URL - https://api.yourdomain.com
  • VITE_API_BASE_URL - https://api.yourdomain.com/api/v1
  • GOOGLE_OAUTH_REDIRECT_URI - https://api.yourdomain.com/api/v1/auth/google/callback

Then redeploy the application.
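All four values follow one pattern, so they can be generated from the domain to avoid typos. A minimal sketch; `print_url_secrets` is an illustrative name, and the dotenv-style output format is a choice made here, not something the project requires.

```shell
# print_url_secrets DOMAIN: emit the four URL secrets in NAME=value form
print_url_secrets() {
  domain=$1
  echo "FRONTEND_URL=https://$domain"
  echo "BACKEND_BASE_URL=https://api.$domain"
  echo "VITE_API_BASE_URL=https://api.$domain/api/v1"
  echo "GOOGLE_OAUTH_REDIRECT_URI=https://api.$domain/api/v1/auth/google/callback"
}
```

If the GitHub CLI is installed and authenticated, the output can be saved to a file and loaded with `gh secret set -f FILE --repo YOUR_ORG/REPO`; otherwise paste the values into the repository settings by hand.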

Architecture

Network Architecture

Service Dependencies

Application Services:

  • scrapalot-chat (Backend)

    • Depends on: Redis, Neo4j
    • Connects to: Supabase PostgreSQL (external)
    • Exposes: 8090 (API), 8091 (LLM Inference)
  • scrapalot-ui (Frontend)

    • Exposes: 3000

Infrastructure Services:

  • portainer - Container management (Port 9000, 9443)
  • nginx-proxy-manager - Reverse proxy + SSL (Port 80, 443, 81)
  • redis - Caching (Port 6379)
  • neo4j - Knowledge graph (Port 7474, 7687)

Background Workers (Optional):

Essential workers (auto-deployed with backend):

  • scrapalot-primary - Coordination & high-priority tasks (1GB RAM)
  • scrapalot-docprocessing - Document processing & embedding (2GB RAM)
  • scrapalot-beat - Celery scheduler (256MB RAM)

Additional workers (manual deployment):

  • scrapalot-docfetching - Document fetching (1GB RAM)
  • scrapalot-light - Fast operations (512MB RAM)
  • scrapalot-heavy - Resource-intensive operations (2GB RAM)
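Adding up the worker budgets shows how much of the 16GB minimum they claim. The arithmetic below treats 1GB as 1024MB, which is an assumption about how the limits are expressed.

```shell
# RAM budgets in MB, copied from the worker lists above
PRIMARY=1024
DOCPROCESSING=2048
BEAT=256
DOCFETCHING=1024
LIGHT=512
HEAVY=2048

essential=$((PRIMARY + DOCPROCESSING + BEAT))
additional=$((DOCFETCHING + LIGHT + HEAVY))
total=$((essential + additional))

echo "essential=${essential}MB additional=${additional}MB total=${total}MB"
# prints: essential=3328MB additional=3584MB total=6912MB
```

With all six workers running, roughly 6.75GB is reserved for workers, leaving the remainder for the backend, Redis, Neo4j, and the operating system.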

Data Persistence

Local Docker Volumes:

  • redis_data - Redis cache
  • neo4j_data - Neo4j graph database
  • scrapalot_data - Application uploads and cache
  • npm_data - Nginx Proxy Manager config
  • npm_letsencrypt - SSL certificates
  • portainer_data - Portainer config

External (Supabase):

  • PostgreSQL database with automatic backups

Management Interfaces

Access Overview

Service      URL                            Port  Authentication
Application  https://yourdomain.com         443   Google OAuth
Portainer    https://logs.yourdomain.com    443   Portainer login
Nginx Proxy  https://routes.yourdomain.com  443   NPM login
Neo4j        https://graph.yourdomain.com   443   Neo4j login

Portainer (Container Management)

URL: https://logs.yourdomain.com

First-time Setup:

  1. Create an admin account (password of at least 12 characters)
  2. Select Docker environment
  3. Click Connect

Features:

  • View all running containers
  • Real-time logs
  • Resource monitoring (CPU, Memory, Network)
  • Container restart/stop
  • Execute commands in containers
  • Volume and network management

Security:

  • Enable 2FA: Username → My account → Two-factor authentication
  • Create access lists to restrict IP addresses

Nginx Proxy Manager

URL: https://routes.yourdomain.com

Default Credentials:

  • Email: admin@example.com
  • Password: changeme

Change password immediately!

Features:

  • Manage proxy hosts
  • Request SSL certificates (Let's Encrypt)
  • Configure access lists
  • View access logs
  • Custom Nginx configurations

Neo4j Browser

URL: https://graph.yourdomain.com

Connect to:

  • Bolt URL: bolt://neo4j:7687
  • Username: neo4j
  • Password: Your Neo4j password from secrets

Features:

  • Query knowledge graph
  • Visualize entity relationships
  • Database administration

CI/CD Pipeline

GitHub Actions Workflows

Infrastructure Deployment:

Workflow file: .github/workflows/deploy-infrastructure.yml

Deploys:

  • Redis
  • Neo4j
  • Portainer
  • Nginx Proxy Manager

Backend Deployment:

Workflow file: .github/workflows/deploy-backend.yml

Deploys:

  • scrapalot-chat backend
  • Background workers (optional)

UI Deployment:

Workflow file: .github/workflows/deploy-ui.yml

Deploys:

  • scrapalot-ui frontend

Manual Deployment

If you need to deploy manually without GitHub Actions:

bash
# SSH into server
ssh root@YOUR_SERVER_IP

# Navigate to deployment directory
cd /opt/scrapalot/scrapalot-chat/docker-scrapalot

# Pull latest images
docker compose pull

# Deploy services
docker compose up -d

# Check status
docker compose ps

# View logs
docker compose logs -f scrapalot-chat

Common Tasks

View Logs

bash
# All services
docker compose logs -f

# Specific service
docker compose logs -f scrapalot-chat

# Last 100 lines
docker compose logs --tail=100 scrapalot-chat

Restart Services

bash
# Restart all
docker compose restart

# Restart specific service
docker compose restart scrapalot-chat

# Recreate the container from the current image (this does not rebuild the image)
docker compose up -d --force-recreate scrapalot-chat

Update Application

bash
# Pull latest code (if manual deployment)
cd /opt/scrapalot/scrapalot-chat
git pull origin main

# Rebuild and restart
cd docker-scrapalot
docker compose build scrapalot-chat
docker compose up -d scrapalot-chat

Database Backup

bash
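# Backup Supabase database (name the file once so both commands agree)
# If pg_dump fails through the pooler port (6543), retry with the direct
# or session-mode connection details from the Supabase dashboard
BACKUP_FILE="backup_$(date +%Y%m%d).sql"
pg_dump -h YOUR_SUPABASE_HOST \
  -p 6543 \
  -U YOUR_USERNAME \
  -d YOUR_DATABASE \
  > "$BACKUP_FILE"

# Compress backup
gzip "$BACKUP_FILE"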
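A backup is only useful if it can be read back. The sketch below checks that a compressed dump exists, passes gzip's integrity test, and starts with the comment header that plain-format `pg_dump` output begins with. `verify_backup` is an illustrative name.

```shell
# verify_backup FILE.gz: succeed only if the archive looks like a usable dump
verify_backup() {
  if [ ! -s "$1" ]; then
    echo "missing or empty: $1" >&2
    return 1
  fi
  # gzip -t reads the whole archive and checks its CRC
  if ! gzip -t "$1" 2>/dev/null; then
    echo "corrupt archive: $1" >&2
    return 1
  fi
  # Plain-format pg_dump output opens with a "PostgreSQL database dump" comment
  if ! gzip -dc "$1" | head -n 5 | grep -q "PostgreSQL database dump"; then
    echo "no pg_dump header: $1" >&2
    return 1
  fi
  echo "ok: $1"
}
```

This does not prove the dump restores cleanly; a periodic test restore into a scratch database remains the stronger check.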

Monitor Resource Usage

bash
# Docker stats
docker stats

# System resources
htop

# Disk usage
df -h
du -sh /opt/scrapalot/*
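For unattended monitoring, the disk check can be turned into a pass/fail test suitable for cron. A minimal sketch assuming GNU `df`; the function names and the 85% threshold in the usage note are illustrative choices.

```shell
# disk_status USED_PERCENT THRESHOLD: warn and return 1 at or above THRESHOLD
disk_status() {
  if [ "$1" -ge "$2" ]; then
    echo "WARN: disk ${1}% used (threshold ${2}%)"
    return 1
  fi
  echo "OK: disk ${1}% used"
}

# root_used_percent: current usage of / as a bare integer
root_used_percent() {
  df --output=pcent / | tail -n 1 | tr -dc '0-9'
}
```

Usage: `disk_status "$(root_used_percent)" 85`.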

GPU Acceleration

For detailed GPU setup, see the GPU Setup Guide.

Quick Vulkan Setup:

bash
# Install Vulkan drivers (Ubuntu)
sudo apt update && sudo apt install vulkan-tools mesa-vulkan-drivers

# Verify installation
vulkaninfo --summary

# Enable Vulkan in environment
export LLM_VULKAN_ENABLED=true
export LLM_VULKAN_PREFER=true

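# Rebuild with Vulkan support
# (newer llama.cpp releases renamed this flag to -DGGML_VULKAN=ON;
#  use the name that matches the version your Dockerfile builds)
docker build \
  --build-arg CMAKE_ARGS="-DLLAMA_VULKAN=ON" \
  -f Dockerfile \
  -t scrapalot-chat:latest .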

Troubleshooting

Service Won't Start

bash
# Check logs
docker compose logs scrapalot-chat

# Check dependencies
docker compose ps

# Verify environment variables
docker compose config

# Restart dependencies
docker compose restart redis neo4j
docker compose up -d scrapalot-chat

SSL Certificate Issues

  • Check certificate expiration in Nginx Proxy Manager. Let's Encrypt certificates renew automatically while NPM is running.
  • To force a renewal if needed: NPM → SSL Certificates → Edit → Force Renew
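Certificate expiry can also be checked from the command line, independent of the NPM interface. This sketch assumes `openssl` and GNU `date` (for `date -d`) are available; `days_between` and `cert_days_left` are illustrative names.

```shell
# days_between NOW_EPOCH EXPIRY_EPOCH: whole days remaining, rounded toward zero
days_between() {
  echo $(( ($2 - $1) / 86400 ))
}

# cert_days_left HOST: days until the certificate served on HOST:443 expires
cert_days_left() {
  # -enddate prints "notAfter=<date>"; keep only the date part
  end=$(openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate | cut -d= -f2)
  days_between "$(date +%s)" "$(date -d "$end" +%s)"
}
```

Usage: `cert_days_left api.yourdomain.com`. Let's Encrypt certificates are valid for 90 days, so a value that keeps falling toward zero suggests automatic renewal is not working.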

Database Connection Issues

bash
# Test connection to Supabase
psql -h YOUR_SUPABASE_HOST \
  -p 6543 \
  -U YOUR_USERNAME \
  -d YOUR_DATABASE

# Check backend logs for connection errors
docker compose logs scrapalot-chat | grep -i postgres

Out of Memory

bash
# Check memory usage
free -h

# Check Docker container memory
docker stats --no-stream

# Restart services to free memory
docker compose restart

# If using lightweight workers, ensure flag is set:
# USE_LIGHTWEIGHT_WORKERS=true in GitHub secrets

GitHub Runner Issues

bash
# Check runner status
sudo systemctl status actions.runner.*

# Restart runner
sudo systemctl restart actions.runner.*

# View runner logs
sudo journalctl -u actions.runner.* -f

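# Reconfigure runner (if token expired)
# Stop and uninstall the service first, then re-register with the same
# work directory and labels used in Step 2
cd /opt/scrapalot/actions-runner
sudo ./svc.sh stop
sudo ./svc.sh uninstall
./config.sh remove --token REMOVAL_TOKEN
./config.sh --url https://github.com/YOUR_ORG/scrapalot-chat --token NEW_TOKEN \
  --work /opt/scrapalot/_work --labels hetzner,production
sudo ./svc.sh install
sudo ./svc.sh start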

Security Best Practices

Firewall Configuration

bash
# Check current rules
sudo ufw status verbose

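# Allow only necessary ports
# Note: ports published by Docker (for example 81 for the NPM admin UI) are
# opened through iptables directly and are not filtered by these UFW rules
sudo ufw allow 22/tcp   # SSH
sudo ufw allow 80/tcp   # HTTP
sudo ufw allow 443/tcp  # HTTPS
sudo ufw enable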

Access Restrictions

  1. Use Access Lists in Nginx Proxy Manager

    • Restrict management interfaces to your IP
    • Create separate access lists for different services
  2. Enable 2FA

    • Portainer: Enable 2FA for admin accounts
    • GitHub: Enable 2FA for repository access
  3. Secure Secrets

    • Use GitHub encrypted secrets
    • Never commit secrets to repository
    • Rotate secrets periodically

Regular Maintenance

Weekly:

  • Check logs for errors
  • Monitor resource usage
  • Review access logs

Monthly:

  • Update Docker images
  • Review and rotate secrets
  • Backup database
  • Review SSL certificate expiration

Quarterly:

  • System updates: sudo apt update && sudo apt upgrade
  • Security audit
  • Disaster recovery drill

Next Steps


For support or questions, refer to the main Deployment Guide or consult the project documentation.

Released under the MIT License.