Cloud Infrastructure Deployment
Complete guide for deploying Scrapalot on cloud servers with Docker and CI/CD automation.
Overview
This guide covers deploying Scrapalot on cloud servers with:
- Docker Compose deployment
- GitHub Actions CI/CD pipeline
- Nginx Proxy Manager for SSL and routing
- External Supabase PostgreSQL (managed database)
- Optional GPU/Vulkan support
Quick Start
Prerequisites
Server Requirements:
- Ubuntu 24.04 LTS
- Minimum: 4 vCPUs, 16GB RAM, 80GB SSD
- Public IP address
- Root or sudo access
External Services:
- Supabase PostgreSQL (managed database)
- GitHub repositories with Actions enabled
- Domain name for SSL
API Keys:
- HuggingFace token
- Google OAuth credentials
- API keys for optional services (Firecrawl, SerpAPI)
Step 1: Initialize Server
Duration: 15 minutes
# SSH into your server
ssh root@YOUR_SERVER_IP
# Download management script
wget https://raw.githubusercontent.com/sime2408/scrapalot-chat/main/docker-scrapalot/manage.sh
chmod +x manage.sh
# Run initialization (installs Docker, configures firewall, sets up directories)
./manage.sh init
# Log out and back in for Docker group changes
exit
ssh root@YOUR_SERVER_IP
What the init script does:
- Installs Docker and Docker Compose
- Configures UFW firewall (ports 80/443)
- Sets up swap space
- Optimizes system parameters
- Creates deployment directories
- Configures log rotation
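After logging back in, a quick sanity check confirms the tooling the rest of this guide relies on is in place. This is a sketch; it reports missing tools rather than failing, so it is safe to run anywhere:

```shell
# Post-init sanity check: report which required tools are on PATH.
for tool in docker git curl wget; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
```

If `docker` is reported missing after `./manage.sh init`, re-run the script and inspect its output for errors before continuing.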
Step 2: Setup GitHub Actions Runners
Duration: 15 minutes
You need two separate runners - one for backend and one for frontend deployments.
Backend Runner
# Create runner directory
mkdir -p /opt/scrapalot/actions-runner && cd /opt/scrapalot/actions-runner
# Download runner
curl -o actions-runner-linux-x64-2.329.0.tar.gz -L \
https://github.com/actions/runner/releases/download/v2.329.0/actions-runner-linux-x64-2.329.0.tar.gz
tar xzf ./actions-runner-linux-x64-2.329.0.tar.gz
# Get token from: https://github.com/YOUR_ORG/scrapalot-chat/settings/actions/runners/new
./config.sh \
--url https://github.com/YOUR_ORG/scrapalot-chat \
--token YOUR_BACKEND_TOKEN \
--work /opt/scrapalot/_work \
--labels hetzner,production
# Install and start service
sudo ./svc.sh install github-runner
sudo ./svc.sh start
sudo ./svc.sh status
UI Runner
# Create UI runner directory
mkdir -p /opt/scrapalot/actions-runner-ui && cd /opt/scrapalot/actions-runner-ui
# Download runner
curl -o actions-runner-linux-x64-2.329.0.tar.gz -L \
https://github.com/actions/runner/releases/download/v2.329.0/actions-runner-linux-x64-2.329.0.tar.gz
tar xzf ./actions-runner-linux-x64-2.329.0.tar.gz
# Get token from: https://github.com/YOUR_ORG/scrapalot-ui/settings/actions/runners/new
./config.sh \
--url https://github.com/YOUR_ORG/scrapalot-ui \
--token YOUR_UI_TOKEN \
--work /opt/scrapalot/_work_ui \
--labels hetzner,production
# Install and start service
sudo ./svc.sh install github-runner
sudo ./svc.sh start
sudo ./svc.sh status
Verify Runners
# Check both runners
sudo systemctl status actions.runner.*
# View logs if needed
sudo journalctl -u actions.runner.* -f
Step 3: Configure GitHub Secrets
Duration: 5 minutes
Add secrets in your repository: Settings → Secrets and variables → Actions
Required Secrets:
Environment:
- ENVIRONMENT - Deployment environment (dev, rc, prod)
Database:
- POSTGRES_USER - PostgreSQL username
- POSTGRES_PASSWORD - PostgreSQL password
- POSTGRES_DB - Database name
- POSTGRES_HOST - Supabase host (e.g., aws-1-eu-central-1.pooler.supabase.com)
- POSTGRES_PORT - Supabase port (typically 6543)
Redis & Neo4j:
- REDIS_PASSWORD - Redis password
- NEO4J_USER - Neo4j username (default: neo4j)
- NEO4J_PASSWORD - Neo4j password
API Keys:
- HUGGINGFACE_TOKEN - HuggingFace API token
- FIRECRAWL_API_KEY - Firecrawl API key (optional)
- SERPAPI_KEY - SerpAPI key (optional)
OAuth:
- GOOGLE_OAUTH_CLIENT_ID - Google OAuth client ID
- GOOGLE_OAUTH_CLIENT_SECRET - Google OAuth client secret
- GOOGLE_OAUTH_REDIRECT_URI - OAuth redirect URI (e.g., https://api.yourdomain.com/api/v1/auth/google/callback)
URLs:
- FRONTEND_URL - Frontend URL (e.g., https://yourdomain.com)
- BACKEND_BASE_URL - Backend URL (e.g., https://api.yourdomain.com)
- VITE_API_BASE_URL - API base URL for UI (e.g., https://api.yourdomain.com/api/v1)
- VITE_LLM_INFERENCE_ENDPOINT - LLM endpoint (e.g., https://api.yourdomain.com/api/v1/llm-inference)
Workers:
- ENABLE_BACKGROUND_WORKERS - Enable workers (true/false)
- USE_LIGHTWEIGHT_WORKERS - Use lightweight workers for memory optimization (true/false)
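If you prefer the command line over the web UI, the same secrets can be set with the GitHub CLI. This is a sketch: the repository path and values are placeholders, and it assumes you have run `gh auth login`:

```shell
# Set repository secrets non-interactively with the GitHub CLI.
# REPO and all values below are placeholders.
REPO="YOUR_ORG/scrapalot-chat"

if command -v gh >/dev/null 2>&1 && gh auth status >/dev/null 2>&1; then
  gh secret set POSTGRES_PORT --repo "$REPO" --body "6543" \
    && gh secret set ENABLE_BACKGROUND_WORKERS --repo "$REPO" --body "true" \
    || echo "gh secret set failed; verify repo path and permissions"
else
  echo "install and authenticate the gh CLI first (gh auth login)"
fi
```

Repeat for each secret in the lists above; `gh secret set` overwrites existing values, so it is also handy for rotation later.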
Step 4: Deploy Infrastructure
Duration: 10 minutes
Deploy base services using GitHub Actions:
- Go to: https://github.com/YOUR_ORG/scrapalot-chat/actions
- Select: "Deploy Infrastructure Services to cloud"
- Click: "Run workflow"
This deploys:
- Redis (caching)
- Neo4j (knowledge graph)
- Portainer (container management)
- Nginx Proxy Manager (reverse proxy + SSL)
Or manually:
cd /opt/scrapalot/scrapalot-chat/docker-scrapalot
# Deploy infrastructure services
docker compose -f docker-compose.yaml up -d \
portainer nginx-proxy-manager redis neo4j
Step 5: Deploy Application
Duration: 5 minutes
Deploy backend and workers using GitHub Actions:
- Go to: https://github.com/YOUR_ORG/scrapalot-chat/actions
- Select: "Deploy Backend to cloud"
- Click: "Run workflow"
Or trigger via git push:
# Push to main branch triggers automatic deployment
git push origin main
Step 6: Configure SSL
Duration: 10 minutes
Update DNS Records
Point your domain records to the server's IP address:
| Type | Name | Value | TTL |
|---|---|---|---|
| A | @ | YOUR_SERVER_IP | 600 |
| A | www | YOUR_SERVER_IP | 600 |
| A | api | YOUR_SERVER_IP | 600 |
| A | logs | YOUR_SERVER_IP | 600 |
| A | routes | YOUR_SERVER_IP | 600 |
| A | graph | YOUR_SERVER_IP | 600 |
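All six records above can be checked in one pass. The domain is a placeholder, and the loop assumes `dig` (from dnsutils) is available:

```shell
# Resolve every subdomain from the DNS table and print the result.
DOMAIN="yourdomain.com"

for host in "$DOMAIN" "www.$DOMAIN" "api.$DOMAIN" "logs.$DOMAIN" "routes.$DOMAIN" "graph.$DOMAIN"; do
  if command -v dig >/dev/null 2>&1; then
    echo "$host -> $(dig +short +time=2 +tries=1 "$host" | head -n1)"
  else
    echo "$host -> (dig not installed)"
  fi
done
```

Every line should end with YOUR_SERVER_IP before you request SSL certificates; an empty result means propagation is not finished yet.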
Verify DNS propagation:
dig yourdomain.com +short
# Should return: YOUR_SERVER_IP
Access Nginx Proxy Manager
- Navigate to: http://YOUR_SERVER_IP:81
- Login with default credentials:
  - Email: admin@example.com
  - Password: changeme
- Change password immediately!
Create Proxy Hosts
Proxy Host 1: Frontend
- Domain Names: yourdomain.com, www.yourdomain.com
- Scheme: http
- Forward Hostname/IP: scrapalot-ui
- Forward Port: 3000
- Enable:
- Block Common Exploits
- Websockets Support
- Cache Assets
SSL Tab:
- Request new SSL Certificate (Let's Encrypt)
- Enable: Force SSL, HTTP/2 Support, HSTS
Proxy Host 2: Backend API
- Domain Names: api.yourdomain.com
- Scheme: http
- Forward Hostname/IP: scrapalot-chat
- Forward Port: 8090
- Enable:
- Block Common Exploits
- Websockets Support
SSL Tab:
- Request new SSL Certificate
- Enable: Force SSL, HTTP/2 Support, HSTS
Advanced Tab - Paste this configuration:
# Increase timeouts for long-running requests
proxy_read_timeout 300s;
proxy_connect_timeout 75s;
proxy_send_timeout 300s;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_cache_bypass $http_upgrade;
# Forward real client IP
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# CORS headers
add_header Access-Control-Allow-Origin "https://yourdomain.com" always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "Authorization, Content-Type" always;
add_header Access-Control-Allow-Credentials "true" always;
# Handle OPTIONS preflight
if ($request_method = 'OPTIONS') {
return 204;
}
Update Environment Variables
After SSL is configured, update GitHub secrets with HTTPS URLs:
- FRONTEND_URL → https://yourdomain.com
- BACKEND_BASE_URL → https://api.yourdomain.com
- VITE_API_BASE_URL → https://api.yourdomain.com/api/v1
- GOOGLE_OAUTH_REDIRECT_URI → https://api.yourdomain.com/api/v1/auth/google/callback
Then redeploy the application.
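Once redeployed, the OPTIONS handling configured in the Advanced tab can be spot-checked with curl. Domain and path are placeholders; per the Nginx snippet, preflight requests should return 204:

```shell
# Send a CORS preflight to the API and print the HTTP status code.
API="https://api.yourdomain.com/api/v1"
ORIGIN="https://yourdomain.com"

if command -v curl >/dev/null 2>&1; then
  curl -s -o /dev/null -w '%{http_code}\n' \
    -X OPTIONS \
    -H "Origin: $ORIGIN" \
    -H "Access-Control-Request-Method: POST" \
    --max-time 10 \
    "$API" || echo "request failed (check DNS and SSL first)"
fi
```

A 204 confirms the proxy is intercepting preflights; any other code means the Advanced-tab config was not saved or the proxy host is not matching the API subdomain.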
Architecture
Network Architecture
Service Dependencies
Application Services:
scrapalot-chat (Backend)
- Depends on: Redis, Neo4j
- Connects to: Supabase PostgreSQL (external)
- Exposes: 8090 (API), 8091 (LLM Inference)
scrapalot-ui (Frontend)
- Exposes: 3000
Infrastructure Services:
- portainer - Container management (Port 9000, 9443)
- nginx-proxy-manager - Reverse proxy + SSL (Port 80, 443, 81)
- redis - Caching (Port 6379)
- neo4j - Knowledge graph (Port 7474, 7687)
Background Workers (Optional):
Essential workers (auto-deployed with backend):
- scrapalot-primary - Coordination & high-priority tasks (1GB RAM)
- scrapalot-docprocessing - Document processing & embedding (2GB RAM)
- scrapalot-beat - Celery scheduler (256MB RAM)
Additional workers (manual deployment):
- scrapalot-docfetching - Document fetching (1GB RAM)
- scrapalot-light - Fast operations (512MB RAM)
- scrapalot-heavy - Resource-intensive operations (2GB RAM)
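The additional workers are not part of the automated deploy. Assuming the service names above match docker-compose.yaml, they can be brought up manually on the server:

```shell
# Optional workers to start; names taken from the list above.
SERVICES="scrapalot-docfetching scrapalot-light scrapalot-heavy"

cd /opt/scrapalot/scrapalot-chat/docker-scrapalot 2>/dev/null \
  && docker compose up -d $SERVICES \
  && docker stats --no-stream $SERVICES \
  || echo "run these commands on the deployment server"
```

The `docker stats` line gives a one-shot view of memory use, which you can compare against the per-worker budgets listed above.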
Data Persistence
Local Docker Volumes:
- redis_data - Redis cache
- neo4j_data - Neo4j graph database
- scrapalot_data - Application uploads and cache
- npm_data - Nginx Proxy Manager config
- npm_letsencrypt - SSL certificates
- portainer_data - Portainer config
External (Supabase):
- PostgreSQL database with automatic backups
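Unlike Supabase, the local Docker volumes have no automatic backups. They can be snapshotted with a throwaway container; this is a sketch, with the volume name and backup directory as placeholders:

```shell
# Archive a named Docker volume to a dated tarball via a disposable
# Alpine container. VOLUME and DEST are placeholders.
VOLUME="neo4j_data"
DEST="/opt/scrapalot/backups"
STAMP=$(date +%Y%m%d)

mkdir -p "$DEST" 2>/dev/null || DEST="$PWD"   # fall back if /opt is not writable
if command -v docker >/dev/null 2>&1; then
  docker run --rm \
    -v "${VOLUME}:/data:ro" \
    -v "${DEST}:/backup" \
    alpine tar czf "/backup/${VOLUME}_${STAMP}.tar.gz" -C /data . \
    || echo "docker run failed; execute on the server"
fi
echo "target: ${DEST}/${VOLUME}_${STAMP}.tar.gz"
```

Stop the owning service first (e.g. `docker compose stop neo4j`) if you need a consistent snapshot of a database volume.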
Management Interfaces
Access Overview
| Service | URL | Port | Authentication |
|---|---|---|---|
| Application | https://yourdomain.com | 443 | Google OAuth |
| Portainer | https://logs.yourdomain.com | 443 | Portainer login |
| Nginx Proxy | https://routes.yourdomain.com | 443 | NPM login |
| Neo4j | https://graph.yourdomain.com | 443 | Neo4j login |
Portainer (Container Management)
URL: https://logs.yourdomain.com
First-time Setup:
- Create an admin account (minimum 12-character password)
- Select Docker environment
- Click Connect
Features:
- View all running containers
- Real-time logs
- Resource monitoring (CPU, Memory, Network)
- Container restart/stop
- Execute commands in containers
- Volume and network management
Security:
- Enable 2FA: Username → My account → Two-factor authentication
- Create access lists to restrict IP addresses
Nginx Proxy Manager
URL: https://routes.yourdomain.com
Default Credentials:
- Email: admin@example.com
- Password: changeme
Change password immediately!
Features:
- Manage proxy hosts
- Request SSL certificates (Let's Encrypt)
- Configure access lists
- View access logs
- Custom Nginx configurations
Neo4j Browser
URL: https://graph.yourdomain.com
Connect to:
- Bolt URL: bolt://neo4j:7687
- Username: neo4j
- Password: Your Neo4j password from secrets
Features:
- Query knowledge graph
- Visualize entity relationships
- Database administration
CI/CD Pipeline
GitHub Actions Workflows
Infrastructure Deployment:
Workflow file: .github/workflows/deploy-infrastructure.yml
Deploys:
- Redis
- Neo4j
- Portainer
- Nginx Proxy Manager
Backend Deployment:
Workflow file: .github/workflows/deploy-backend.yml
Deploys:
- scrapalot-chat backend
- Background workers (optional)
UI Deployment:
Workflow file: .github/workflows/deploy-ui.yml
Deploys:
- scrapalot-ui frontend
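All three workflows can also be triggered from the command line with the GitHub CLI, using the workflow file names above. This assumes `gh auth login` has been run; the repository path is a placeholder:

```shell
# Trigger a deployment workflow without opening the browser.
REPO="YOUR_ORG/scrapalot-chat"

if command -v gh >/dev/null 2>&1 && gh auth status >/dev/null 2>&1; then
  gh workflow run deploy-backend.yml --repo "$REPO" --ref main \
    && gh run list --repo "$REPO" --limit 3 \
    || echo "workflow dispatch failed; check workflow name and permissions"
else
  echo "install and authenticate the gh CLI first (gh auth login)"
fi
```

`gh run list` confirms the dispatched run was queued; use `gh run watch` to follow it live.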
Manual Deployment
If you need to deploy manually without GitHub Actions:
# SSH into server
ssh root@YOUR_SERVER_IP
# Navigate to deployment directory
cd /opt/scrapalot/scrapalot-chat/docker-scrapalot
# Pull latest images
docker compose pull
# Deploy services
docker compose up -d
# Check status
docker compose ps
# View logs
docker compose logs -f scrapalot-chat
Common Tasks
View Logs
# All services
docker compose logs -f
# Specific service
docker compose logs -f scrapalot-chat
# Last 100 lines
docker compose logs --tail=100 scrapalot-chat
Restart Services
# Restart all
docker compose restart
# Restart specific service
docker compose restart scrapalot-chat
# Restart with rebuild
docker compose up -d --force-recreate scrapalot-chat
Update Application
# Pull latest code (if manual deployment)
cd /opt/scrapalot/scrapalot-chat
git pull origin main
# Rebuild and restart
cd docker-scrapalot
docker compose build scrapalot-chat
docker compose up -d scrapalot-chat
Database Backup
# Backup Supabase database
pg_dump -h YOUR_SUPABASE_HOST \
-p 6543 \
-U YOUR_USERNAME \
-d YOUR_DATABASE \
> backup_$(date +%Y%m%d).sql
# Compress backup
gzip backup_$(date +%Y%m%d).sql
Monitor Resource Usage
# Docker stats
docker stats
# System resources
htop
# Disk usage
df -h
du -sh /opt/scrapalot/*
GPU Acceleration
For detailed GPU setup, see the GPU Setup Guide.
Quick Vulkan Setup:
# Install Vulkan drivers (Ubuntu)
sudo apt update && sudo apt install vulkan-tools mesa-vulkan-drivers
# Verify installation
vulkaninfo --summary
# Enable Vulkan in environment
export LLM_VULKAN_ENABLED=true
export LLM_VULKAN_PREFER=true
# Rebuild with Vulkan support
docker build \
--build-arg CMAKE_ARGS="-DLLAMA_VULKAN=ON" \
-f Dockerfile \
-t scrapalot-chat:latest .
Troubleshooting
Service Won't Start
# Check logs
docker compose logs scrapalot-chat
# Check dependencies
docker compose ps
# Verify environment variables
docker compose config
# Restart dependencies
docker compose restart redis neo4j
docker compose up -d scrapalot-chat
SSL Certificate Issues
# Check certificate expiration in Nginx Proxy Manager
# Let's Encrypt certificates auto-renew if NPM is running
# Force certificate renewal (if needed)
# Access NPM → SSL Certificates → Edit → Force Renew
Database Connection Issues
# Test connection to Supabase
psql -h YOUR_SUPABASE_HOST \
-p 6543 \
-U YOUR_USERNAME \
-d YOUR_DATABASE
# Check backend logs for connection errors
docker compose logs scrapalot-chat | grep -i postgres
Out of Memory
# Check memory usage
free -h
# Check Docker container memory
docker stats --no-stream
# Restart services to free memory
docker compose restart
# If using lightweight workers, ensure flag is set:
# USE_LIGHTWEIGHT_WORKERS=true in GitHub secrets
GitHub Runner Issues
# Check runner status
sudo systemctl status actions.runner.*
# Restart runner
sudo systemctl restart actions.runner.*
# View runner logs
sudo journalctl -u actions.runner.* -f
# Reconfigure runner (if token expired)
cd /opt/scrapalot/actions-runner
./config.sh remove
./config.sh --url https://github.com/YOUR_ORG/scrapalot-chat --token NEW_TOKEN
sudo ./svc.sh install
sudo ./svc.sh start
Security Best Practices
Firewall Configuration
# Check current rules
sudo ufw status verbose
# Allow only necessary ports
sudo ufw allow 22/tcp # SSH
sudo ufw allow 80/tcp # HTTP
sudo ufw allow 443/tcp # HTTPS
sudo ufw enable
Access Restrictions
Use Access Lists in Nginx Proxy Manager
- Restrict management interfaces to your IP
- Create separate access lists for different services
Enable 2FA
- Portainer: Enable 2FA for admin accounts
- GitHub: Enable 2FA for repository access
Secure Secrets
- Use GitHub encrypted secrets
- Never commit secrets to repository
- Rotate secrets periodically
Regular Maintenance
Weekly:
- Check logs for errors
- Monitor resource usage
- Review access logs
Monthly:
- Update Docker images
- Review and rotate secrets
- Backup database
- Review SSL certificate expiration
Quarterly:
- System updates: sudo apt update && sudo apt upgrade
- Security audit
- Disaster recovery drill
Next Steps
- GPU Setup Guide - Configure GPU acceleration
- Model Management - Deploy local models
- Architecture Overview - Understand system design
- Monitoring Setup - Set up monitoring stack
For support or questions, refer to the main Deployment Guide or consult the project documentation.