External Connectors

Automatically fetch and sync documents from external sources. Keep your knowledge base up to date without manual uploads.

What Are Connectors?

Connectors integrate Scrapalot with external services to automatically:

Fetch documents from cloud storage and web sources
Sync on schedule to keep content current
Handle authentication securely
Monitor for updates and fetch new content
Respect rate limits to avoid service issues

Supported Sources

Google Drive

Automatically sync folders from Google Drive

Use cases:

Team documentation stored in shared folders
Project files that update regularly
Policies and procedures that change

Features:

Sync entire folders with subfolders
Filter by file type (PDF, Word, etc.)
Automatic updates when files change
OAuth 2.0 secure authentication

Setup:

Add Google Drive connector to collection
Authorize with your Google account
Select folder to sync
Choose sync schedule
Documents appear automatically
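Syncing an entire folder with its subfolders while filtering by file type amounts to a recursive walk that keeps only matching files. The sketch below is a minimal illustration under stated assumptions: `DriveItem` and `collect_files` are hypothetical names, the walk runs over an in-memory tree instead of the Google Drive API, and matching on MIME type is an assumption about how the connector's file-type filter works. The folder MIME type is the one Google Drive uses for folders.

```python
from dataclasses import dataclass, field

# MIME type Google Drive assigns to folders.
FOLDER_MIME = "application/vnd.google-apps.folder"


@dataclass
class DriveItem:
    """Hypothetical stand-in for a Drive file or folder."""
    name: str
    mime_type: str
    children: list["DriveItem"] = field(default_factory=list)


def collect_files(folder: DriveItem, allowed_mime_types: set[str]) -> list[str]:
    """Return paths of files under `folder` (including subfolders) with an allowed MIME type."""
    results: list[str] = []

    def walk(item: DriveItem, prefix: str) -> None:
        for child in item.children:
            path = f"{prefix}/{child.name}"
            if child.mime_type == FOLDER_MIME:
                walk(child, path)  # descend into subfolders
            elif child.mime_type in allowed_mime_types:
                results.append(path)  # file passes the type filter

    walk(folder, folder.name)
    return results


# Example tree: one PDF at the top level, one in a subfolder, one text file.
root = DriveItem("Team Docs", FOLDER_MIME, [
    DriveItem("handbook.pdf", "application/pdf"),
    DriveItem("notes.txt", "text/plain"),
    DriveItem("Policies", FOLDER_MIME, [DriveItem("leave.pdf", "application/pdf")]),
])
print(collect_files(root, {"application/pdf"}))
# ['Team Docs/handbook.pdf', 'Team Docs/Policies/leave.pdf']
```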

Firecrawl (Web Scraping)

Extract content from websites

Use cases:

Documentation sites
Knowledge bases
Help centers
Blog content

Features:

Handles JavaScript-heavy sites
Waits for dynamic content to load
Extracts clean text
Follows links to specified depth

Setup:

Get Firecrawl API key (free tier available)
Add Firecrawl connector
Enter website URL
Configure crawl depth
Start fetching
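Following links to a specified depth is a breadth-first traversal that stops expanding pages once the depth limit is reached. This is a hypothetical sketch: `crawl` walks an in-memory link map instead of fetching pages, and it does not reflect Firecrawl's own API or its exact definition of depth.

```python
from collections import deque


def crawl(start_url: str, links: dict[str, list[str]], max_depth: int) -> list[str]:
    """Breadth-first crawl from start_url, visiting pages at most max_depth links away."""
    visited = {start_url}
    order: list[str] = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)  # "fetch" the page
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for target in links.get(url, []):
            if target not in visited:  # skip pages already queued or fetched
                visited.add(target)
                queue.append((target, depth + 1))
    return order


site = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/setup", "/"],
    "/docs/setup": ["/docs/setup/advanced"],
}
print(crawl("/", site, max_depth=2))
# ['/', '/docs', '/blog', '/docs/setup']
```

With a depth of 2, the page three links away (`/docs/setup/advanced`) is never fetched, which is what keeps a crawl of a large help center bounded.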

Web Scraper (Simple Pages)

Fetch content from static web pages

Use cases:

Simple documentation pages
Static content sites
Public knowledge bases

Features:

Fast, lightweight
No external API needed
Custom CSS selectors
Rate limiting built-in

Setup:

Add Web Scraper connector
Enter page URLs
Optionally specify CSS selectors
Configure delays between requests
Fetch content
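A CSS selector narrows extraction to the part of a page that holds the content, skipping navigation and footers. The stdlib-only sketch below supports just a single tag name (such as `main` or `article`) rather than full CSS selector syntax, which a real scraper would normally get from an HTML parsing library; `extract_text` and its default of `main` are assumptions for illustration.

```python
from html.parser import HTMLParser


class _TagText(HTMLParser):
    """Collects text inside elements with the target tag, ignoring script/style."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag
        self.depth = 0  # > 0 while inside a matching element (handles nesting)
        self.skip = 0   # > 0 while inside <script> or <style>
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
        elif tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
        elif tag in ("script", "style") and self.skip > 0:
            self.skip -= 1

    def handle_data(self, data):
        if self.depth > 0 and self.skip == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str, tag: str = "main") -> str:
    """Return the visible text inside every <tag> element, joined by spaces."""
    parser = _TagText(tag)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><nav>Menu</nav><main><h1>Setup</h1><p>Install the app.</p>"
    "<script>track()</script></main><footer>(c)</footer></html>"
)
print(extract_text(page))  # Setup Install the app.
```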

Custom API

Connect to any REST API

Use cases:

Internal company systems
Custom document repositories
Third-party services
Legacy systems

Features:

Flexible endpoint configuration
Custom headers and authentication
Response parsing options
Error handling

Setup:

Add API connector
Configure endpoint URL
Set authentication headers
Define response format
Test and activate
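Defining the response format means telling the connector where the list of documents sits in the JSON payload and which field holds each attribute. The sketch assumes a hypothetical mapping of dotted paths (`items_path`, `field_map`); Scrapalot's actual configuration keys may differ.

```python
import json
from typing import Any


def get_path(data: Any, path: str) -> Any:
    """Follow a dotted path such as 'data.items' through nested dicts; None if missing."""
    for key in path.split("."):
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def parse_documents(body: str, items_path: str, field_map: dict[str, str]) -> list[dict[str, Any]]:
    """Turn a JSON response body into document records using a field mapping."""
    items = get_path(json.loads(body), items_path) or []
    docs = []
    for item in items:
        doc = {name: get_path(item, path) for name, path in field_map.items()}
        if doc.get("content"):  # skip records with no text to index
            docs.append(doc)
    return docs


response = json.dumps({
    "data": {"items": [
        {"id": 7, "attributes": {"title": "Retention policy", "body": "Keep records 7 years."}},
        {"id": 8, "attributes": {"title": "Empty draft", "body": ""}},
    ]}
})
docs = parse_documents(
    response,
    items_path="data.items",
    field_map={"id": "id", "title": "attributes.title", "content": "attributes.body"},
)
print(docs)
# [{'id': 7, 'title': 'Retention policy', 'content': 'Keep records 7 years.'}]
```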

Sync Scheduling

Schedule Options

Manual:

Fetch only when you trigger it
Good for one-time imports
Full control over timing

Hourly:

Keep content very current
Good for rapidly changing content
Higher API usage

Daily:

Balance between freshness and efficiency
Recommended for most use cases
Runs during low-activity hours

Weekly:

Light API usage
Good for stable content
Minimal resource impact

Automatic Updates

What happens during sync:

Connector checks source for new/updated documents
Downloads only changed files
Queues documents for processing
Updates existing documents if modified
Sends notification when complete

Smart syncing:

Only fetches what changed
Deduplicates identical content
Preserves existing document metadata
Maintains citation links
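Fetching only what changed and deduplicating identical content can both be driven by content hashes. This is one possible approach, not necessarily how Scrapalot detects changes: it assumes the connector keeps a hypothetical `known` map from source ID to the SHA-256 hash recorded at the previous sync. In practice a connector would usually compare modification times or source-provided checksums before downloading; hashing the content here keeps the sketch self-contained.

```python
import hashlib


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_sync(fetched: dict[str, bytes], known: dict[str, str]) -> dict[str, list[str]]:
    """Classify fetched documents as new, updated, unchanged, or duplicate.

    fetched: source ID -> raw content seen in this sync run
    known:   source ID -> content hash recorded at the previous sync
    """
    digests = {sid: content_hash(data) for sid, data in fetched.items()}
    # Content that will be stored after this run under existing IDs.
    stored = {h for sid, h in known.items() if sid not in fetched}
    stored |= {digests[sid] for sid in fetched if sid in known}

    plan: dict[str, list[str]] = {"new": [], "updated": [], "unchanged": [], "duplicate": []}
    for sid in sorted(fetched):
        digest = digests[sid]
        if sid in known:
            # Unchanged documents are skipped; updated ones are re-processed in place.
            plan["unchanged" if known[sid] == digest else "updated"].append(sid)
        elif digest in stored:
            plan["duplicate"].append(sid)  # identical content already stored under another ID
        else:
            plan["new"].append(sid)
            stored.add(digest)
    return plan


known = {"a.pdf": content_hash(b"v1"), "b.pdf": content_hash(b"same")}
fetched = {"a.pdf": b"v2", "b.pdf": b"same", "c.pdf": b"same", "d.pdf": b"fresh"}
print(plan_sync(fetched, known))
# {'new': ['d.pdf'], 'updated': ['a.pdf'], 'unchanged': ['b.pdf'], 'duplicate': ['c.pdf']}
```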

Authentication & Security

OAuth 2.0 (Google Drive)

Secure, standard authentication:

Authorize once; access continues until you revoke it or the authorization expires
Revoke access anytime
No password storage
Automatic token refresh

Permission scope:

Read-only access to selected folders
Cannot modify your files
Limited to folders you choose

API Keys (Firecrawl, Custom APIs)

Simple key-based authentication:

Store keys securely encrypted
Never exposed in logs
Easy to rotate
Revoke anytime

Security:

Keys encrypted at rest
Transmitted over TLS
Access controlled per user
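One common way to keep keys out of logs is a logging filter that masks secrets before a record is written. The sketch uses Python's standard `logging.Filter`; the `RedactSecrets` class, the `connectors` logger name, the `[REDACTED]` placeholder, and the example key are hypothetical, and the filter only masks values it has been given explicitly.

```python
import io
import logging


class RedactSecrets(logging.Filter):
    """Replace known secret values in log messages with a placeholder."""

    def __init__(self, secrets: list[str]) -> None:
        super().__init__()
        self.secrets = [s for s in secrets if s]  # ignore empty values

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge %-style args into the message first
        for secret in self.secrets:
            message = message.replace(secret, "[REDACTED]")
        record.msg, record.args = message, None
        return True  # keep the record, now redacted


# Attach the filter to a handler so every record it emits is redacted.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactSecrets(["example-key-123"]))
logger = logging.getLogger("connectors")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Calling Firecrawl with key %s", "example-key-123")
print(stream.getvalue().strip())  # Calling Firecrawl with key [REDACTED]
```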

Error Handling

Automatic Retry

If fetching fails:

Automatic retry with exponential backoff
Skip problematic documents, continue with others
Detailed error logging
User notification of issues

Common failures handled:

Temporary network issues
Rate limit exceeded (waits and retries)
Document temporarily unavailable
Authentication token expired (auto-refresh)
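Exponential backoff doubles the maximum wait after each failed attempt, and random jitter keeps many workers from retrying in lockstep. This is a generic sketch, not Scrapalot's worker code: `TransientError`, `with_retry`, the attempt limit, and the delay values are assumptions, and `sleep` is injectable so the behavior can be tested without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, rate limits, ...)."""


def with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying TransientError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error so it can be logged
            cap = min(max_delay, base_delay * 2 ** attempt)  # caps of 1s, 2s, 4s, ...
            sleep(random.uniform(0, cap))  # full jitter within the cap
    raise AssertionError("unreachable")


# A source that is rate limited twice, then succeeds.
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("rate limited")
    return "ok"


delays: list[float] = []
result = with_retry(flaky_fetch, sleep=delays.append)
print(result, len(delays))  # ok 2
```

To skip a problematic document and continue with the others, the caller would wrap each document's fetch in `with_retry` and catch the final `TransientError`, log it, and move on.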

Notifications

You're informed when:

Sync completes successfully
Documents fail to fetch
Authentication expires
Rate limits are nearly reached
A source service is unavailable

Monitoring & Management

Connector Status

Track connector health:

Last successful sync time
Next scheduled sync
Documents fetched
Success/failure counts
Current status (active, paused, error)

Available actions:

Trigger manual sync
Pause/resume syncing
Edit configuration
View sync history
Delete connector

Sync History

View past activity:

Sync timestamps
Success/failure status
Documents processed
Error messages
Processing time

Use for:

Troubleshooting issues
Verifying sync schedule
Monitoring API usage
Audit trail

Rate Limiting & Quotas

Automatic Rate Management

Respects API limits:

Configurable delays between requests
Automatic backoff on limit warnings
Queue management to spread load
Pause and resume on quota exhaustion

Google Drive:

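Per-user and per-project request quotas set by Google (historically 1,000 requests per 100 seconds per user; check current values in the Google Cloud console)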
Automatic throttling built-in
Batch operations when possible
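Throttling to a fixed request budget can be done with a sliding-window limiter that waits until the oldest request ages out of the window. The sketch uses a budget of 1,000 requests per 100 seconds as an example value; `SlidingWindowLimiter` is a hypothetical name, the clock and sleep functions are injectable for testing, and this is not necessarily how Scrapalot throttles Drive calls.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` of seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit, self.window = limit, window
        self.clock, self.sleep = clock, sleep
        self.calls: deque = deque()  # timestamps of recent calls, oldest first

    def acquire(self) -> None:
        """Block until another request is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the rolling window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return
            # Wait until the oldest call falls out of the window, then re-check.
            self.sleep(self.window - (now - self.calls[0]))


# Example budget; call drive_limiter.acquire() before each API request.
drive_limiter = SlidingWindowLimiter(limit=1000, window=100.0)
```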

Firecrawl:

Free tier: 500 pages/month
Paid tier: Higher limits
Tracks usage automatically

Quota Monitoring

Track API usage:

Current usage vs. limits
Usage by connector
Alerts when approaching limits
Recommendations to optimize
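Alerts when approaching a limit can be driven by thresholds on the usage ratio. The `quota_status` name, the 80% warning threshold, and the pause-at-100% behavior described in the comments are assumptions for illustration.

```python
def quota_status(used: int, limit: int, warn_at: float = 0.8) -> str:
    """Classify usage against a quota: 'ok', 'warning', or 'exhausted'."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = used / limit
    if ratio >= 1.0:
        return "exhausted"  # pause the connector until the quota resets
    if ratio >= warn_at:
        return "warning"  # notify the user that the limit is near
    return "ok"


# Per-connector usage against a hypothetical 500-unit allowance.
usage = {"help-center": 420, "blog": 30}
for name, used in usage.items():
    print(name, quota_status(used, 500))
# help-center warning
# blog ok
```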

Best Practices

Connector Setup

Optimize your connectors:

Use specific folders/URLs, not entire drives
Filter by relevant file types
Set appropriate sync frequency
Group related content in same connector

Performance

Efficient syncing:

Schedule during low-usage hours
Avoid hourly sync unless necessary
Use manual sync for one-time imports
Monitor document count growth

Organization

Keep it maintainable:

Name connectors descriptively
Document what each connector fetches
Review and clean unused connectors
Archive completed syncs

Security

Protect your data:

Use minimum necessary permissions
Review connector access regularly
Rotate API keys periodically
Remove unused connectors

Troubleshooting

Connector Won't Authenticate

Check:

Credentials are correct
OAuth consent not expired
API key is valid
Service is accessible

Solutions:

Re-authorize OAuth
Generate new API key
Check firewall/network
Verify service status

No Documents Fetched

Common causes:

Empty folder/source
File type filters too restrictive
Permission issues
Rate limit reached

Solutions:

Verify source has content
Adjust file type filters
Check permissions
Review quota usage

Sync Failing Repeatedly

Investigate:

Error messages in history
Service health status
Authentication validity
Network connectivity

Fix:

Address specific error
Re-authenticate if needed
Check source availability
Contact support if persistent

Use Case Examples

Team Documentation

Scenario: Engineering team stores docs in Google Drive

Setup:

Connect to Drive folder
Daily sync schedule
PDF and Markdown files only
Notify on updates

Benefits:

Always current documentation
No manual uploads
Automatic processing
Team stays informed

Product Knowledge Base

Scenario: Public help center needs to be searchable

Setup:

Firecrawl connector to help site
Weekly sync
2-level deep crawl
Main content section only

Benefits:

Searchable help content
Updated automatically
Full-text search
Citation to original

Compliance Documents

Scenario: Regulatory documents from internal API

Setup:

Custom API connector
Monthly sync
Authenticated endpoint
Document metadata preserved

Benefits:

Centralized compliance search
Automatic updates
Audit trail maintained
Secure access

Related Documentation

Background Workers - How fetched documents are processed
Document Processing - Content chunking
Database Design - Storage of synced documents
Deployment Guide - Production connector setup

Connectors automate document management so you never have to upload updates manually. Set a connector up once and it keeps your collection current on its schedule.