Integrations

Connect external data sources to automatically sync documents into Scrapalot.

Cloud Storage Connectors

Google Drive

Status: ✅ Available

Connect your Google Drive to automatically sync documents.

Features:

OAuth 2.0 authentication
Folder-specific sync
Shared drive support
Automatic file type filtering
Token refresh handling

Supported File Types:

Google Docs, Sheets, Slides (exported as PDF)
PDF, text files, and more
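
As a rough illustration of the export rule above, the sketch below decides whether a Drive file would be exported as PDF or downloaded as-is. The MIME type strings are the ones the Google Drive API uses for Docs, Sheets, and Slides; the function name plan_fetch is hypothetical and is not taken from Scrapalot's code.

```python
from typing import Tuple

# Google Workspace files have no binary content of their own, so they are
# exported to another format instead of being downloaded directly.
GOOGLE_NATIVE_TYPES = {
    "application/vnd.google-apps.document": "Google Docs",
    "application/vnd.google-apps.spreadsheet": "Google Sheets",
    "application/vnd.google-apps.presentation": "Google Slides",
}
PDF_MIME = "application/pdf"


def plan_fetch(mime_type: str) -> Tuple[str, str]:
    """Return (action, target_mime) for a Drive file (hypothetical helper)."""
    if mime_type in GOOGLE_NATIVE_TYPES:
        return ("export", PDF_MIME)
    return ("download", mime_type)
```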

Setup:

Go to Settings → Workspace Connectors
Click "Connect Google Drive"
Authorize Scrapalot to access your Drive
Select folders to sync
Configure sync schedule (manual, hourly, daily, weekly)
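
To make the schedule options concrete, here is a minimal sketch of how the four values could map to a next-run time. The names SYNC_INTERVALS and next_sync are assumptions for illustration; Scrapalot's scheduler may work differently.

```python
from datetime import datetime, timedelta
from typing import Optional

# Schedule names come from the setup step above; "manual" never auto-runs.
SYNC_INTERVALS = {
    "manual": None,
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_sync(last_sync: datetime, schedule: str) -> Optional[datetime]:
    """Return when the next sync is due, or None for manual schedules."""
    if schedule not in SYNC_INTERVALS:
        raise ValueError(f"unknown schedule: {schedule!r}")
    interval = SYNC_INTERVALS[schedule]
    return None if interval is None else last_sync + interval
```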

Dropbox

Status: ✅ Available

Sync files from your Dropbox folders.

Features:

OAuth 2.0 authentication
Folder path selection
File type filtering
Scheduled synchronization

Setup:

Go to Settings → Workspace Connectors
Click "Connect Dropbox"
Authorize access to Dropbox
Configure folder path and sync settings

Notion

Status: ✅ Available

Import pages and databases from Notion workspaces.

Features:

OAuth 2.0 authentication
Page content extraction
Database content extraction
Hierarchical page structure
Rich text formatting support

Setup:

Go to Settings → Workspace Connectors
Click "Connect Notion"
Authorize Scrapalot integration
Select pages/databases to sync

Academic Connectors

Google Scholar

Status: ✅ Available

Search and import academic papers from Google Scholar.

Features:

Keyword and author search
Paper metadata extraction
Citation counts
No API key required (optional proxy for heavy usage)

Setup:

Go to Settings → Workspace Connectors
Click "Connect Google Scholar"
Enter search query (e.g., "machine learning" or "author:John Doe")
Configure max results
Start sync

Note: This connector uses web scraping. Heavy usage without a proxy may trigger CAPTCHA challenges.

Semantic Scholar

Status: ✅ Available

AI-powered academic paper search via the official Semantic Scholar API.

Features:

Official REST API
Optional API key for higher rate limits
Full paper metadata, abstracts, citations
PDF URLs when available
References and citations tracking

Rate Limits:

Without API key: 100 requests per 5 minutes
With API key: 5000 requests per 5 minutes
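
One way to stay inside these limits on the client side is a sliding-window counter. The sketch below is an assumption about how such a limiter could look, not Scrapalot's implementation; the class and function names are hypothetical, and the limits are the ones listed above.

```python
import time
from collections import deque
from typing import Optional

WINDOW_SECONDS = 5 * 60  # both limits above are counted per 5 minutes


class SlidingWindowLimiter:
    """Track request timestamps and report how long to wait (hypothetical)."""

    def __init__(self, max_requests: int, window: float = WINDOW_SECONDS,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._sent = deque()

    def wait_time(self) -> float:
        """Seconds until another request fits in the window (0.0 if it fits now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self._sent.append(self.clock())


def limiter_for(api_key: Optional[str]) -> SlidingWindowLimiter:
    # 5000 requests per window with a key, 100 without, per the list above.
    return SlidingWindowLimiter(5000 if api_key else 100)
```

A caller would check wait_time(), sleep for that long if it is non-zero, send the request, and then call record().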

Setup:

(Optional) Get an API key from https://www.semanticscholar.org/product/api
Go to Settings → Workspace Connectors
Click "Connect Semantic Scholar"
Enter API key (optional) and search query
Configure max results
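
For reference, the values entered in the steps above correspond roughly to a request like the one sketched here. It targets the public Semantic Scholar Graph API paper search endpoint and passes the optional key in the x-api-key header; the function name and the chosen fields are illustrative assumptions, and the request is only built, not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_request(query: str, max_results: int = 10,
                         api_key: Optional[str] = None) -> Request:
    """Build (but do not send) a paper search request (hypothetical helper)."""
    params = urlencode({
        "query": query,
        "limit": max_results,
        # Fields chosen for illustration; the connector may request others.
        "fields": "title,abstract,citationCount,openAccessPdf",
    })
    # The key is optional; without it the lower rate limit applies.
    headers = {"x-api-key": api_key} if api_key else {}
    return Request(f"{S2_SEARCH_URL}?{params}", headers=headers)
```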

arXiv

Status: ✅ Available

Access preprint papers from the arXiv repository.

Features:

Official arXiv API (completely free)
No authentication required
Category filtering (cs.AI, math.CO, etc.)
Automatic PDF download
Full metadata extraction

Setup:

Go to Settings → Workspace Connectors
Click "Connect arXiv"
Enter search query and optional category filter
Configure max results
Start sync

Categories: See https://arxiv.org/category_taxonomy for full list
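
A search query with a category filter translates into an arXiv API URL along these lines. The endpoint and the search_query, start, and max_results parameters belong to the public arXiv API; the helper name build_arxiv_url is hypothetical, and how the connector actually combines query and category is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_arxiv_url(query: str, category: Optional[str] = None,
                    max_results: int = 10) -> str:
    """Build an arXiv API query URL (hypothetical helper)."""
    # "all:" searches every field; "cat:" restricts results to one category.
    search = f"all:{query}"
    if category:
        search = f"{search} AND cat:{category}"
    params = urlencode({
        "search_query": search,
        "start": 0,
        "max_results": max_results,
    })
    return f"{ARXIV_API_URL}?{params}"
```

For example, build_arxiv_url("transformers", "cs.AI", 25) restricts a search for "transformers" to the cs.AI category and caps the result count at 25.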

Web Scraping

Firecrawl

Status: ✅ Available

Web scraping with JavaScript rendering for dynamic content.

Features:

JavaScript rendering
Dynamic content extraction
Configurable wait conditions
Max depth control

Setup:

Sign up at firecrawl.dev and get an API key
Add FIRECRAWL_API_KEY to your environment variables
Configure the connector with the target URL and scraping options
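
A missing or empty FIRECRAWL_API_KEY is an easy mistake to make, so a fail-fast check at startup can help. The following is a minimal sketch; the function name is hypothetical and the error message is only a suggestion.

```python
import os
from typing import Mapping


def load_firecrawl_key(env: Mapping[str, str] = os.environ) -> str:
    """Read FIRECRAWL_API_KEY and fail early if it is missing (hypothetical)."""
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "FIRECRAWL_API_KEY is not set; add it to the environment "
            "before enabling the Firecrawl connector"
        )
    return key
```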

Web Scraper

Status: ✅ Available

Simple web page fetching and content extraction.

Features:

CSS selector-based extraction
Multiple URL support
Configurable rate limiting

Setup: Configure the connector with the URLs to fetch and the CSS selectors to use for content extraction.
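
To show what selector-based extraction means in practice, here is a small standard-library sketch. It understands only three selector shapes (tag, .class, and tag.class) and is not the connector's implementation; a real scraper would normally rely on a full CSS selector engine. All names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class SelectorExtractor(HTMLParser):
    """Collect text inside elements matching 'tag', '.class', or 'tag.class'."""

    def __init__(self, selector: str):
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.tag, self.cls = tag or None, cls or None
        self.results: List[str] = []
        self._open_tag = None  # tag name of the match being collected
        self._depth = 0        # nesting depth of same-named tags
        self._buffer: List[str] = []

    def _matches(self, tag, attrs) -> bool:
        classes = (dict(attrs).get("class") or "").split()
        return ((self.tag is None or tag == self.tag)
                and (self.cls is None or self.cls in classes))

    def handle_starttag(self, tag, attrs):
        if self._open_tag is None:
            if self._matches(tag, attrs):
                self._open_tag, self._depth, self._buffer = tag, 1, []
        elif tag == self._open_tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self._depth -= 1
            if self._depth == 0:
                # Join text chunks and collapse runs of whitespace.
                self.results.append(" ".join(" ".join(self._buffer).split()))
                self._open_tag = None

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)


def extract(html: str, selector: str) -> List[str]:
    parser = SelectorExtractor(selector)
    parser.feed(html)
    parser.close()
    return parser.results
```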

Potential Future Integrations

The connector system is extensible. If you need integration with a specific platform, you can:

Build a custom connector - See Creating Custom Connectors
Request a feature - Submit an issue on GitHub
Contribute - Pull requests welcome for new connectors

Community Interest (Not Committed):

GitHub/GitLab repository integration
Slack message archival
Additional cloud storage providers
Enterprise knowledge bases

Creating Custom Connectors

Scrapalot supports custom connector development. See the External Connectors architecture documentation for implementation details.

Key Steps:

Inherit from BaseConnector class
Implement required interfaces (LoadConnector, PollConnector, etc.)
Register connector with @register_connector decorator
Add configuration schema and OAuth flow (if needed)

For detailed instructions, see scrapalot-chat/docs/README_EXTERNAL_CONNECTORS.md

Troubleshooting

OAuth Authorization Failed

Solution: Check that redirect URIs are configured correctly in the provider's OAuth app settings.
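
Providers generally compare redirect URIs as exact strings, so a stray trailing slash or an http/https mismatch is enough to fail authorization. The helper below is a hypothetical diagnostic sketch that points out which part of the URI differs; it is not part of Scrapalot.

```python
from typing import List
from urllib.parse import urlsplit


def diagnose_redirect_uri(sent: str, registered: List[str]) -> List[str]:
    """Return hints about why `sent` matches no registered URI (hypothetical)."""
    if sent in registered:
        return []  # exact match, nothing to fix
    sent_parts = urlsplit(sent)
    hints = []
    for candidate in registered:
        parts = urlsplit(candidate)
        for field in ("scheme", "netloc", "path", "query", "fragment"):
            a, b = getattr(sent_parts, field), getattr(parts, field)
            if a != b:
                hints.append(f"{field}: sent {a!r}, registered {b!r}")
    return hints or ["no registered URI matches exactly"]
```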

Sync Not Working

Solution:

Verify credentials are still valid
Check connector status in Settings
Review sync job logs for errors
Ensure background workers are running

Rate Limit Errors

Solution:

Google Scholar: Configure proxy for heavy usage
Semantic Scholar: Add API key to increase limits
arXiv: Requests must be spaced at least 3 seconds apart (the connector applies this delay automatically)
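
For custom scripts that call arXiv outside the connector, the 3-second spacing can be enforced with a small throttle like this one. It is a sketch under the assumption of a single-threaded caller; the class name is hypothetical.

```python
import time


class MinIntervalThrottle:
    """Ensure consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 3.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time the previous request was released

    def wait(self) -> float:
        """Block until the next request may go out; return the delay applied."""
        now = self.clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay:
            self.sleep(delay)
        self._last = now + delay
        return delay
```

Calling wait() immediately before each request keeps the spacing without any other bookkeeping.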

Learn More

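For detailed connector implementation and architecture, see the External Connectors architecture documentation (scrapalot-chat/docs/README_EXTERNAL_CONNECTORS.md).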
