Integrations
Connect external data sources to automatically sync documents into Scrapalot.
Cloud Storage Connectors
Google Drive
Status: ✅ Available
Connect your Google Drive to automatically sync documents.
Features:
- OAuth 2.0 authentication
- Folder-specific sync
- Shared drive support
- Automatic file type filtering
- Token refresh handling
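Token refresh follows the standard OAuth 2.0 refresh_token grant. The sketch below shows the general pattern against Google's token endpoint. The helper names and the 60-second safety margin are illustrative assumptions, not Scrapalot's actual implementation.

```python
import time
from typing import Optional, Tuple
from urllib.parse import urlencode

# Google's OAuth 2.0 token endpoint
GOOGLE_TOKEN_URL = "https://oauth2.googleapis.com/token"


def needs_refresh(expires_at: float, now: Optional[float] = None, skew: float = 60.0) -> bool:
    """True if the access token is expired or expires within `skew` seconds (hypothetical helper)."""
    current = time.time() if now is None else now
    return current >= expires_at - skew


def build_refresh_request(client_id: str, client_secret: str, refresh_token: str) -> Tuple[str, bytes]:
    """Return the token URL and form-encoded body for a refresh_token grant (RFC 6749, section 6)."""
    body = urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return GOOGLE_TOKEN_URL, body
```

The body is sent as a POST with Content-Type application/x-www-form-urlencoded; a successful response carries a new access_token and its expires_in lifetime.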
Supported File Types:
- Google Docs, Sheets, Slides (exported as PDF)
- PDF, text files, and more
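Native Google Docs, Sheets, and Slides files have no binary content of their own, so they go through the Drive v3 files.export endpoint, while ordinary files are downloaded with alt=media. A minimal sketch of that decision, assuming a hypothetical allowlist of binary types (the connector's real filter may accept more):

```python
from typing import Optional
from urllib.parse import quote, urlencode

DRIVE_FILES = "https://www.googleapis.com/drive/v3/files"

# Native Google Workspace types: no stored bytes, must be exported
GOOGLE_NATIVE_TYPES = {
    "application/vnd.google-apps.document",      # Google Docs
    "application/vnd.google-apps.spreadsheet",   # Google Sheets
    "application/vnd.google-apps.presentation",  # Google Slides
}

# Hypothetical allowlist of ordinary files to download as-is
SUPPORTED_BINARY_TYPES = {"application/pdf", "text/plain", "text/markdown"}


def content_url(file_id: str, mime_type: str) -> Optional[str]:
    """Return the Drive v3 URL for a file's content, or None to skip the file."""
    fid = quote(file_id, safe="")
    if mime_type in GOOGLE_NATIVE_TYPES:
        # files.export converts Docs/Sheets/Slides to PDF
        return f"{DRIVE_FILES}/{fid}/export?{urlencode({'mimeType': 'application/pdf'})}"
    if mime_type in SUPPORTED_BINARY_TYPES:
        # alt=media downloads the stored bytes directly
        return f"{DRIVE_FILES}/{fid}?alt=media"
    return None  # folders, shortcuts, and unsupported types are filtered out
```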
Setup:
- Go to Settings → Workspace Connectors
- Click "Connect Google Drive"
- Authorize Scrapalot to access your Drive
- Select folders to sync
- Configure sync schedule (manual, hourly, daily, weekly)
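The schedule options map naturally onto fixed intervals. This is a hypothetical sketch of how a scheduler might compute the next run; the function name and interval table are illustrative, not Scrapalot's code.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Hypothetical mapping of schedule names to intervals; "manual" never auto-runs
SCHEDULE_INTERVALS: Dict[str, Optional[timedelta]] = {
    "manual": None,
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_sync_time(schedule: str, last_sync: datetime) -> Optional[datetime]:
    """Return when the next automatic sync is due, or None for manual sync."""
    if schedule not in SCHEDULE_INTERVALS:
        raise ValueError(f"unknown schedule: {schedule!r}")
    interval = SCHEDULE_INTERVALS[schedule]
    return None if interval is None else last_sync + interval
```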
Dropbox
Status: ✅ Available
Sync files from your Dropbox folders.
Features:
- OAuth 2.0 authentication
- Folder path selection
- File type filtering
- Scheduled synchronization
Setup:
- Go to Settings → Workspace Connectors
- Click "Connect Dropbox"
- Authorize access to Dropbox
- Configure folder path and sync settings
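Dropbox API v2 represents the root folder as an empty string and every other path with a leading slash, and its files/list_folder endpoint returns entries tagged file or folder. The sketch below normalizes a user-entered folder path, builds the request, and filters entries by extension. The helper names and the extension set are assumptions for illustration.

```python
import json
import os
from typing import Dict, Iterable, List, Tuple

LIST_FOLDER_URL = "https://api.dropboxapi.com/2/files/list_folder"


def normalize_path(path: str) -> str:
    """Dropbox API v2 uses "" for the root and "/"-prefixed paths otherwise."""
    cleaned = path.strip().rstrip("/")
    if not cleaned:
        return ""
    return cleaned if cleaned.startswith("/") else "/" + cleaned


def list_folder_request(token: str, path: str, recursive: bool = True) -> Tuple[str, Dict[str, str], str]:
    """Build URL, headers, and JSON body for a files/list_folder call."""
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    body = json.dumps({"path": normalize_path(path), "recursive": recursive})
    return LIST_FOLDER_URL, headers, body


def filter_files(entries: Iterable[dict], extensions: Iterable[str]) -> List[dict]:
    """Keep only file entries whose extension (case-insensitive) is allowed."""
    allowed = {ext.lower() for ext in extensions}
    return [
        e for e in entries
        if e.get(".tag") == "file" and os.path.splitext(e["name"])[1].lower() in allowed
    ]
```

Large folders are paginated: when a response has has_more set, the returned cursor is passed to files/list_folder/continue to fetch the rest.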
Notion
Status: ✅ Available
Import pages and databases from Notion workspaces.
Features:
- OAuth 2.0 authentication
- Page content extraction
- Database content extraction
- Hierarchical page structure
- Rich text formatting support
Setup:
- Go to Settings → Workspace Connectors
- Click "Connect Notion"
- Authorize Scrapalot integration
- Select pages/databases to sync
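In the Notion API, each block stores its text under a key named after the block's type, as a rich_text array of segments that carry a plain_text field; child blocks are fetched with a separate call. Assuming the children have already been attached under a hypothetical children key, a hierarchy-preserving text extraction could look like this:

```python
from typing import List


def rich_text_to_str(rich_text: List[dict]) -> str:
    """Concatenate the plain_text of each rich text segment."""
    return "".join(segment.get("plain_text", "") for segment in rich_text)


def blocks_to_lines(blocks: List[dict], depth: int = 0) -> List[str]:
    """Flatten Notion blocks to indented text lines.

    Assumes child blocks were already fetched and stored under a
    hypothetical "children" key on each parent block.
    """
    lines: List[str] = []
    for block in blocks:
        # A block's content lives under a key named after its type
        payload = block.get(block.get("type", ""), {})
        text = rich_text_to_str(payload.get("rich_text", []))
        if text:
            lines.append("  " * depth + text)
        # Recurse so nested pages/blocks keep their hierarchy via indentation
        lines.extend(blocks_to_lines(block.get("children", []), depth + 1))
    return lines
```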
Academic Connectors
Google Scholar
Status: ✅ Available
Search and import academic papers from Google Scholar.
Features:
- Keyword and author search
- Paper metadata extraction
- Citation counts
- No API key required (optional proxy for heavy usage)
Setup:
- Go to Settings → Workspace Connectors
- Click "Connect Google Scholar"
- Enter search query (e.g., "machine learning" or "author:John Doe")
- Configure max results
- Start sync
Note: This connector relies on web scraping, so heavy usage may trigger CAPTCHAs unless a proxy is configured.
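Google Scholar's web interface takes the query in the q parameter (operators such as author: go inside it) and pages through results ten at a time with start. A sketch of how a scraper might plan its page requests for a given max results value; the function name is hypothetical, and actually fetching these pages is where CAPTCHA and proxy concerns arise.

```python
from typing import List
from urllib.parse import urlencode

SCHOLAR_URL = "https://scholar.google.com/scholar"
RESULTS_PER_PAGE = 10  # Scholar's default page size


def scholar_page_urls(query: str, max_results: int) -> List[str]:
    """One results-page URL per 10 results, up to max_results (hypothetical helper)."""
    return [
        f"{SCHOLAR_URL}?{urlencode({'q': query, 'start': start})}"
        for start in range(0, max_results, RESULTS_PER_PAGE)
    ]
```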
Semantic Scholar
Status: ✅ Available
AI-powered academic paper search via the official Semantic Scholar API.
Features:
- Official REST API
- Optional API key for higher rate limits
- Full paper metadata, abstracts, citations
- PDF URLs when available
- References and citations tracking
Rate Limits:
- Without API key: 100 requests per 5 minutes
- With API key: 5000 requests per 5 minutes
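Both limits are expressed as a request budget per 5-minute window, which a sliding-window limiter enforces directly. This is a generic sketch with hypothetical class and method names; the clock is injectable so the logic can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window` seconds."""

    def __init__(self, max_requests: int, window: float, clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of requests still in the window

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0 if none)."""
        now = self.clock()
        # Forget requests that have left the window
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # Wait until the oldest request in the window expires
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Call once per request actually sent."""
        self._sent.append(self.clock())


# Semantic Scholar budgets from the limits above
anonymous = SlidingWindowLimiter(100, 300.0)
with_key = SlidingWindowLimiter(5000, 300.0)
```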
Setup:
- (Optional) Get API key from https://www.semanticscholar.org/product/api
- Go to Settings → Workspace Connectors
- Click "Connect Semantic Scholar"
- Enter API key (optional) and search query
- Configure max results
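The search runs against the Semantic Scholar Graph API, where an API key, if provided, travels in the x-api-key header. A sketch of building one page of a paper search request; the field list and helper name are assumptions, and the page size is capped at 100 because /paper/search does not accept a larger limit.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"
# Illustrative field selection; the API returns only the fields requested
FIELDS = "title,abstract,authors,year,citationCount,openAccessPdf,externalIds"
MAX_PAGE_SIZE = 100  # upper bound for `limit` on /paper/search


def s2_search_request(query: str, max_results: int, offset: int = 0,
                      api_key: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for one page of a paper search (hypothetical helper)."""
    params = {
        "query": query,
        "fields": FIELDS,
        "limit": min(max_results, MAX_PAGE_SIZE),
        "offset": offset,
    }
    headers = {"x-api-key": api_key} if api_key else {}
    return f"{S2_SEARCH_URL}?{urlencode(params)}", headers
```

To collect more than 100 results, repeat the request with offset advanced by the page size until max results is reached.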
arXiv
Status: ✅ Available
Access preprint papers from the arXiv repository.
Features:
- Official arXiv API (completely free)
- No authentication required
- Category filtering (cs.AI, math.CO, etc.)
- Automatic PDF download
- Full metadata extraction
Setup:
- Go to Settings → Workspace Connectors
- Click "Connect arXiv"
- Enter search query and optional category filter
- Configure max results
- Start sync
Categories: See https://arxiv.org/category_taxonomy for full list
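The arXiv API accepts field-prefixed search terms such as all: and cat:, combined with AND. A sketch of turning a query plus an optional category into a request URL; quoting multi-word queries as a phrase is a design choice made here, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

ARXIV_API_URL = "https://export.arxiv.org/api/query"


def arxiv_query_url(query: str, category: Optional[str] = None,
                    max_results: int = 50, start: int = 0) -> str:
    """Build an arXiv API query URL for a search term and optional category."""
    # Quote multi-word queries so they are matched as a phrase
    term = f'"{query}"' if " " in query else query
    search = f"all:{term}"
    if category:
        search += f" AND cat:{category}"  # e.g. cs.AI, math.CO
    params = {"search_query": search, "start": start, "max_results": max_results}
    return f"{ARXIV_API_URL}?{urlencode(params)}"
```

The response is an Atom feed in which each entry carries the paper's metadata and a link to its PDF.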
Web Scraping
Firecrawl
Status: ✅ Available
Web scraping with JavaScript rendering for dynamic content.
Features:
- JavaScript rendering
- Dynamic content extraction
- Configurable wait conditions
- Max depth control
Setup:
- Sign up at firecrawl.dev and obtain an API key
- Add FIRECRAWL_API_KEY to your environment variables
- Configure the connector with the target URL and scraping options
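For reference, a hedged sketch of how the API key and scrape options might be assembled into a request. The endpoint path and the formats and waitFor field names are assumptions based on Firecrawl's v1 API and should be checked against Firecrawl's current documentation; the helper itself is hypothetical.

```python
import os
from typing import Any, Dict, Mapping, Tuple

# Assumed endpoint (Firecrawl v1); verify against current Firecrawl docs
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def firecrawl_scrape_request(target_url: str, wait_ms: int = 0,
                             env: Mapping[str, str] = os.environ) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Assemble URL, headers, and JSON payload for a single-page scrape."""
    api_key = env.get("FIRECRAWL_API_KEY")
    if not api_key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set")
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    payload: Dict[str, Any] = {"url": target_url, "formats": ["markdown"]}  # assumed field names
    if wait_ms > 0:
        # Assumed option: milliseconds to wait for JavaScript-rendered content
        payload["waitFor"] = wait_ms
    return FIRECRAWL_SCRAPE_URL, headers, payload
```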
Web Scraper
Status: ✅ Available
Simple web page fetching and content extraction.
Features:
- CSS selector-based extraction
- Multiple URL support
- Configurable rate limiting
Setup: Configure connector with URLs and CSS selectors for content extraction.
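To illustrate selector-based extraction, here is a standard-library sketch that supports only the tag, .class, and tag.class forms; a production scraper would normally use a full CSS selector engine. It assumes well-formed markup with explicit closing tags, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that never have a closing tag
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SelectorExtractor(HTMLParser):
    """Collect the text of elements matching `tag`, `.class`, or `tag.class`."""

    def __init__(self, selector: str):
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.tag: Optional[str] = tag.lower() or None
        self.cls: Optional[str] = cls or None
        self.results: List[str] = []
        self._depth = 0          # >0 while inside a matched element
        self._buf: List[str] = []

    def _matches(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> bool:
        if self.tag and tag != self.tag:
            return False
        if self.cls:
            return self.cls in (dict(attrs).get("class") or "").split()
        return True

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1     # nested element inside the current match
        elif self._matches(tag, attrs):
            self._depth, self._buf = 1, []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Join text fragments and collapse whitespace
            self.results.append(" ".join(" ".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def select_text(html: str, selector: str) -> List[str]:
    """Return the text of every outermost element matching the selector."""
    parser = SelectorExtractor(selector)
    parser.feed(html)
    parser.close()
    return parser.results
```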
Potential Future Integrations
The connector system is extensible. If you need integration with a specific platform, you can:
- Build a custom connector - See Creating Custom Connectors
- Request a feature - Submit an issue on GitHub
- Contribute - Pull requests welcome for new connectors
Community Interest (Not Committed):
- GitHub/GitLab repository integration
- Slack message archival
- Additional cloud storage providers
- Enterprise knowledge bases
Creating Custom Connectors
Scrapalot supports custom connector development. See the External Connectors architecture documentation for implementation details.
Key Steps:
- Inherit from the BaseConnector class
- Implement the required interfaces (LoadConnector, PollConnector, etc.)
- Register the connector with the @register_connector decorator
- Add a configuration schema and OAuth flow (if needed)
For detailed instructions, see scrapalot-chat/docs/README_EXTERNAL_CONNECTORS.md
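The shape of a connector following these steps might look like the sketch below. BaseConnector, LoadConnector, PollConnector, and register_connector are defined here as minimal stand-ins so the example runs on its own; their method names (load_from_state, poll_source) and signatures are assumptions, so consult the implementation guide referenced above for the real interfaces.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, Type

# --- Hypothetical stand-ins; Scrapalot's real classes may differ ---
CONNECTOR_REGISTRY: Dict[str, Type["BaseConnector"]] = {}


def register_connector(name: str) -> Callable[[type], type]:
    """Decorator that records a connector class under a name."""
    def decorator(cls: type) -> type:
        CONNECTOR_REGISTRY[name] = cls
        return cls
    return decorator


class BaseConnector(ABC):
    def __init__(self, config: dict):
        self.config = config


class LoadConnector(ABC):
    @abstractmethod
    def load_from_state(self) -> Iterator[dict]:
        """Yield every document in the source (full sync)."""


class PollConnector(ABC):
    @abstractmethod
    def poll_source(self, start: float, end: float) -> Iterator[dict]:
        """Yield documents updated in [start, end) (incremental sync)."""


# --- Example connector built on the stand-ins ---
@register_connector("static_list")
class StaticListConnector(BaseConnector, LoadConnector, PollConnector):
    def load_from_state(self) -> Iterator[dict]:
        yield from self.config["documents"]

    def poll_source(self, start: float, end: float) -> Iterator[dict]:
        for doc in self.config["documents"]:
            if start <= doc["updated_at"] < end:
                yield doc
```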
Troubleshooting
OAuth Authorization Failed
Solution: Check that redirect URIs are configured correctly in the provider's OAuth app settings.
Sync Not Working
Solution:
- Verify credentials are still valid
- Check connector status in Settings
- Review sync job logs for errors
- Ensure background workers are running
Rate Limit Errors
Solution:
- Google Scholar: Configure a proxy for heavy usage
- Semantic Scholar: Add an API key to increase limits
- arXiv: A 3-second delay between requests is enforced automatically
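A fixed minimum delay between requests, like the one applied to arXiv, can be enforced with a small throttle. This is a generic sketch with hypothetical names; the clock and sleep functions are injectable so it can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Ensure at least `interval` seconds between consecutive requests."""

    def __init__(self, interval: float = 3.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Call before each request; sleeps only if the last one was too recent."""
        if self._last is not None:
            remaining = self.interval - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()
```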
Learn More
For detailed connector implementation and architecture, see: