
Integrations

Connect external data sources to automatically sync documents into Scrapalot.

Cloud Storage Connectors

Google Drive

Status: ✅ Available

Connect your Google Drive to automatically sync documents.

Features:

  • OAuth 2.0 authentication
  • Folder-specific sync
  • Shared drive support
  • Automatic file type filtering
  • Token refresh handling
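Token refresh handling usually means renewing the short-lived OAuth access token with the stored refresh token before it expires. A minimal sketch of the expiry check, assuming a hypothetical `StoredToken` record and an assumed 60-second safety margin (neither comes from Scrapalot's code):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class StoredToken:
    """Hypothetical token record; Scrapalot's storage model may differ."""
    access_token: str
    refresh_token: str
    expires_at: datetime  # timezone-aware expiry of the access token


# Assumed margin so a token does not expire in the middle of a sync.
REFRESH_MARGIN = timedelta(seconds=60)


def needs_refresh(token: StoredToken, now: Optional[datetime] = None) -> bool:
    """Return True if the access token is expired or about to expire."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= token.expires_at - REFRESH_MARGIN
```

Refreshing slightly early avoids a request failing with an expired token halfway through a long folder sync.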

Supported File Types:

  • Google Docs, Sheets, Slides (exported as PDF)
  • PDF, text files, and more
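The Google Drive API treats native Google Workspace files differently from ordinary uploads: Docs, Sheets, and Slides have no binary content and must be exported (via `files.export` with a target MIME type such as `application/pdf`), while PDFs and text files are downloaded directly. The sketch below shows how a connector might decide per file; the function name and the allow-list of downloadable types are assumptions for illustration:

```python
from typing import Optional, Tuple

# Native Google Workspace types have no downloadable bytes, so export them as PDF.
GOOGLE_EXPORT_TYPES = {
    "application/vnd.google-apps.document",      # Google Docs
    "application/vnd.google-apps.spreadsheet",   # Google Sheets
    "application/vnd.google-apps.presentation",  # Google Slides
}

# Assumed allow-list of regular files to download as-is; the real list may be longer.
DOWNLOADABLE_TYPES = {"application/pdf", "text/plain", "text/markdown", "text/csv"}


def plan_download(mime_type: str) -> Optional[Tuple[str, str]]:
    """Return (action, mime_type) for a Drive file, or None to skip it."""
    if mime_type in GOOGLE_EXPORT_TYPES:
        return ("export", "application/pdf")  # would use files.export
    if mime_type in DOWNLOADABLE_TYPES:
        return ("download", mime_type)        # would use files.get with alt=media
    return None  # folders, images, and other unsupported types are skipped
```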

Setup:

  1. Go to Settings → Workspace Connectors
  2. Click "Connect Google Drive"
  3. Authorize Scrapalot to access your Drive
  4. Select folders to sync
  5. Configure sync schedule (manual, hourly, daily, weekly)
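One plausible reading of the schedule options is a fixed interval measured from the last completed sync, with "manual" meaning no automatic run at all. The mapping below is an assumption for illustration, not Scrapalot's scheduler:

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed mapping from schedule option to interval between automatic syncs.
SCHEDULE_INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_sync_time(schedule: str, last_sync: datetime) -> Optional[datetime]:
    """Return when the next automatic sync is due, or None for manual syncs."""
    if schedule == "manual":
        return None  # only runs when the user triggers a sync
    if schedule not in SCHEDULE_INTERVALS:
        raise ValueError(f"unknown schedule: {schedule!r}")
    return last_sync + SCHEDULE_INTERVALS[schedule]
```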

Dropbox

Status: ✅ Available

Sync files from your Dropbox folders.

Features:

  • OAuth 2.0 authentication
  • Folder path selection
  • File type filtering
  • Scheduled synchronization

Setup:

  1. Go to Settings → Workspace Connectors
  2. Click "Connect Dropbox"
  3. Authorize access to Dropbox
  4. Configure folder path and sync settings
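Dropbox's `files/list_folder` endpoint returns entries carrying a `.tag` (`file`, `folder`, or `deleted`), a lowercase `path_lower`, and a `path_display`. A sketch of folder-path and file-type filtering over such entries, assuming a hypothetical extension allow-list and that the entries were already fetched recursively:

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Assumed allow-list of extensions; Scrapalot's actual filter may differ.
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx"}


def select_files(entries: Iterable[Dict[str, str]], folder_path: str) -> List[str]:
    """Return display paths of supported files inside folder_path (recursively)."""
    root = folder_path.rstrip("/").lower()  # Dropbox paths compare case-insensitively
    selected = []
    for entry in entries:
        if entry.get(".tag") != "file":
            continue  # skip folders and deleted entries
        path = entry["path_lower"]
        if root and not path.startswith(root + "/"):
            continue  # outside the configured folder
        if PurePosixPath(path).suffix in ALLOWED_EXTENSIONS:
            selected.append(entry["path_display"])
    return selected
```

Comparing against `root + "/"` rather than `root` keeps a folder such as `/Researchers` from matching a configured path of `/Research`.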

Notion

Status: ✅ Available

Import pages and databases from Notion workspaces.

Features:

  • OAuth 2.0 authentication
  • Page content extraction
  • Database content extraction
  • Hierarchical page structure
  • Rich text formatting support
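Notion's API returns page content as a list of block objects, each with a `type` (such as `paragraph` or `heading_1`) and a payload whose `rich_text` array holds segments with a `plain_text` field. A sketch of flattening that hierarchy into indented text, assuming child blocks have already been fetched and attached under a hypothetical `children` key (the real API returns children through a separate request). It keeps only plain text; preserving formatting would also mean reading each segment's `annotations` (bold, italic, and so on):

```python
from typing import Any, Dict, List

HEADING_PREFIX = {"heading_1": "# ", "heading_2": "## ", "heading_3": "### "}


def rich_text_to_plain(rich_text: List[Dict[str, Any]]) -> str:
    """Concatenate the plain_text of each rich text segment."""
    return "".join(part.get("plain_text", "") for part in rich_text)


def blocks_to_text(blocks: List[Dict[str, Any]], depth: int = 0) -> List[str]:
    """Flatten Notion-style blocks into lines, indenting nested children."""
    lines: List[str] = []
    indent = "  " * depth
    for block in blocks:
        block_type = block.get("type")
        payload = block.get(block_type, {})
        text = rich_text_to_plain(payload.get("rich_text", []))
        if block_type in HEADING_PREFIX:
            lines.append(indent + HEADING_PREFIX[block_type] + text)
        elif block_type == "bulleted_list_item":
            lines.append(indent + "- " + text)
        elif text:
            lines.append(indent + text)
        # Assumption: children were fetched beforehand and stored under "children".
        lines.extend(blocks_to_text(block.get("children", []), depth + 1))
    return lines
```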

Setup:

  1. Go to Settings → Workspace Connectors
  2. Click "Connect Notion"
  3. Authorize Scrapalot integration
  4. Select pages/databases to sync

Academic Connectors

Google Scholar

Status: ✅ Available

Search and import academic papers from Google Scholar.

Features:

  • Keyword and author search
  • Paper metadata extraction
  • Citation counts
  • No API key required (optional proxy for heavy usage)

Setup:

  1. Go to Settings → Workspace Connectors
  2. Click "Connect Google Scholar"
  3. Enter search query (e.g., "machine learning" or "author:John Doe")
  4. Configure max results
  5. Start sync

Note: This connector uses web scraping, so heavy usage without a proxy may trigger CAPTCHAs.
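Because this connector scrapes Scholar's HTML result pages rather than calling an API, a sync with a large max-results value turns into several page requests, each of which adds to the traffic that can trigger a CAPTCHA. A sketch of that mapping, assuming 10 results per page and the `q` and `start` query parameters of Scholar's public search page; the helper name is hypothetical:

```python
from typing import List
from urllib.parse import urlencode

SCHOLAR_URL = "https://scholar.google.com/scholar"
RESULTS_PER_PAGE = 10  # assumed page size of the default results page


def scholar_page_urls(query: str, max_results: int) -> List[str]:
    """Return one result-page URL per page needed to reach max_results."""
    urls = []
    for start in range(0, max_results, RESULTS_PER_PAGE):
        urls.append(f"{SCHOLAR_URL}?{urlencode({'q': query, 'start': start})}")
    return urls
```

Under these assumptions, a query with max results of 25 costs three page fetches, which is why a proxy becomes relevant for heavy usage.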

Semantic Scholar

Status: ✅ Available

AI-powered academic paper search via the official Semantic Scholar API.

Features:

  • Official REST API
  • Optional API key for higher rate limits
  • Full paper metadata, abstracts, citations
  • PDF URLs when available
  • Reference and citation tracking

Rate Limits:

  • Without API key: 100 requests per 5 minutes
  • With API key: 5000 requests per 5 minutes
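In average terms, 100 requests per 5 minutes is one request every 3 seconds (300 s / 100), while 5000 requests per 5 minutes is roughly 16.7 requests per second (5000 / 300 s). Requests can still arrive in bursts, so a client that must never exceed an "N per 5 minutes" budget can track its recent calls in a sliding window. A generic sketch, not Scrapalot's implementation; the injectable `clock` exists only to make it testable:

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.timestamps: Deque[float] = deque()  # times of accepted requests

    def try_acquire(self) -> bool:
        """Record and allow a request if the window has room, else refuse."""
        now = self.clock()
        # Forget requests that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```

For example, `SlidingWindowLimiter(100, 300)` models the keyless limit and `SlidingWindowLimiter(5000, 300)` the keyed one; when `try_acquire()` returns False, the caller should wait before retrying.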

Setup:

  1. (Optional) Get API key from https://www.semanticscholar.org/product/api
  2. Go to Settings → Workspace Connectors
  3. Click "Connect Semantic Scholar"
  4. Enter API key (optional) and search query
  5. Configure max results
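Behind these setup fields, a search call to the Semantic Scholar Graph API is a GET to `/graph/v1/paper/search` with `query`, `limit`, and `fields` parameters; the optional API key travels in the `x-api-key` header. A sketch that only builds the request (the function name and chosen field list are assumptions, not Scrapalot's code):

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,abstract,year,authors,citationCount,openAccessPdf"


def build_search_request(query: str, max_results: int,
                         api_key: Optional[str] = None) -> Request:
    """Build (but do not send) a paper search request."""
    # The search endpoint returns at most 100 papers per call.
    params = urlencode({"query": query, "limit": min(max_results, 100), "fields": FIELDS})
    headers = {"x-api-key": api_key} if api_key else {}  # key raises the rate limit
    return Request(f"{SEARCH_URL}?{params}", headers=headers)
```

Passing the request to `urllib.request.urlopen` would return JSON whose `data` array holds the matching papers; larger max-results values would need paging with the `offset` parameter.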

arXiv

Status: ✅ Available

Access preprint papers from the arXiv repository.

Features:

  • Official arXiv API (completely free)
  • No authentication required
  • Category filtering (cs.AI, math.CO, etc.)
  • Automatic PDF download
  • Full metadata extraction

Setup:

  1. Go to Settings → Workspace Connectors
  2. Click "Connect arXiv"
  3. Enter search query and optional category filter
  4. Configure max results
  5. Start sync

Categories: See https://arxiv.org/category_taxonomy for the full list.
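The arXiv API is a plain HTTP query interface: `search_query` combines field prefixes such as `all:` and `cat:` with boolean operators like `AND`, and `start`/`max_results` control paging. A sketch of turning the setup fields into a query URL; the function name and the phrase-quoting rule are assumptions for illustration:

```python
from typing import Optional
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(query: str, category: Optional[str] = None,
                    max_results: int = 10, start: int = 0) -> str:
    """Build an arXiv API query URL, optionally restricted to one category."""
    # Multi-word queries are quoted so arXiv treats them as a phrase.
    terms = f'all:"{query}"' if " " in query else f"all:{query}"
    if category:
        terms += f" AND cat:{category}"  # e.g. cs.AI, math.CO
    params = {"search_query": terms, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"
```

The response is an Atom XML feed with one entry per paper, including links to the abstract page and the PDF.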

Web Scraping

Firecrawl

Status: ✅ Available

Web scraping with JavaScript rendering for dynamic content.

Features:

  • JavaScript rendering
  • Dynamic content extraction
  • Configurable wait conditions
  • Max depth control

Setup:

  1. Sign up at firecrawl.dev and get API key
  2. Add FIRECRAWL_API_KEY to environment variables
  3. Configure connector with target URL and scraping options
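Since the API key comes from the environment, a connector can check for it when it is configured rather than failing on the first scrape. A sketch of that fail-fast setup; `FirecrawlSettings`, `load_settings`, and the option names are hypothetical and are not Firecrawl's request schema:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class FirecrawlSettings:
    """Hypothetical connector settings; not Firecrawl's own request schema."""
    api_key: str
    target_url: str
    wait_for_ms: int = 0  # extra time for JavaScript-rendered content to appear
    max_depth: int = 1    # link hops to follow from target_url


def load_settings(target_url: str, env: Optional[Mapping[str, str]] = None,
                  **options: int) -> FirecrawlSettings:
    """Read FIRECRAWL_API_KEY and validate settings before any scraping starts."""
    env = os.environ if env is None else env
    api_key = env.get("FIRECRAWL_API_KEY")
    if not api_key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set in the environment")
    if not target_url.startswith(("http://", "https://")):
        raise ValueError(f"target URL must be http(s): {target_url!r}")
    return FirecrawlSettings(api_key=api_key, target_url=target_url, **options)
```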

Web Scraper

Status: ✅ Available

Simple web page fetching and content extraction.

Features:

  • CSS selector-based extraction
  • Multiple URL support
  • Configurable rate limiting

Setup: Configure connector with URLs and CSS selectors for content extraction.
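CSS selector matching is usually done with a library such as BeautifulSoup's `select()`. To stay dependency-free, the sketch below uses the standard-library `html.parser` and supports only simple selectors (`tag`, `.class`, `#id`, `tag.class`). It illustrates the idea and is not the selector engine Scrapalot uses; it also assumes reasonably well-formed HTML with explicit closing tags:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_selector(selector: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Split 'tag', '.class', '#id', or 'tag.class' into (tag, class, id)."""
    tag, cls, elem_id = selector, None, None
    if "#" in tag:
        tag, elem_id = tag.split("#", 1)
    if "." in tag:
        tag, cls = tag.split(".", 1)
    return (tag or None, cls, elem_id)


class SelectorExtractor(HTMLParser):
    def __init__(self, selector: str) -> None:
        super().__init__()
        self.tag, self.cls, self.elem_id = parse_selector(selector)
        self.depth = 0                 # > 0 while inside a matching element
        self.current: List[str] = []   # text pieces of the current match
        self.matches: List[str] = []

    def _matches(self, tag, attrs) -> bool:
        attr_map = dict(attrs)
        if self.tag and tag != self.tag:
            return False
        if self.cls and self.cls not in (attr_map.get("class") or "").split():
            return False
        if self.elem_id and attr_map.get("id") != self.elem_id:
            return False
        return True

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif self._matches(tag, attrs):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Join pieces with spaces, then collapse runs of whitespace.
            self.matches.append(" ".join(" ".join(self.current).split()))
            self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract(html: str, selector: str) -> List[str]:
    """Return the text of every element matching the simple selector."""
    parser = SelectorExtractor(selector)
    parser.feed(html)
    parser.close()
    return parser.matches
```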

Potential Future Integrations

The connector system is extensible. If you need integration with a specific platform, you can:

  1. Build a custom connector: see Creating Custom Connectors
  2. Request a feature: submit an issue on GitHub
  3. Contribute: pull requests for new connectors are welcome

Community Interest (Not Committed):

  • GitHub/GitLab repository integration
  • Slack message archival
  • Additional cloud storage providers
  • Enterprise knowledge bases

Creating Custom Connectors

Scrapalot supports custom connector development. See the External Connectors architecture documentation for implementation details.

Key Steps:

  1. Inherit from BaseConnector class
  2. Implement required interfaces (LoadConnector, PollConnector, etc.)
  3. Register connector with @register_connector decorator
  4. Add configuration schema and OAuth flow (if needed)
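The first three steps can be illustrated with a self-contained skeleton. Scrapalot's real `BaseConnector`, `LoadConnector`, `PollConnector`, and `@register_connector` live in its codebase and their signatures are not shown here, so the stand-ins below (including the method names `load_from_state` and `poll_source`, the `Document` type, and the registry dict) are hypothetical; README_EXTERNAL_CONNECTORS.md describes the actual interfaces:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Type


# --- Hypothetical stand-ins for Scrapalot's framework types ---
@dataclass
class Document:
    id: str
    text: str
    updated_at: float  # Unix timestamp of last modification


class BaseConnector(ABC):
    def __init__(self, config: Dict[str, str]) -> None:
        self.config = config


class LoadConnector(ABC):
    @abstractmethod
    def load_from_state(self) -> Iterator[Document]:
        """Yield every document (full import)."""


class PollConnector(ABC):
    @abstractmethod
    def poll_source(self, start: float, end: float) -> Iterator[Document]:
        """Yield documents updated in the interval [start, end)."""


CONNECTOR_REGISTRY: Dict[str, Type[BaseConnector]] = {}


def register_connector(name: str) -> Callable[[Type[BaseConnector]], Type[BaseConnector]]:
    """Decorator that records a connector class under a name."""
    def decorator(cls: Type[BaseConnector]) -> Type[BaseConnector]:
        CONNECTOR_REGISTRY[name] = cls
        return cls
    return decorator


# --- Example connector built on the stand-ins ---
@register_connector("in_memory")
class InMemoryConnector(BaseConnector, LoadConnector, PollConnector):
    """Toy connector serving documents from an in-memory list."""

    def __init__(self, config: Dict[str, str], documents: List[Document]) -> None:
        super().__init__(config)
        self.documents = documents

    def load_from_state(self) -> Iterator[Document]:
        yield from self.documents

    def poll_source(self, start: float, end: float) -> Iterator[Document]:
        for doc in self.documents:
            if start <= doc.updated_at < end:
                yield doc
```

In this sketch, the load interface serves a full initial import, while the poll interface returns only documents changed within a time window, which is what scheduled syncs need.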

For detailed instructions, see scrapalot-chat/docs/README_EXTERNAL_CONNECTORS.md

Troubleshooting

OAuth Authorization Failed

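Solution: Check that the redirect URI registered in the provider's OAuth app settings exactly matches the one Scrapalot uses; providers reject authorization requests whose redirect URI does not match.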

Sync Not Working

Solution:

  1. Verify credentials are still valid
  2. Check connector status in Settings
  3. Review sync job logs for errors
  4. Ensure background workers are running

Rate Limit Errors

Solution:

  • Google Scholar: Configure proxy for heavy usage
  • Semantic Scholar: Add API key to increase limits
  • arXiv: A 3-second delay between requests is enforced automatically
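arXiv's API terms ask clients to make no more than one request every three seconds. A minimum-interval throttle is one simple way to enforce such a delay; the sketch below is illustrative and not necessarily how Scrapalot does it (`clock` and `sleep` are injectable for testing):

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Ensure at least `interval` seconds pass between consecutive requests."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Call before each request; sleeps only if the previous one was too recent."""
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

The same pattern could serve the Web Scraper's configurable rate limiting, with the interval taken from connector settings.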

Learn More

For detailed connector implementation and architecture, see:

Released under the MIT License.