Graph RAG: Relationship-Aware Search
Graph RAG extends traditional search by modeling the relationships between concepts in your documents. It is most useful when you need to understand how ideas connect.
What is Graph RAG?
Traditional search finds documents based on keywords or semantic similarity. Graph RAG goes further by:
- Understanding relationships between entities (people, companies, concepts)
- Traversing connections to find related information
- Answering relationship questions like "How are X and Y connected?"
- Discovering insights through multi-hop reasoning
How It Works
When to Use Graph RAG
Perfect For
Relationship Questions:
- "How are Person A and Person B connected?"
- "What companies does Organization X work with?"
- "Which projects involve both Technology A and Technology B?"
Multi-Hop Reasoning:
- "What does Company X's partner do?"
- "Who worked on projects related to the same technology?"
- "Trace the connection between Concept A and Concept B"
Entity-Centric Queries:
- "Show me everything about Product X"
- "What are all the locations mentioned for Company Y?"
- "List all people who worked at Organization Z"
Not Needed For
Simple keyword search (standard keyword or semantic search works better):
- "Find documents about topic X"
- "Show me the latest updates"
Single-concept queries (regular vector search is sufficient):
- "What is RAG?"
- "Explain how authentication works"
Entity Types Detected
Automatically Identified
People:
- Names and roles
- Organizational affiliations
- Mentions across documents
Organizations:
- Companies and institutions
- Teams and departments
- Partners and subsidiaries
Locations:
- Cities and countries
- Offices and facilities
- Geographic regions
Concepts:
- Technologies and products
- Projects and initiatives
- Domain-specific terms
Dates & Events:
- Temporal references
- Project timelines
- Historical events
Relationship Types
Common Connections
Professional Relationships:
- WORKS_FOR: Person → Organization
- PART_OF: Team → Department → Organization
- REPORTS_TO: Person → Person
Location Relationships:
- LOCATED_IN: Organization → Location
- BASED_IN: Person → Location
- OPERATES_IN: Company → Region
Project Relationships:
- WORKS_ON: Person → Project
- USES: Project → Technology
- DEVELOPED_BY: Product → Organization
Content Relationships:
- MENTIONS: Document → Entity
- RELATED_TO: Entity → Entity
- REFERENCES: Document → Document
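The relationship types above can be read as typed, directed edges. The following is a minimal sketch of that idea in plain Python, assuming a hypothetical schema table and hypothetical `Entity` and `Relationship` classes. None of these names come from Scrapalot itself, and only a subset of the listed relationship types is included.

```python
from dataclasses import dataclass

# Hypothetical schema: expected (source type, target type) for a subset of
# the relationship types listed above. Names here are illustrative only.
RELATIONSHIP_SCHEMA = {
    "WORKS_FOR": ("Person", "Organization"),
    "REPORTS_TO": ("Person", "Person"),
    "LOCATED_IN": ("Organization", "Location"),
    "BASED_IN": ("Person", "Location"),
    "WORKS_ON": ("Person", "Project"),
    "USES": ("Project", "Technology"),
    "DEVELOPED_BY": ("Product", "Organization"),
}


@dataclass(frozen=True)
class Entity:
    name: str
    entity_type: str


@dataclass(frozen=True)
class Relationship:
    source: Entity
    rel_type: str
    target: Entity


def is_valid(rel: Relationship) -> bool:
    """Return True if the edge's endpoint types match the schema entry."""
    expected = RELATIONSHIP_SCHEMA.get(rel.rel_type)
    if expected is None:
        return False  # unknown relationship type
    return (rel.source.entity_type, rel.target.entity_type) == expected


john = Entity("John Doe", "Person")
acme = Entity("Acme Corp", "Organization")
print(is_valid(Relationship(john, "WORKS_FOR", acme)))  # True
print(is_valid(Relationship(acme, "WORKS_FOR", john)))  # False: wrong direction
```

Direction matters in this model: WORKS_FOR runs from a person to an organization, so the reversed edge is rejected.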
Real-World Examples
Example 1: Finding Connections
Question: "How are John Doe and Acme Corp connected?"
Graph RAG Process:
- Identifies "John Doe" (Person) and "Acme Corp" (Organization)
- Searches for paths between them
- Finds: John Doe → WORKS_ON → Project X → DEVELOPED_BY → Acme Corp
- Retrieves relevant document chunks
- Generates answer with full context
Answer: "John Doe works on Project X, which is developed by Acme Corp."
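The path search in this example can be illustrated with a breadth-first search over a small in-memory edge list. This is only a sketch under stated assumptions: the edge data and the `find_path` helper are hypothetical, and a real deployment would run the traversal inside Neo4j rather than in Python.

```python
from collections import deque

# Toy edge list of (source, relationship, target); illustrative data only.
EDGES = [
    ("John Doe", "WORKS_ON", "Project X"),
    ("Project X", "DEVELOPED_BY", "Acme Corp"),
    ("Jane Roe", "WORKS_FOR", "Acme Corp"),
]


def find_path(edges, start, goal):
    """Breadth-first search; returns the shortest list of edges, or None."""
    # Index every edge under both endpoints so it can be followed either way.
    adjacency = {}
    for edge in edges:
        source, _, target = edge
        adjacency.setdefault(source, []).append((target, edge))
        adjacency.setdefault(target, []).append((source, edge))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbor, edge in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [edge]))
    return None  # no connection found


path = find_path(EDGES, "John Doe", "Acme Corp")
for source, rel, target in path:
    print(f"{source} -[{rel}]-> {target}")
# John Doe -[WORKS_ON]-> Project X
# Project X -[DEVELOPED_BY]-> Acme Corp
```

Breadth-first search returns a path with the fewest hops, which matches the intuition that the most direct connection is usually the most useful one to explain.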
Example 2: Multi-Hop Discovery
Question: "What projects use the same technology as Project Alpha?"
Graph RAG Process:
- Finds Project Alpha entity
- Identifies technology it uses
- Traverses to other projects using same technology
- Collects information about those projects
- Assembles comprehensive answer
Answer: A list of all related projects, with context about each
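The two-hop traversal in this example (project to technology, then technology to other projects) could look roughly like the sketch below. The `USES` mapping and the project and technology names are made up for illustration.

```python
# Hypothetical USES edges: project -> set of technologies it uses.
USES = {
    "Project Alpha": {"GraphQL", "PostgreSQL"},
    "Project Beta": {"PostgreSQL"},
    "Project Gamma": {"Redis"},
    "Project Delta": {"GraphQL", "Redis"},
}


def projects_sharing_technology(uses, project):
    """Two hops: project -> its technologies -> other projects using them."""
    shared = {}
    for tech in uses.get(project, set()):
        for other, techs in uses.items():
            if other != project and tech in techs:
                shared.setdefault(other, set()).add(tech)
    return shared


result = projects_sharing_technology(USES, "Project Alpha")
for name in sorted(result):
    print(name, sorted(result[name]))
# Project Beta ['PostgreSQL']
# Project Delta ['GraphQL']
```

Keeping the shared technology alongside each project gives the answer generator the context it needs to explain why a project was included.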
Example 3: Temporal Queries
Question: "What happened in the company in 2024?"
Graph RAG Process:
- Identifies "2024" date entity
- Finds all events linked to that timeframe
- Gathers related people, projects, decisions
- Organizes chronologically
- Provides timeline
Answer: Timeline of 2024 events with context
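As a rough sketch of the temporal steps above, the snippet below filters events by year and sorts them chronologically. The event records and the `timeline_for_year` helper are hypothetical; how dates are actually attached to entities in the graph is not specified here.

```python
from datetime import date

# Illustrative event records; in practice these would come from the graph.
EVENTS = [
    {"date": date(2024, 9, 3), "event": "Project X launched", "people": ["John Doe"]},
    {"date": date(2023, 11, 20), "event": "Office opened", "people": []},
    {"date": date(2024, 2, 14), "event": "Partnership signed", "people": ["Jane Roe"]},
]


def timeline_for_year(events, year):
    """Keep events from the given year and order them chronologically."""
    matching = [e for e in events if e["date"].year == year]
    return sorted(matching, key=lambda e: e["date"])


for entry in timeline_for_year(EVENTS, 2024):
    people = ", ".join(entry["people"]) or "n/a"
    print(f"{entry['date'].isoformat()}  {entry['event']}  ({people})")
# 2024-02-14  Partnership signed  (Jane Roe)
# 2024-09-03  Project X launched  (John Doe)
```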
Setup & Configuration
Optional Neo4j Integration
Graph RAG is optional and requires a Neo4j database.
When to enable:
- Your documents contain many interconnected entities
- Relationship questions are common
- Multi-hop reasoning is needed
- The benefit justifies the additional infrastructure
When to skip:
- Simple Q&A is sufficient
- Your documents contain few entity relationships
- Resources are constrained
- You are just getting started
Getting Started
Option 1: Managed Neo4j (Easiest)
- Sign up at neo4j.com/aura (free tier available)
- Create new database
- Copy connection details
- Add to Scrapalot configuration
- Enable Graph RAG in settings
Option 2: Self-Hosted
- Run Neo4j in Docker
- Configure connection
- Enable Graph RAG
- The system then builds the graph automatically
Automatic Processing:
- Entities extracted during document processing
- Relationships detected automatically
- Graph built in background
- No manual configuration needed
Performance Considerations
Graph Size
Small graphs (thousands of entities):
- Very fast queries
- Minimal resource usage
- Works on modest hardware
Medium graphs (tens of thousands of entities):
- Fast with proper indexing
- Moderate resource usage
- Recommended for most deployments
Large graphs (100,000+ entities):
- Requires optimization
- Higher resource needs
- Consider managed Neo4j
Query Performance
Fast queries:
- Direct entity lookups
- 1-2 hop relationships
- Indexed properties
Slower queries:
- Deep traversals (3+ hops)
- Complex pattern matching
- Full graph scans
Optimization:
- Automatic indexing on common properties
- Query depth limits
- Result set size limits
- Smart caching
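To make the depth limit, result limit, and caching ideas concrete, here is a minimal sketch of a bounded traversal with memoized results. The `GRAPH` data, the `neighbors_within` function, and its parameter names are assumptions for illustration and not the system's actual implementation; `functools.lru_cache` stands in for whatever caching layer is really used.

```python
from functools import lru_cache

# Toy adjacency mapping; illustrative data only.
GRAPH = {
    "A": ("B", "C"),
    "B": ("D",),
    "C": ("D", "E"),
    "D": ("F",),
    "E": (),
    "F": (),
}


@lru_cache(maxsize=1024)
def neighbors_within(start: str, max_depth: int = 2, max_results: int = 10) -> tuple:
    """Nodes reachable within max_depth hops, nearest first, capped at max_results."""
    if max_results <= 0:
        return ()
    found = []
    visited = {start}
    frontier = [start]
    for _ in range(max_depth):  # depth limit: one loop iteration per hop
        next_frontier = []
        for node in frontier:
            for neighbor in GRAPH.get(node, ()):
                if neighbor in visited:
                    continue
                visited.add(neighbor)
                found.append(neighbor)
                if len(found) >= max_results:  # result set size limit
                    return tuple(found)
                next_frontier.append(neighbor)
        frontier = next_frontier
    return tuple(found)


print(neighbors_within("A", 1))     # ('B', 'C')
print(neighbors_within("A", 2))     # ('B', 'C', 'D', 'E')
print(neighbors_within("A", 3, 3))  # ('B', 'C', 'D')
```

Repeating a call with the same arguments is served from the cache, which is why repeated questions about the same entity tend to be cheap. A cache like this would need to be invalidated when the graph changes.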
Combining with Vector Search
Tri-Modal Fusion
Graph RAG works alongside:
- Dense semantic search (vector embeddings)
- Sparse keyword search (BM25)
- Graph-based search (entity relationships)
Intelligent routing:
- System chooses best search method(s)
- Combines results when beneficial
- Balances precision and recall
Example:
- Question mentions specific entities → Use graph search
- Question is conceptual → Use vector search
- Question has exact terms → Use keyword search
- Complex question → Use all three, fuse results
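The text does not say how results from the three search methods are fused. One common choice is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the system's actual method. The document IDs and the constant `k=60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stability.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top results from each search method.
dense_results = ["doc3", "doc1", "doc2"]   # vector embeddings
sparse_results = ["doc1", "doc4", "doc3"]  # BM25
graph_results = ["doc1", "doc3", "doc5"]   # entity relationships

print(reciprocal_rank_fusion([dense_results, sparse_results, graph_results]))
# ['doc1', 'doc3', 'doc4', 'doc2', 'doc5']
```

RRF only needs ranks, not comparable scores, which makes it a convenient way to combine methods whose raw scores live on different scales.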
Privacy & Data Sovereignty
Data Storage
What's stored in Neo4j:
- Entity names and types
- Relationship types
- Document references
- Confidence scores
What's NOT stored in Neo4j:
- Full document content (stored in PostgreSQL)
- Vector embeddings (stored in pgvector)
- User data (stored in PostgreSQL)
Self-Hosting
Complete control:
- Run Neo4j on your infrastructure
- Data never leaves your network
- Full audit trail
- Custom backup strategy
Monitoring
Graph Health
Track graph metrics:
- Total entities
- Total relationships
- Entity type distribution
- Relationship type distribution
- Query performance
Access via:
- Admin dashboard
- Neo4j browser
- Query logs
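The count and distribution metrics listed above are simple aggregations. The sketch below computes them over in-memory lists using `collections.Counter`. The sample data and the `graph_metrics` helper are hypothetical, and a real deployment would more likely compute these with queries against Neo4j.

```python
from collections import Counter

# Illustrative data: (name, entity type) and (source, relationship, target).
ENTITIES = [
    ("John Doe", "Person"),
    ("Jane Roe", "Person"),
    ("Acme Corp", "Organization"),
    ("Project X", "Project"),
]
RELATIONSHIPS = [
    ("John Doe", "WORKS_ON", "Project X"),
    ("Jane Roe", "WORKS_ON", "Project X"),
    ("Jane Roe", "WORKS_FOR", "Acme Corp"),
]


def graph_metrics(entities, relationships):
    """Summarize totals and type distributions for a graph snapshot."""
    return {
        "total_entities": len(entities),
        "total_relationships": len(relationships),
        "entity_types": dict(Counter(etype for _, etype in entities)),
        "relationship_types": dict(Counter(rel for _, rel, _ in relationships)),
    }


metrics = graph_metrics(ENTITIES, RELATIONSHIPS)
print(metrics["total_entities"], metrics["total_relationships"])  # 4 3
print(metrics["entity_types"])
# {'Person': 2, 'Organization': 1, 'Project': 1}
```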
Usage Patterns
Track how Graph RAG is being used:
- Number of questions answered using graph search
- Average traversal depth
- Most common entity types
- Most frequent relationship queries
Best Practices
Document Preparation
Maximize graph value:
- Use clear entity names
- Maintain consistent terminology
- Include context about relationships
- Structure content logically
Query Formulation
Get better results:
- Name specific entities
- Ask about relationships explicitly
- Use "how" and "why" questions
- Request connections and paths
Graph Maintenance
Keep graph healthy:
- Monitor entity quality
- Review relationship accuracy
- Clean up duplicates
- Update deprecated entities
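Duplicate entities often differ only in case, punctuation, or spacing. As a minimal sketch of how such duplicates might be detected, the snippet below groups names by a normalized key. The `normalize` and `group_duplicates` helpers are hypothetical, and this simple rule will not catch abbreviations or aliases.

```python
import re


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def group_duplicates(names):
    """Return groups of names that share the same normalized form."""
    groups = {}
    for name in names:
        groups.setdefault(normalize(name), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


names = ["Acme Corp", "ACME Corp.", "acme  corp", "Project X", "John Doe"]
print(group_duplicates(names))
# [['Acme Corp', 'ACME Corp.', 'acme  corp']]
```

Candidate groups like this are best reviewed before merging, since two distinct entities can share a normalized name.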
Troubleshooting
No Relationship Found
Common causes:
- The entities do not appear in the same document context
- The relationship type was not detected
- The traversal depth limit was reached
- The connection is indirect and too distant
Solutions:
- Check that entity names are correct
- Review the document content
- Increase the traversal depth
- Try semantic search instead
Slow Graph Queries
Optimize:
- Reduce traversal depth
- Limit result set size
- Use more specific entity names
- Check graph size
Poor Entity Detection
Improve:
- Use clearer entity names in documents
- Add context around entities
- Review detection confidence
- Consider manual entity tagging
Related Documentation
- RAG Strategy - How Graph RAG fits in
- Context Expansion - Enhanced understanding
- Model Management - Entity extraction models
- Deployment Guide - Neo4j setup
Graph RAG is powerful but optional. Start with standard RAG, and add Graph RAG when you need relationship understanding.