Graph RAG: Relationship-Aware Search

Graph RAG enhances traditional search by modeling the relationships between concepts in your documents. Use it when you need to understand how ideas connect.

What is Graph RAG?

Traditional search finds documents based on keywords or semantic similarity. Graph RAG goes further by:

  • Understanding relationships between entities (people, companies, concepts)
  • Traversing connections to find related information
  • Answering relationship questions like "How are X and Y connected?"
  • Discovering insights through multi-hop reasoning

When to Use Graph RAG

Perfect For

Relationship Questions:

  • "How are Person A and Person B connected?"
  • "What companies does Organization X work with?"
  • "Which projects involve both Technology A and Technology B?"

Multi-Hop Reasoning:

  • "What does Company X's partner do?"
  • "Who worked on projects related to the same technology?"
  • "Trace the connection between Concept A and Concept B"

Entity-Centric Queries:

  • "Show me everything about Product X"
  • "What are all the locations mentioned for Company Y?"
  • "List all people who worked at Organization Z"

Not Needed For

Simple keyword search:

  • "Find documents about topic X"
  • "Show me the latest updates"
  • Standard semantic similarity search works better for these

Single-concept queries:

  • "What is RAG?"
  • "Explain how authentication works"
  • Regular vector search is sufficient for these

Entity Types Detected

Automatically Identified

People:

  • Names and roles
  • Organizational affiliations
  • Mentions across documents

Organizations:

  • Companies and institutions
  • Teams and departments
  • Partners and subsidiaries

Locations:

  • Cities and countries
  • Offices and facilities
  • Geographic regions

Concepts:

  • Technologies and products
  • Projects and initiatives
  • Domain-specific terms

Dates & Events:

  • Temporal references
  • Project timelines
  • Historical events

Relationship Types

Common Connections

Professional Relationships:

  • WORKS_FOR: Person → Organization
  • PART_OF: Team → Department → Organization
  • REPORTS_TO: Person → Person

Location Relationships:

  • LOCATED_IN: Organization → Location
  • BASED_IN: Person → Location
  • OPERATES_IN: Company → Region

Project Relationships:

  • WORKS_ON: Person → Project
  • USES: Project → Technology
  • DEVELOPED_BY: Product → Organization

Content Relationships:

  • MENTIONS: Document → Entity
  • RELATED_TO: Entity → Entity
  • REFERENCES: Document → Document
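
The relationship types above can be pictured as typed, directed edges between entities. The following is a minimal sketch of that idea in plain Python, not the storage format Graph RAG actually uses; the `Relationship` class, the `outgoing` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Relationship:
    source: str    # entity the edge starts from
    rel_type: str  # relationship type, e.g. "WORKS_FOR"
    target: str    # entity the edge points to


# Hypothetical sample edges using relationship types from the lists above.
EDGES = [
    Relationship("John Doe", "WORKS_FOR", "Acme Corp"),
    Relationship("John Doe", "WORKS_ON", "Project X"),
    Relationship("Project X", "USES", "Neo4j"),
    Relationship("Acme Corp", "LOCATED_IN", "Berlin"),
]


def outgoing(entity: str, rel_type: Optional[str] = None) -> List[Relationship]:
    """Return edges leaving `entity`, optionally filtered by relationship type."""
    return [
        edge
        for edge in EDGES
        if edge.source == entity and (rel_type is None or edge.rel_type == rel_type)
    ]


for edge in outgoing("John Doe"):
    print(f"{edge.source} -[{edge.rel_type}]-> {edge.target}")
```

Keeping the direction explicit matters: WORKS_FOR reads from Person to Organization, so a lookup for an organization's employees would filter on `target` instead of `source`.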

Real-World Examples

Example 1: Finding Connections

Question: "How are John Doe and Acme Corp connected?"

Graph RAG Process:

  1. Identifies "John Doe" (Person) and "Acme Corp" (Organization)
  2. Searches for paths between them
  3. Finds: John Doe → WORKS_ON → Project X → DEVELOPED_BY → Acme Corp
  4. Retrieves relevant document chunks
  5. Generates answer with full context

Answer: "John Doe works on Project X, which is developed by Acme Corp."
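
Step 2 of this example, searching for paths between two entities, can be sketched as a breadth-first search over the relationship edges. This is an illustration under stated assumptions, not the query Graph RAG runs against Neo4j: the in-memory `EDGES` list, the `find_path` function, and the `max_hops` parameter are hypothetical.

```python
from collections import deque

# Hypothetical edges as (source, relationship, target) triples.
EDGES = [
    ("John Doe", "WORKS_ON", "Project X"),
    ("Project X", "DEVELOPED_BY", "Acme Corp"),
    ("Jane Roe", "WORKS_FOR", "Acme Corp"),
]


def find_path(start, goal, edges, max_hops=3):
    """Breadth-first search for the shortest directed path within max_hops.

    Returns a list alternating entities and relationship types, or None.
    """
    adjacency = {}
    for source, rel, target in edges:
        adjacency.setdefault(source, []).append((rel, target))

    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        hops = (len(path) - 1) // 2  # each hop adds a relationship and an entity
        if hops >= max_hops:
            continue
        for rel, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [rel, neighbor]))
    return None


path = find_path("John Doe", "Acme Corp", EDGES)
print(" → ".join(path) if path else "No connection found")
```

Breadth-first search returns the path with the fewest hops first, which is usually the connection a reader expects to see in the answer.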

Example 2: Multi-Hop Discovery

Question: "What projects use the same technology as Project Alpha?"

Graph RAG Process:

  1. Finds Project Alpha entity
  2. Identifies technology it uses
  3. Traverses to other projects using same technology
  4. Collects information about those projects
  5. Assembles comprehensive answer

Answer: Lists all related projects with context about each
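
The traversal in steps 2 and 3 is a two-hop walk: from the project to its technologies, then from each technology back to other projects. A rough sketch, assuming the USES relationships are available as a simple mapping (the `USES` data and the function name are hypothetical):

```python
# Hypothetical USES relationships: project -> set of technologies.
USES = {
    "Project Alpha": {"GraphQL", "PostgreSQL"},
    "Project Beta": {"PostgreSQL"},
    "Project Gamma": {"Redis"},
}


def projects_sharing_technology(project, uses):
    """Return {other_project: shared_technologies} for a two-hop traversal."""
    shared = {}
    for tech in uses.get(project, set()):            # hop 1: project -> technology
        for other, techs in uses.items():            # hop 2: technology -> project
            if other != project and tech in techs:
                shared.setdefault(other, set()).add(tech)
    return shared


print(projects_sharing_technology("Project Alpha", USES))
```

Returning the shared technologies alongside each project keeps the context needed to explain why a project appears in the answer.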

Example 3: Temporal Queries

Question: "What happened in the company in 2024?"

Graph RAG Process:

  1. Identifies "2024" date entity
  2. Finds all events linked to that timeframe
  3. Gathers related people, projects, decisions
  4. Organizes chronologically
  5. Provides timeline

Answer: Timeline of 2024 events with context
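
Steps 2 and 4 amount to filtering events by date and sorting them. A small sketch of that idea, assuming each event carries a date (the `EVENTS` data and the `timeline` function are hypothetical):

```python
from datetime import date

# Hypothetical events linked to date entities.
EVENTS = [
    ("Project X kickoff", date(2024, 3, 1)),
    ("Office opened in Berlin", date(2023, 11, 15)),
    ("Acme Corp partnership signed", date(2024, 1, 20)),
]


def timeline(events, year):
    """Return the events that fall in `year`, ordered chronologically."""
    in_year = [(name, when) for name, when in events if when.year == year]
    return sorted(in_year, key=lambda item: item[1])


for name, when in timeline(EVENTS, 2024):
    print(when.isoformat(), name)
```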

Setup & Configuration

Optional Neo4j Integration

Graph RAG is optional and requires a Neo4j database.

When to enable:

  • Your documents contain many interconnected entities
  • Relationship questions are common
  • Multi-hop reasoning is needed
  • The benefit justifies the additional infrastructure

When to skip:

  • Simple Q&A sufficient
  • Minimal entity relationships
  • Resource constraints
  • Just getting started

Getting Started

Option 1: Managed Neo4j (Easiest)

  1. Sign up at neo4j.com/aura (free tier available)
  2. Create new database
  3. Copy connection details
  4. Add to Scrapalot configuration
  5. Enable Graph RAG in settings

Option 2: Self-Hosted

  1. Run Neo4j in Docker
  2. Configure connection
  3. Enable Graph RAG
  4. System automatically builds graph
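
Both options end with the same three connection details: a URI, a username, and a password. The sketch below shows one way to load and sanity-check them before enabling Graph RAG. The environment variable names (`NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`) and the `load_neo4j_settings` helper are assumptions for illustration; the configuration keys Scrapalot actually expects may differ.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse

# URI schemes accepted by the official Neo4j drivers.
VALID_SCHEMES = {"neo4j", "neo4j+s", "neo4j+ssc", "bolt", "bolt+s", "bolt+ssc"}


@dataclass(frozen=True)
class Neo4jSettings:
    uri: str
    username: str
    password: str


def load_neo4j_settings(env=None):
    """Read connection details from a mapping (defaults to os.environ).

    The variable names are hypothetical; adapt them to your configuration.
    """
    env = os.environ if env is None else env
    # 7687 is the default Bolt port of a local Neo4j instance.
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    scheme = urlparse(uri).scheme
    if scheme not in VALID_SCHEMES:
        raise ValueError(f"Unsupported Neo4j URI scheme: {scheme!r}")
    return Neo4jSettings(
        uri=uri,
        username=env.get("NEO4J_USERNAME", "neo4j"),
        password=env.get("NEO4J_PASSWORD", ""),
    )


settings = load_neo4j_settings({"NEO4J_URI": "bolt://localhost:7687"})
print(settings.uri, settings.username)
```

Managed instances typically use an encrypted scheme such as `neo4j+s://`, while a local Docker container is usually reached over plain `bolt://`.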

Automatic Processing:

  • Entities extracted during document processing
  • Relationships detected automatically
  • Graph built in background
  • No manual configuration needed

Performance Considerations

Graph Size

Small graphs (thousands of entities):

  • Very fast queries
  • Minimal resource usage
  • Works on modest hardware

Medium graphs (tens of thousands of entities):

  • Fast with proper indexing
  • Moderate resource usage
  • Recommended for most deployments

Large graphs (100,000+ entities):

  • Requires optimization
  • Higher resource needs
  • Consider managed Neo4j

Query Performance

Fast queries:

  • Direct entity lookups
  • 1-2 hop relationships
  • Indexed properties

Slower queries:

  • Deep traversals (3+ hops)
  • Complex pattern matching
  • Full graph scans

Optimization:

  • Automatic indexing on common properties
  • Query depth limits
  • Result set size limits
  • Smart caching
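
Depth limits and result set limits bound the cost of a traversal regardless of graph size. The following sketch shows how the two limits interact in a breadth-first expansion; the `expand` function and its `max_depth` and `max_results` parameters are hypothetical names, not Graph RAG settings.

```python
from collections import deque


def expand(start, adjacency, max_depth=2, max_results=50):
    """Collect up to max_results (entity, depth) pairs within max_depth hops."""
    results = []
    visited = {start}
    queue = deque([(start, 0)])
    while queue and len(results) < max_results:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit: do not expand beyond this level
        for neighbor in adjacency.get(node, []):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            results.append((neighbor, depth + 1))
            if len(results) >= max_results:
                break  # result limit: stop collecting early
            queue.append((neighbor, depth + 1))
    return results


# Hypothetical adjacency list: entity -> directly connected entities.
GRAPH = {"A": ["B", "C"], "B": ["D"], "D": ["E"]}
print(expand("A", GRAPH, max_depth=2))
```

With `max_depth=2`, entity E is never reached even though it is connected, which is the same trade-off behind the "Traversal depth limit reached" cause listed under Troubleshooting.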

Tri-Modal Fusion

Graph RAG works alongside:

  • Dense semantic search (vector embeddings)
  • Sparse keyword search (BM25)
  • Graph-based search (entity relationships)

Intelligent routing:

  • System chooses best search method(s)
  • Combines results when beneficial
  • Balances precision and recall

Example:

  • Question mentions specific entities → Use graph search
  • Question is conceptual → Use vector search
  • Question has exact terms → Use keyword search
  • Complex question → Use all three, fuse results
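
This page does not say how results from the three search methods are fused. One common technique is reciprocal rank fusion (RRF), sketched below as an illustration only; the function name, the constant `k=60`, and the sample rankings are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    documents ranked highly by several methods rise to the top.
    """
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ranked results from each search method.
dense = ["doc3", "doc1", "doc2"]   # vector search
sparse = ["doc1", "doc4"]          # BM25 keyword search
graph = ["doc1", "doc3"]           # graph search

print(reciprocal_rank_fusion([dense, sparse, graph]))
```

RRF uses only rank positions, so it avoids having to normalize similarity scores, BM25 scores, and graph confidence values onto a common scale.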

Privacy & Data Sovereignty

Data Storage

What's stored in Neo4j:

  • Entity names and types
  • Relationship types
  • Document references
  • Confidence scores

What's NOT stored:

  • Full document content (in PostgreSQL)
  • Vector embeddings (in pgvector)
  • User data (in PostgreSQL)

Self-Hosting

Complete control:

  • Run Neo4j on your infrastructure
  • Data never leaves your network
  • Full audit trail
  • Custom backup strategy

Monitoring

Graph Health

Track graph metrics:

  • Total entities
  • Total relationships
  • Entity type distribution
  • Relationship type distribution
  • Query performance

Access via:

  • Admin dashboard
  • Neo4j browser
  • Query logs
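
The count and distribution metrics listed above are simple aggregations over the graph. A minimal sketch over in-memory data (the `graph_metrics` function and the sample entities are hypothetical; in practice these numbers would come from the dashboard or from queries against Neo4j):

```python
from collections import Counter


def graph_metrics(entities, relationships):
    """Summarize a graph given (name, type) entities and (source, rel, target) edges."""
    return {
        "total_entities": len(entities),
        "total_relationships": len(relationships),
        "entity_types": Counter(entity_type for _, entity_type in entities),
        "relationship_types": Counter(rel for _, rel, _ in relationships),
    }


# Hypothetical sample graph.
ENTITIES = [("John Doe", "Person"), ("Jane Roe", "Person"), ("Acme Corp", "Organization")]
RELATIONSHIPS = [
    ("John Doe", "WORKS_FOR", "Acme Corp"),
    ("Jane Roe", "WORKS_FOR", "Acme Corp"),
    ("Jane Roe", "REPORTS_TO", "John Doe"),
]

metrics = graph_metrics(ENTITIES, RELATIONSHIPS)
print(metrics["entity_types"].most_common())
```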

Usage Patterns

Understand how Graph RAG helps:

  • Questions using graph search
  • Average traversal depth
  • Most common entity types
  • Popular relationship queries

Best Practices

Document Preparation

Maximize graph value:

  • Use clear entity names
  • Maintain consistent terminology
  • Include context about relationships
  • Structure content logically

Query Formulation

Get better results:

  • Name specific entities
  • Ask about relationships explicitly
  • Use "how" and "why" questions
  • Request connections and paths

Graph Maintenance

Keep graph healthy:

  • Monitor entity quality
  • Review relationship accuracy
  • Clean up duplicates
  • Update deprecated entities
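
Duplicate entities often differ only in case, punctuation, or spacing. The sketch below groups names by a normalized form to surface merge candidates. It is a simple heuristic under stated assumptions, not the deduplication Graph RAG performs; `normalize` and `find_duplicates` are hypothetical helpers.

```python
import re
from collections import defaultdict


def normalize(name):
    """Lowercase a name and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def find_duplicates(names):
    """Group entity names that share a normalized form; return groups of 2 or more."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(find_duplicates(["Acme Corp.", "ACME  Corp", "Globex", "acme-corp"]))
```

A normalization this simple will not catch abbreviations or aliases such as "Acme" versus "Acme Corporation", so the groups it returns are candidates for review, not automatic merges.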

Troubleshooting

No Relationship Found

Common causes:

  • Entities not in same document context
  • Relationship type not detected
  • Traversal depth limit reached
  • Indirect connection too distant

Solutions:

  • Check entity names are correct
  • Review document content
  • Increase traversal depth
  • Try semantic search instead

Slow Graph Queries

Optimize:

  • Reduce traversal depth
  • Limit result set size
  • Use more specific entity names
  • Check graph size

Poor Entity Detection

Improve:

  • Use clearer entity names in documents
  • Add context around entities
  • Review detection confidence
  • Consider manual entity tagging

Graph RAG is powerful but optional. Start with standard RAG and add Graph RAG when you need relationship understanding.

Released under the MIT License.