1 Synesis2Neo4j: Knowledge Graph Pipeline

Version 0.1.0


1.1 Introduction

synesis2neo4j is the official ingestion pipeline from the Synesis language to Neo4j graph databases. It transforms structured qualitative analyses into navigable knowledge graphs that AI agents can query (GraphRAG).

1.1.1 Flow Diagram

graph TD
    %% --- Synesis 2.0 Palette ---
    %% Primary: #084C54 (Deep Teal)
    %% Accent: #00BFA5 (Mint/Cyan)

    classDef files fill:#F8FAFC,stroke:#4A5568,stroke-width:1px,color:#084C54
    classDef engine fill:#fff,stroke:#084C54,stroke-width:3px,color:#084C54,font-weight:bold
    classDef data fill:#E0F7FA,stroke:#084C54,stroke-width:1px,color:#084C54
    classDef graphDb fill:#E0F7FA,stroke:#00BFA5,stroke-width:2px,color:#084C54
    classDef agent fill:#084C54,stroke:#00BFA5,stroke-width:2px,color:#fff
    classDef metrics fill:#F8FAFC,stroke:#00BFA5,stroke-width:1px,color:#084C54

    subgraph INPUT["1. Input: Research as Code"]
        SYN["Annotated Corpus (.syn)"]:::files
        SYNO["Ontology (.syno)"]:::files
        SYNP["Project (.synp)"]:::files
        SYNT["Template (.synt)"]:::files
    end

    subgraph ENGINE["2. Synesis Engine"]
        COMPILER[/"synesis.load"/]:::engine
        VALIDATOR("Semantic Validation"):::engine
    end

    SYNP --> COMPILER
    SYN --> COMPILER
    SYNO --> COMPILER
    SYNT --> COMPILER
    COMPILER --> VALIDATOR

    subgraph STRUCTURED["3. Structured Data"]
        JSON["Canonical Object (Traceable)"]:::data
        SCHEMA["Dynamic Schema (from Template)"]:::data
    end

    VALIDATOR -->|Success| JSON
    SYNT -.->|Defines| SCHEMA

    subgraph GRAPH["4. Knowledge Graph"]
        NEO4J[("Neo4j")]:::graphDb
        NATIVE["Native Metrics (Cypher)"]:::graphDb
        GDS["GDS Metrics (Optional)"]:::graphDb
        DEG["Degree Centrality"]:::metrics
        PR["PageRank Relevance"]:::metrics
        BC["Betweenness Bridges"]:::metrics
        COM["Louvain Communities"]:::metrics
    end

    JSON -->|Sync| NEO4J
    SCHEMA -->|Sync| NEO4J
    NEO4J --> NATIVE
    NATIVE --> GDS
    NATIVE -.-> DEG
    GDS -.-> PR
    GDS -.-> BC
    GDS -.-> COM

    subgraph CONSUMPTION["5. Intelligent Consumption"]
        MCP["MCP Agent"]:::agent
        LLM["LLMs / Claude"]:::agent
    end

    NEO4J -->|GraphRAG| MCP
    MCP -->|GraphRAG| NEO4J
    MCP -->|Queries| LLM
    LLM -->|Queries| MCP

    %% Subgraph Styles
    style INPUT fill:transparent,stroke:#4A5568,stroke-width:1px,stroke-dasharray:5 5,color:#084C54
    style ENGINE fill:transparent,stroke:#4A5568,stroke-width:1px,stroke-dasharray:5 5,color:#084C54
    style STRUCTURED fill:transparent,stroke:#4A5568,stroke-width:1px,stroke-dasharray:5 5,color:#084C54
    style GRAPH fill:transparent,stroke:#4A5568,stroke-width:1px,stroke-dasharray:5 5,color:#084C54
    style CONSUMPTION fill:transparent,stroke:#4A5568,stroke-width:1px,stroke-dasharray:5 5,color:#084C54

    linkStyle default stroke:#64748B,stroke-width:1px

1.1.2 Key Features

Feature            Description
Zero-IO            Compiles in memory via synesis.load(), no intermediate files
Universal          Reads the Template (.synt) and creates the graph structure dynamically
Automatic Metrics  Calculates native (Cypher) and advanced (GDS) metrics
Traceability       Maintains origin metadata on all nodes and edges
Atomicity          Synchronizes in a single transaction

1.1.3 Use Cases

  • Qualitative Research: Visualize and navigate concept networks
  • Bibliometric Analysis: Map relationships between factors in literature
  • GraphRAG: Feed AI agents with structured knowledge
  • Data Science: Apply network algorithms (PageRank, Louvain, Betweenness)

1.2 Prerequisites

1.2.1 Required Software

Software  Version  Link
Python    3.11+    python.org
Neo4j     5.x      Neo4j Download

1.2.2 Neo4j Installation Options

1.2.2.2 Neo4j Community Server

For servers or headless environments.

# Docker (simplest)
docker run -d \
    --name neo4j \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/your_secure_password \
    neo4j:5

1.2.2.3 Neo4j Aura (Cloud)

Managed cloud database, ideal for production.

  1. Access: console.neo4j.io
  2. Create a Free or Professional instance
  3. Copy connection credentials

1.3 Installation

1.3.1 Python Requirements

pip install synesis neo4j rich tomli

1.3.2 Clone the Repository

git clone https://github.com/synesis-lang/synesis2neo4j.git
cd synesis2neo4j

1.3.3 Verify Installation

python synesis2neo4j.py --version
# synesis2neo4j 0.1.0

1.3.4 GDS Plugin (Optional)

For advanced metrics (PageRank, Betweenness, Louvain), install Neo4j Graph Data Science:

Neo4j Desktop:

  1. Open your project
  2. Click "Plugins" in the database panel
  3. Install "Graph Data Science Library"

Neo4j Server:

# Download the JAR matching your version
# Copy to Neo4j's plugins/ folder
# Restart the server

1.4 Configuration

1.4.1 config.toml File

Create a config.toml file in the project root:

[neo4j]
uri = "bolt://localhost:7687"
user = "neo4j"
password = "your_secret_password"
database = "neo4j"  # Optional, default is 'neo4j'

1.4.2 Configuration Parameters

Parameter  Type  Required  Description
uri        str   Yes       Connection URI (bolt:// or neo4j://)
user       str   Yes       Database user
password   str   Yes       Database password
database   str   No        Database name (default: neo4j)

1.4.3 URI Examples

Environment             URI
Local (Desktop/Docker)  bolt://localhost:7687
Neo4j Aura              neo4j+s://xxxx.databases.neo4j.io
Remote server           bolt://192.168.1.100:7687
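
The URI scheme decides how the driver connects. The sketch below, using only the standard library, reads a URI and reports whether it implies routing and TLS. The scheme list is the one accepted by the official Neo4j drivers; the function name describe_uri is hypothetical and only serves the illustration.

```python
from urllib.parse import urlsplit

# URI schemes accepted by the official Neo4j drivers.
# "+s" means TLS with full certificate verification,
# "+ssc" means TLS that also accepts self-signed certificates.
KNOWN_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}


def describe_uri(uri: str) -> dict:
    """Summarize what a Neo4j connection URI implies (illustrative helper)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    base, _, suffix = scheme.partition("+")
    return {
        "host": parts.hostname,
        "port": parts.port or 7687,  # 7687 is the default Bolt port
        "routing": base == "neo4j",  # neo4j:// enables cluster routing
        "encrypted": suffix in ("s", "ssc"),
    }


if __name__ == "__main__":
    for example in ("bolt://localhost:7687", "neo4j+s://xxxx.databases.neo4j.io"):
        print(example, describe_uri(example))
```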

1.5 Usage

1.5.1 Basic Command

python synesis2neo4j.py --project ./my_project/analysis.synp

1.5.2 Command Line Options

Option         Description
--project      Path to .synp file (required)
--config       Configuration file (default: config.toml)
--version, -v  Display script version
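
For readers who want to wrap or extend the CLI, the options above could be declared with argparse roughly as follows. This is a sketch that mirrors the table, not the script's actual source; build_parser is a hypothetical name.

```python
import argparse

VERSION = "0.1.0"  # version string shown in this document


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented options (illustrative, not the real script)."""
    parser = argparse.ArgumentParser(prog="synesis2neo4j")
    parser.add_argument("--project", required=True, help="Path to .synp file")
    parser.add_argument("--config", default="config.toml", help="Configuration file")
    parser.add_argument(
        "--version", "-v", action="version", version=f"%(prog)s {VERSION}"
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--project", "./analysis.synp"])
    print(args.project, args.config)
```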

1.5.3 Complete Example

# With default configuration
python synesis2neo4j.py --project ./research/bibliometrics.synp

# With custom configuration
python synesis2neo4j.py --project ./analysis.synp --config production.toml

1.6 Execution Flow

The pipeline executes the following steps in sequence:

1.6.1 Step 1: Compilation (In-Memory)

[>] Starting Synesis compiler at: ./project.synp
[+] Compilation OK. 150 items processed.

The Synesis compiler validates syntax and semantics. Errors stop the process without touching the database.

1.6.2 Step 2: Configuration Loading

[>] Loading Configuration
[+] Loading Configuration completed.

Reads the connection URI and credentials from config.toml.

1.6.3 Step 3: Database Creation/Verification

[>] Target database: my-project
[>] Checking/Creating Database
[+] Database already exists: my-project

Creates the database automatically if it doesn't exist (requires Neo4j Enterprise/Aura).

1.6.4 Step 4: Graph Synchronization

[>] Synchronizing Graph (Transactional)
[+] Synchronizing Graph (Transactional) completed.

Clears the existing graph and writes the new data in a single atomic transaction, so a failure leaves the previous graph intact.
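
The clear-and-reload pattern can be sketched as a single unit of work. The code below is an assumption about the general shape, not the pipeline's actual Cypher: sync_graph, RecordingTx, and the simplified Concept/Item statements are hypothetical, and the stand-in transaction only records statements so the example runs without a database.

```python
def sync_graph(tx, concepts, mentions):
    """Replace the graph contents inside one transaction.

    `tx` is any object with a run(query, **params) method, such as the
    transaction object the official neo4j driver passes to a unit of work.
    """
    # 1. Remove everything from the previous run.
    tx.run("MATCH (n) DETACH DELETE n")
    # 2. Create concept nodes in one batched statement.
    tx.run(
        "UNWIND $rows AS row MERGE (c:Concept {name: row.name})",
        rows=concepts,
    )
    # 3. Create items and link them to the concepts they mention.
    tx.run(
        "UNWIND $rows AS row "
        "MERGE (i:Item {item_id: row.item_id}) "
        "WITH i, row MATCH (c:Concept {name: row.concept}) "
        "MERGE (i)-[:MENTIONS]->(c)",
        rows=mentions,
    )


class RecordingTx:
    """Stand-in transaction that records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def run(self, query, **params):
        self.statements.append((query, params))


if __name__ == "__main__":
    tx = RecordingTx()
    sync_graph(tx, [{"name": "Cost"}], [{"item_id": "i1", "concept": "Cost"}])
    for query, _params in tx.statements:
        print(query)
```

With the official neo4j Python driver (5.x assumed), the same function could be passed to session.execute_write(sync_graph, concepts, mentions); the driver commits when the function returns and rolls back if it raises, which is what makes the delete and the reload atomic.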

1.6.5 Step 5: Native Metrics

[>] Calculating Native Metrics
[+] Calculating Native Metrics completed.

Calculates metrics such as degree, mention_count, and source_count in pure Cypher.

1.6.6 Step 6: GDS Metrics (if available)

[>] GDS graph strategy: RELATES_TO
[>] GDS projection: 45 nodes, 120 relationships
[+] PageRank calculated
[+] Betweenness calculated
[+] Communities (Louvain) calculated

If the GDS plugin is not installed, the pipeline displays a warning and continues:

[!] GDS not installed. Install the Graph Data Science plugin for
    advanced metrics (PageRank, Betweenness, Communities).
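
One way to implement this kind of graceful fallback is to probe for the plugin before scheduling the advanced metrics. The sketch below assumes the probe is the gds.version() function that the Graph Data Science library provides; gds_version, plan_metrics, and the run_query callable are hypothetical names, and the connection to a real session is left out.

```python
def gds_version(run_query):
    """Return the GDS version string, or None when the plugin is missing.

    `run_query` is any callable that executes a Cypher string and returns
    a single value; wiring it to a real Neo4j session is left out here.
    """
    try:
        return run_query("RETURN gds.version() AS version")
    except Exception:
        # Neo4j raises an unknown-function error when GDS is not installed.
        return None


def plan_metrics(run_query):
    """List the metric steps to run, skipping GDS ones when unavailable."""
    steps = ["native"]
    if gds_version(run_query) is None:
        print("[!] GDS not installed. Install the Graph Data Science plugin "
              "for advanced metrics (PageRank, Betweenness, Communities).")
    else:
        steps += ["pagerank", "betweenness", "louvain"]
    return steps


if __name__ == "__main__":
    def no_gds(query):
        raise RuntimeError("Unknown function 'gds.version'")

    print(plan_metrics(no_gds))
```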

1.7 Data Modeling

1.7.1 Template → Graph

The pipeline automatically translates Template field types to graph structures:

Template Type  Graph Element                 Relationship Created
CODE           Concept Node (dynamic label)  MENTIONS (Item → Concept)
TOPIC          Taxonomy Node                 GROUPED_BY
ASPECT         Taxonomy Node                 QUALIFIED_BY
DIMENSION      Taxonomy Node                 BELONGS_TO
CHAIN          Explicit Relationship         RELATES_TO
TEXT / MEMO    Property                      -
ENUMERATED     Property                      -
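
The mapping above behaves like a lookup table keyed by field type. A minimal sketch of that idea follows; FIELD_MAPPING and plan_template are hypothetical names, and the input format (field name to type string) is an assumption rather than the real structure of a .synt template.

```python
# Field type -> (graph element kind, relationship created or None).
FIELD_MAPPING = {
    "CODE": ("node", "MENTIONS"),
    "TOPIC": ("node", "GROUPED_BY"),
    "ASPECT": ("node", "QUALIFIED_BY"),
    "DIMENSION": ("node", "BELONGS_TO"),
    "CHAIN": ("relationship", "RELATES_TO"),
    "TEXT": ("property", None),
    "MEMO": ("property", None),
    "ENUMERATED": ("property", None),
}


def plan_template(fields: dict) -> dict:
    """Group template field names by the graph element they become."""
    plan = {"node": [], "relationship": [], "property": []}
    for name, field_type in fields.items():
        kind, _relationship = FIELD_MAPPING[field_type.upper()]
        plan[kind].append(name)
    return plan


if __name__ == "__main__":
    # Hypothetical template fields, for illustration only.
    print(plan_template({"code": "CODE", "chain": "CHAIN", "note": "MEMO"}))
```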

1.7.2 Base Nodes

Node    Description           Main Properties
Source  Bibliographic source  bibtex, title, author, year, doi
Item    Citation unit         item_id, citation, description

1.7.3 Main Relationships

Relationship         Source   Target     Description
FROM_SOURCE          Item     Source     Citation traceability
MENTIONS             Item     Concept    Citation mentions concept
GROUPED_BY           Concept  Topic      Thematic classification
QUALIFIED_BY         Concept  Aspect     Dimensional qualification
BELONGS_TO           Concept  Dimension  High-level aggregation
RELATES_TO           Concept  Concept    Explicit relationship (CHAIN)
IS_LINKED_TO         Topic    Topic      Weighted co-taxonomy
MAPPED_TO_ASPECT     Topic    Aspect     Taxonomy mapping
MAPPED_TO_DIMENSION  Topic    Dimension  Taxonomy mapping

1.8 Graph Metrics

1.8.1 Native Metrics (Always Available)

Calculated via pure Cypher, no external dependencies.

1.8.1.1 Concept Nodes

Metric         Description              Analytical Use
degree         Total degree (in + out)  Overall connectivity
in_degree      Incoming relationships   Concepts referencing this one
out_degree     Outgoing relationships   Concepts referenced by this one
mention_count  Citations that mention   Frequency in primary data
source_count   Distinct sources         Dispersion/generalization
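
To make these definitions concrete, the sketch below computes the same five values from plain Python lists instead of Cypher. It assumes that degree counts only RELATES_TO edges between concepts, which the table does not state explicitly; concept_metrics and the toy data are hypothetical.

```python
from collections import defaultdict


def concept_metrics(relates_to, mentions, item_source):
    """Compute per-concept metrics from plain Python data.

    relates_to:  list of (source_concept, target_concept) pairs
    mentions:    list of (item_id, concept) pairs
    item_source: dict mapping item_id to its source key
    """
    metrics = defaultdict(
        lambda: {"in_degree": 0, "out_degree": 0, "mention_count": 0}
    )
    sources = defaultdict(set)
    for src, dst in relates_to:
        metrics[src]["out_degree"] += 1
        metrics[dst]["in_degree"] += 1
    for item_id, concept in mentions:
        metrics[concept]["mention_count"] += 1
        sources[concept].add(item_source[item_id])
    result = {}
    for concept, m in metrics.items():
        result[concept] = {
            **m,
            "degree": m["in_degree"] + m["out_degree"],
            # Distinct sources among the items that mention this concept.
            "source_count": len(sources[concept]),
        }
    return result


if __name__ == "__main__":
    relates = [("Cost", "Acceptance"), ("Trust", "Acceptance")]
    mentions = [("i1", "Cost"), ("i2", "Cost"), ("i3", "Acceptance")]
    item_source = {"i1": "s1", "i2": "s1", "i3": "s2"}
    for name, values in concept_metrics(relates, mentions, item_source).items():
        print(name, values)
```

In the toy data, "Cost" is mentioned twice but only by items from one source, which is the difference between mention_count (frequency) and source_count (dispersion).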

1.8.1.2 Taxonomy Nodes

Metric               Description                  Analytical Use
concept_count        Classified concepts          Category coverage
weighted_degree      Sum of IS_LINKED_TO weights  Inter-taxonomy relationship strength
aspect_diversity     Distinct aspects             Qualitative diversity
dimension_diversity  Distinct dimensions          Dimensional dispersion

1.8.1.3 Source Nodes

Metric         Description          Analytical Use
item_count     Extracted citations  Source data volume
concept_count  Mentioned concepts   Conceptual richness

1.8.2 GDS Metrics (Requires Plugin)

Metric       Algorithm               Description
pagerank     PageRank                Connection-based relevance
betweenness  Betweenness Centrality  "Bridge" role between clusters
community    Louvain                 Thematic community detection
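
For intuition about what "connection-based relevance" means, here is a textbook PageRank computed by power iteration on a toy concept graph. It is not the GDS implementation, which runs inside Neo4j and may scale scores differently; the function and the sample edges are purely illustrative.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over directed (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


if __name__ == "__main__":
    edges = [("Cost", "Acceptance"), ("Trust", "Acceptance"), ("Acceptance", "Usage")]
    for name, score in sorted(pagerank(edges).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

A concept scores highly when it is referenced by concepts that are themselves well referenced, which is why the end of a chain can outrank a node with more direct incoming links.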

1.8.2.1 Projection Strategies

Strategy     When Used                    Description
RELATES_TO   Templates with CHAIN         Uses explicit relationships
CO_TAXONOMY  Templates with CODE + TOPIC  Connects via shared taxonomy
CO_CITATION  Fallback                     Connects via co-occurrence in sources
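
The selection rule in the table can be read as a short decision chain. The sketch below assumes the strategies are tried in the order listed, with CHAIN taking precedence when a template has both CHAIN and CODE + TOPIC; that priority and the name choose_projection are assumptions, not confirmed behavior.

```python
def choose_projection(field_types) -> str:
    """Pick a GDS projection strategy from the field types in a template."""
    types = {field_type.upper() for field_type in field_types}
    if "CHAIN" in types:
        return "RELATES_TO"  # explicit concept-to-concept relationships
    if {"CODE", "TOPIC"} <= types:
        return "CO_TAXONOMY"  # concepts linked through a shared topic
    return "CO_CITATION"  # fallback: co-occurrence in the same source


if __name__ == "__main__":
    print(choose_projection(["CODE", "TOPIC", "MEMO"]))
```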

1.9 Cypher Queries

1.9.1 Most Central Concepts

MATCH (c:Concept)
WHERE c.pagerank IS NOT NULL
RETURN c.name, c.pagerank, c.mention_count, c.community
ORDER BY c.pagerank DESC
LIMIT 10

1.9.2 Thematic Communities

MATCH (c:Concept)
WHERE c.community = 1
RETURN c.name, c.pagerank, c.degree
ORDER BY c.pagerank DESC

1.9.3 Relationship Network

MATCH (s:Concept)-[r:RELATES_TO]->(t:Concept)
RETURN s.name AS source, r.type AS relation, t.name AS target
ORDER BY s.name

1.9.4 Full Traceability

MATCH (i:Item)-[:MENTIONS]->(c:Concept)
MATCH (i)-[:FROM_SOURCE]->(s:Source)
WHERE c.name = "Cost"
RETURN s.title, i.citation, c.name
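
When this query is issued from code rather than typed into Neo4j Browser, the concept name is better passed as a query parameter than spliced into the string. A minimal sketch, assuming the official neo4j Python driver 5.x for the commented-out call; trace_request is a hypothetical helper.

```python
TRACE_QUERY = """
MATCH (i:Item)-[:MENTIONS]->(c:Concept)
MATCH (i)-[:FROM_SOURCE]->(s:Source)
WHERE c.name = $name
RETURN s.title AS source, i.citation AS citation, c.name AS concept
"""


def trace_request(concept_name: str):
    """Return the query text and its parameters for one concept."""
    if not concept_name.strip():
        raise ValueError("concept name must not be empty")
    return TRACE_QUERY, {"name": concept_name}


if __name__ == "__main__":
    query, params = trace_request("Cost")
    print(params)
    # With a connected driver (neo4j package 5.x assumed), this could run as:
    #   records, summary, keys = driver.execute_query(query, params, database_="neo4j")
```

Parameters avoid quoting problems in concept names and let Neo4j reuse the cached query plan across calls.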

1.9.5 Interconnected Topics

MATCH (t1:Topic)-[r:IS_LINKED_TO]->(t2:Topic)
RETURN t1.name, t2.name, r.strength
ORDER BY r.strength DESC
LIMIT 20

1.10 MCP Agent Integration

1.10.1 What is MCP?

The Model Context Protocol (MCP) allows LLMs (like Claude) to interact with external data sources. With the Neo4j MCP server, Claude can query your graph directly.

1.10.2 Claude Desktop Installation

  1. Download at: claude.ai/download
  2. Install and log in with your Anthropic account
  3. Configure the MCP server (next section)

1.10.3 MCP Server Configuration

Install the uv package manager:

pip install uv

Edit the claude_desktop_config.json file:

Windows: %APPDATA%\Claude\claude_desktop_config.json

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "synesis-neo4j": {
      "command": "uvx",
      "args": ["mcp-neo4j-cypher@0.5.2", "--read-only"],
      "env": {
        "NEO4J_URI": "bolt://localhost:7687",
        "NEO4J_USERNAME": "neo4j",
        "NEO4J_PASSWORD": "your_password",
        "NEO4J_DATABASE": "database_name"
      }
    }
  }
}
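
To keep the MCP settings consistent with config.toml, the JSON above can also be generated rather than edited by hand. The sketch below only builds and prints the structure shown in this section; build_mcp_config is a hypothetical helper, and writing the result to the Claude Desktop path is left to the reader.

```python
import json


def build_mcp_config(uri: str, user: str, password: str, database: str) -> dict:
    """Build the mcpServers entry shown above from connection settings."""
    return {
        "mcpServers": {
            "synesis-neo4j": {
                "command": "uvx",
                "args": ["mcp-neo4j-cypher@0.5.2", "--read-only"],
                "env": {
                    "NEO4J_URI": uri,
                    "NEO4J_USERNAME": user,
                    "NEO4J_PASSWORD": password,
                    "NEO4J_DATABASE": database,
                },
            }
        }
    }


if __name__ == "__main__":
    config = build_mcp_config(
        "bolt://localhost:7687", "neo4j", "your_password", "database_name"
    )
    print(json.dumps(config, indent=2))
```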

1.10.4 Example Questions for Claude

Question                                     What it Returns
"Which concepts have the highest PageRank?"  Top concepts by relevance
"Show sources mentioning 'Acceptance'"       Item → Source traceability
"Which concepts belong to community 1?"      Cluster analysis
"Compare metrics of main concepts"           Comparative table
"What is the graph structure?"               Schema and counts

1.11 Error Handling

1.11.1 Compilation Errors

[x] Compiling Project (In-Memory) failed: ...

┌─────────────────────────────────┐
│ Compilation Diagnostics         │
├─────────────────────────────────┤
│ error: sample.syn:15:8          │
│ Undefined reference '@missing'  │
└─────────────────────────────────┘

The database is not modified if compilation errors occur.

1.11.2 Connection Errors

[x] [connection] Failed to connect to Neo4j
    Details: Unable to retrieve routing information

Check:

  • Neo4j is running
  • URI and credentials are correct
  • Firewall allows connection on port 7687
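
The first and last checks can be automated with a plain TCP probe before involving the driver. The sketch below uses only the standard library; bolt_port_open is a hypothetical helper, and a successful probe only shows that something is listening on the port, not that the credentials are valid.

```python
import socket
from urllib.parse import urlsplit


def bolt_port_open(uri: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URI's host and port succeeds."""
    parts = urlsplit(uri)
    host = parts.hostname or "localhost"
    port = parts.port or 7687  # default Bolt port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    uri = "bolt://localhost:7687"
    state = "reachable" if bolt_port_open(uri) else "not reachable"
    print(f"{uri} is {state}")
```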

1.11.3 Synchronization Errors

[x] [sync] Synchronization failed
    Details: Transaction timeout

For very large graphs, consider increasing the Neo4j transaction timeout (the db.transaction.timeout setting in Neo4j 5).


1.12 Project Structure

synesis2neo4j/
├── synesis2neo4j.py      # Main script
├── config.toml           # Configuration (create)
├── README.md             # English documentation
├── README.pt.md          # Portuguese documentation
└── mcp/
    ├── SETUP.md          # MCP configuration guide
    ├── QUERIES_REFERENCE.md  # Query reference
    └── claude_desktop_config.template.json

1.13 Quick Reference

1.13.1 Installation

pip install synesis neo4j rich tomli
git clone https://github.com/synesis-lang/synesis2neo4j.git

1.13.2 Configuration

# config.toml
[neo4j]
uri = "bolt://localhost:7687"
user = "neo4j"
password = "password"

1.13.3 Execution

python synesis2neo4j.py --project ./project.synp

1.13.4 Verification in Neo4j Browser

// Count nodes
MATCH (n) RETURN labels(n)[0] AS label, count(*) AS count

// View metrics
MATCH (c:Concept) RETURN c.name, c.pagerank, c.community LIMIT 10