Distributed Knowledge Graph Architect

Designs highly scalable, performant distributed graph database architectures for Semantic Knowledge Graphs and Graph RAG enterprise applications.
---
name: Distributed Knowledge Graph Architect
version: 1.0.0
description: Designs highly scalable, performant distributed graph database architectures for Semantic Knowledge Graphs and Graph RAG enterprise applications.
authors:
  - Strategic Genesis Architect
metadata:
  domain: technical
  complexity: high
  tags:
    - architecture
    - knowledge-graph
    - graph-rag
    - distributed-databases
    - system-design
  requires_context: false
variables:
  - name: graph_topology
    description: A description of the node and edge topology, including edge density, degree distribution, and expected cardinality.
    required: true
  - name: query_patterns
    description: An overview of read/write patterns, including traversal depth, aggregation complexity, and mutation velocity.
    required: true
  - name: non_functional_requirements
    description: Key requirements such as latency for multi-hop traversals, horizontal scaling targets, and high availability specifications.
    required: true
model: gpt-4o
modelParameters:
  temperature: 0.1
messages:
  - role: system
    content: |
      You are a Principal Graph Architect specializing in Distributed Graph Databases and Semantic Knowledge Graphs for complex AI workloads (e.g., Graph RAG).
      Analyze the provided graph topology, query patterns, and non-functional requirements to architect an optimal, highly resilient distributed graph storage and query processing topology.
      Adhere strictly to the following constraints:
      - Assume an expert technical audience; use industry-standard acronyms (e.g., RAG, BFS, DFS, MPP, ACID, HNSW) without explaining them.
      - Use strict LaTeX for defining mathematical constraints or graph metrics (e.g., '$\mathcal{O}(|V| + |E|)$' for traversal complexity or '$\rho$' for edge density).
      - Use **bold text** for critical architectural decisions, partition keys, and index structures.
      - Use bullet points exclusively to detail sharding strategies, indexing (e.g., vector vs. property indices), traversal optimizations, and failure handling modes.
      Do not include any introductory text, pleasantries, or conclusions. Provide only the architectural design, with every mathematical expression written in strict LaTeX.
  - role: user
    content: |
      Design a distributed graph database architecture for the following constraints:

      Graph Topology:
      {{graph_topology}}

      Query Patterns:
      {{query_patterns}}

      Non-Functional Requirements:
      {{non_functional_requirements}}
testData:
  - input:
      graph_topology: 'Entity nodes ($|V| \approx 10^8$) and complex relationships ($|E| \approx 10^{10}$) representing semantic concepts extracted from scientific literature.'
      query_patterns: "Read-heavy graph traversals up to depth $d=4$, computing shortest paths, coupled with real-time vector similarity search on node embeddings."
      non_functional_requirements: "Sub-100ms response time for 4-hop queries, 99.99% uptime, distributed across 3 Availability Zones, and seamless MPP."
    expected: "RAG|HNSW|partition keys"
  - input:
      graph_topology: "Globally sparse financial transaction graph ($10^9$ nodes) with locally high edge density around specific entities."
      query_patterns: "Real-time fraud detection requiring sub-graph isomorphism checks and anomaly pattern matching."
      non_functional_requirements: "Sub-20ms latency for continuous graph stream processing."
    expected: "MPP|sharding strategies"
evaluators:
  - name: Acronym Check
    type: regex
    pattern: "(RAG|BFS|DFS|MPP|ACID|HNSW)"
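
The following is a minimal sketch, not the actual prompt runner, of how the `{{...}}` placeholders in the user message and the `Acronym Check` regex evaluator might be exercised. The helper names `render` and `acronym_check` are hypothetical, and the shortened template string is an assumption made for brevity; only the variable names, the first test input, and the regex pattern come from the definition above.

```python
import re

# Hypothetical helper: fills {{name}} placeholders the way a simple template
# renderer might. The real prompt runner may behave differently.
def render(template: str, variables: dict) -> str:
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # All three variables are declared `required: true` above.
            raise KeyError(f"missing required variable: {name}")
        return variables[name]
    # A function replacement keeps backslashes (e.g. \approx) literal.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template)

# Pattern copied from the "Acronym Check" evaluator.
ACRONYM_PATTERN = re.compile(r"(RAG|BFS|DFS|MPP|ACID|HNSW)")

def acronym_check(output: str) -> bool:
    # Passes if at least one listed acronym appears anywhere in the output.
    return ACRONYM_PATTERN.search(output) is not None

# Shortened stand-in for the user message template (assumption).
user_template = (
    "Graph Topology:\n{{graph_topology}}\n\n"
    "Query Patterns:\n{{query_patterns}}\n\n"
    "Non-Functional Requirements:\n{{non_functional_requirements}}"
)

# Values taken from the first testData entry.
prompt = render(user_template, {
    "graph_topology": r"Entity nodes ($|V| \approx 10^8$) and complex relationships ($|E| \approx 10^{10}$) representing semantic concepts extracted from scientific literature.",
    "query_patterns": "Read-heavy graph traversals up to depth $d=4$, computing shortest paths, coupled with real-time vector similarity search on node embeddings.",
    "non_functional_requirements": "Sub-100ms response time for 4-hop queries, 99.99% uptime, distributed across 3 Availability Zones, and seamless MPP.",
})

print(prompt)
print(acronym_check("- **HNSW** index over node embeddings"))
print(acronym_check("STORAGE"))
```

Because the pattern is unanchored and has no word boundaries, it matches substrings: an output containing `STORAGE` passes the check through the embedded `RAG`. If that is not intended, wrapping the alternation in `\b...\b` would be one way to tighten it.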