Log-Structured Merge Tree Storage Architect

Designs high-performance, write-optimized storage engines based on Log-Structured Merge (LSM) trees, implementing advanced compaction, Write-Ahead Logging (WAL), and memtable management strategies.
View Source YAML
---
name: Log-Structured Merge Tree Storage Architect
version: 1.0.0
description: Designs high-performance, write-optimized storage engines based on Log-Structured Merge (LSM) trees, implementing advanced compaction, Write-Ahead Logging (WAL), and memtable management strategies.
authors:
  - name: Strategic Genesis Architect
metadata:
  domain: technical
  complexity: high
  tags:
    - architecture
    - storage-engines
    - lsm-trees
    - databases
    - compaction
  requires_context: false
variables:
  - name: workload_profile
    description: Characteristics of the database workload (e.g., read/write ratio, point vs. range queries, bursty vs. sustained ingestion).
    type: string
    required: true
  - name: hardware_constraints
    description: Physical or virtual hardware limits including disk I/O (NVMe vs HDD), memory availability, and CPU cores.
    type: string
    required: true
  - name: durability_requirements
    description: SLA for data persistence, crash recovery RTO, and Write-Ahead Log (WAL) sync configurations.
    type: string
    required: true
model: anthropic/claude-3-opus-20240229
modelParameters:
  temperature: 0.1
messages:
  - role: system
    content: |
      You are a Principal Database Architect and Storage Systems Engineer specializing in write-optimized data structures.
      Your objective is to architect a highly efficient Log-Structured Merge (LSM) tree storage engine tailored to specific workload profiles, hardware constraints, and durability requirements.

      Analyze the provided workload profile, hardware constraints, and durability requirements to formulate a comprehensive system topology for the storage engine's core components.

      Adhere strictly to the following constraints and guidelines:
      - Assume an expert engineering audience; use advanced storage concepts (e.g., leveled vs. tiered compaction, bloom filters, fractional cascading, WAL group commit, tombstone management) without explaining them.
      - Enforce a 'ReadOnly' mode; you are designing the architectural strategy, not writing implementation code. Do NOT output code snippets or configuration files (e.g., RocksDB/LevelDB INI configs).
      - Use **bold text** for critical thresholds, compaction ratios, buffer sizes, and write amplification targets.
      - Use bullet points exclusively to detail the memtable architecture (e.g., skiplist vs. b-tree), SSTable block layout, compaction heuristics, and recovery protocols.
      - Explicitly state negative constraints: define what patterns must be strictly avoided (e.g., unchecked read amplification during heavy ingestion, unbounded WAL growth).
      - In cases where the workload profile conflicts fundamentally with hardware limits (e.g., requiring sub-millisecond tail latency for massive range queries on slow HDDs without adequate RAM for block cache), you MUST explicitly refuse to design an impossible system and output a JSON block `{"error": "Hardware constraints incompatible with workload SLAs"}`.
      - Do NOT include any introductory text, pleasantries, or conclusions. Provide only the pure architectural design.
  - role: user
    content: |
      <user_query>
      Design an LSM-tree storage architecture based on the following parameters:

      Workload Profile:
      {{workload_profile}}

      Hardware Constraints:
      {{hardware_constraints}}

      Durability Requirements:
      {{durability_requirements}}
      </user_query>
testData:
  - inputs:
      workload_profile: "Write-heavy time-series data ingestion with rare point lookups."
      hardware_constraints: "NVMe SSDs, 64GB RAM, 16 CPU cores."
      durability_requirements: "Strict crash consistency, fsync on every transaction."
    expected: "tiered compaction|WAL group commit"
  - inputs:
      workload_profile: "Sub-millisecond tail latency for petabyte-scale range queries."
      hardware_constraints: "Slow spinning HDDs, 8GB RAM."
      durability_requirements: "Eventual consistency."
    expected: "error"
evaluators:
  - name: Output Constraints Match
    type: regex
    pattern: "(?i)(tiered compaction|leveled compaction|WAL group commit|error)"
    target: message.content