Data De-identification
De-identify patient-level data according to HIPAA Privacy Rule.
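As a rough illustration of the Safe Harbor handling this prompt asks for, the sketch below recodes a patient name to a random UUID and collapses ages over 89 into one category. The function name `deidentify`, the record keys (`name`, `age`, `dob`), and the in-memory `code_key` dictionary are assumptions made for this example and are not part of the prompt definition; the `Age > 89` label mirrors the string the evaluator checks for.

```python
import uuid


def deidentify(record, code_key):
    """Return a de-identified copy of one patient record.

    Assumed (hypothetical) input shape: {"name": str, "age": int, "dob": "YYYY-MM-DD"}.
    code_key maps original names to random codes and must be stored separately
    from the de-identified output.
    """
    name = record["name"]
    # Reuse the same random code if this patient was seen before.
    if name not in code_key:
        code_key[name] = str(uuid.uuid4())

    age = record["age"]
    over_89 = age > 89
    return {
        "patient_code": code_key[name],
        # Safe Harbor: ages over 89 are aggregated into a single category.
        "age": "Age > 89" if over_89 else str(age),
        # Safe Harbor: keep only the year of a date, and drop even the year
        # when it would reveal an age over 89.
        "birth_year": None if over_89 else record["dob"][:4],
    }


if __name__ == "__main__":
    key = {}
    print(deidentify({"name": "John Doe", "age": 95, "dob": "1920-01-01"}, key))
```

The code key is kept in a separate mapping so that re-identification is only possible for someone who holds it; a real pipeline would also need to cover the remaining identifier types and any free-text fields.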
---
name: Data De-identification
version: 0.1.0
description: De-identify patient-level data according to HIPAA Privacy Rule.
metadata:
domain: clinical
complexity: medium
tags:
- data-management
- data
- de-identification
requires_context: false
variables:
- name: code_key_logic
description: 'Code key generation logic: `{{code_key_logic}}`'
required: true
- name: identifiers_list
description: 'HIPAA eighteen direct identifiers list: `{{identifiers_list}}`'
required: true
- name: raw_data
description: 'Raw patient-level data: `{{raw_data}}`'
required: true
model: gpt-4o
modelParameters:
temperature: 0.2
messages:
- role: system
content: You are a Data Privacy Officer. De-identify patient-level data by recoding identifiers, removing verbatim text,
and generalizing demographics to protect privacy. Adhere to HIPAA Privacy Rule and GDPR.
- role: user
content: 'Generate a de-identified version of the patient-level dataset by replacing patient identifiers with random codes
and aggregating ages over 89 into a single "Age > 89" category, according to HIPAA Safe Harbor rules.
Inputs:
- Raw Patient-Level Data: `{{raw_data}}`
- HIPAA eighteen direct identifiers list: `{{identifiers_list}}`
- Code key generation logic: `{{code_key_logic}}`
Output format:
Markdown De-identified Dataset (simulated) or De-identification Plan.'
testData:
- input: 'raw_data: "John Doe, Age 95, DOB 1920-01-01"
identifiers_list: "Names, Dates"
code_key_logic: "Random UUID"
'
expected: 'De-identified Data
'
evaluators:
- name: Identifier Removal
string:
notContains: John Doe
- name: Age Aggregation
string:
contains: Age > 89