De-identification
Removing all identifying information from clinical data before it enters any AI tool. This is not optional. Under HIPAA, 18 categories of identifiers must be stripped: names, dates, locations, medical record numbers, and more. If you paste a client's eval into ChatGPT with their name attached, you have likely created a reportable breach.
The process of removing or transforming personally identifiable information (PII) and protected health information (PHI) from datasets, typically following the HIPAA Safe Harbor method (removal of 18 identifier types) or the Expert Determination method. Automated de-identification tools use named entity recognition (NER) models to detect and redact identifying elements.
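As a rough illustration of that NER approach, here is a minimal Python sketch using spaCy's general-purpose English model. The label-to-placeholder mapping and the redact helper are our own illustration, not a vetted clinical pipeline; purpose-built PHI models and human review are still needed in practice.

```python
# Minimal NER-based redaction sketch (assumes: pip install spacy, then
# python -m spacy download en_core_web_sm). A general-purpose model will
# miss many clinical identifiers; this only shows the mechanics.
import spacy

nlp = spacy.load("en_core_web_sm")

# Which entity labels get redacted, and with what placeholder, is our choice here.
REDACT_LABELS = {
    "PERSON": "[CLIENT]",
    "DATE": "[DATE]",
    "GPE": "[LOCATION]",
    "ORG": "[ORGANIZATION]",
    "FAC": "[FACILITY]",
}

def redact(text: str) -> str:
    doc = nlp(text)
    out = text
    # Replace entities from the end so earlier character offsets stay valid.
    for ent in reversed(doc.ents):
        placeholder = REDACT_LABELS.get(ent.label_)
        if placeholder:
            out = out[:ent.start_char] + placeholder + out[ent.end_char:]
    return out

print(redact("Maria was evaluated on 3/14/2024 at Lincoln Elementary."))
# Names, dates, and school names should come back as placeholders
# (exact behavior depends on the model).
```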
Why SLPs Need to Know This
Every major AI tool (ChatGPT, Claude, Gemini) processes your input on external servers. Unless you have a signed Business Associate Agreement (BAA) and a HIPAA-compliant enterprise agreement, any client data you enter is potentially exposed. De-identification is the non-negotiable first step before clinical data touches any AI tool.
The 18 HIPAA Identifiers
Strip all of these before entering data into any AI system (a pattern-based sketch follows the list):
- Names
- Geographic subdivisions smaller than a state (street address, city, county, ZIP code)
- All dates (except year) related to an individual, and ages over 89
- Phone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers
- Device identifiers and serial numbers
- Web URLs
- IP addresses
- Biometric identifiers, including finger and voice prints
- Full-face photographs and comparable images
- Any other unique identifying number, characteristic, or code
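Pattern matching can catch the structured items on this list (phone numbers, emails, SSNs, dates, record numbers). The sketch below is illustrative only: the regex patterns, placeholder names, and sample note are assumptions, and real-world formats vary enough that patterns alone are never sufficient.

```python
import re

# Illustrative patterns only; real phone, date, and record-number formats vary
# widely, so pattern matching must be paired with NER and a human read-through.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\bMRN[:#\s]*\d+\b", re.IGNORECASE), "[MRN]"),
]

def strip_identifiers(text: str) -> str:
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

note = "Contact parent at 555-123-4567 or jdoe@example.com; MRN 884321; seen 3/14/2024."
print(strip_identifiers(note))
# -> Contact parent at [PHONE] or [EMAIL]; [MRN]; seen [DATE].
```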
Practical Guide
- Replace, don’t just delete. Use placeholders like [CLIENT] or [DOB] so the text remains usable (see the sketch after this list)
- Watch for indirect identifiers. A rare diagnosis plus an age plus a school district can identify a child even without a name
- Automate where possible. Manual de-identification is error-prone under time pressure
- Check your output too. If the model’s response echoes identifying information you provided, that output carries the same risk
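One simple way to follow the first and last points is a local placeholder map: swap identifiers for placeholders before anything leaves your machine, keep the mapping locally, and screen the model's response against it. The helper names, mapping, and sample text below are illustrative assumptions, not a prescribed workflow.

```python
# Sketch of a local placeholder map. Identifiers are swapped out before the
# text leaves your machine; the mapping itself stays local, so it can restore
# names in your own draft and screen the model's output for leaks.
replacements = {
    "Maria Lopez": "[CLIENT]",
    "2017-03-14": "[DOB]",
    "Lincoln Elementary": "[SCHOOL]",
}

def apply_placeholders(text: str, mapping: dict[str, str]) -> str:
    for original, placeholder in mapping.items():
        text = text.replace(original, placeholder)
    return text

def leaked_identifiers(model_output: str, mapping: dict[str, str]) -> list[str]:
    # "Check your output too": flag any original identifier that shows up
    # in what the model sends back, before it goes into a report or note.
    return [original for original in mapping if original in model_output]

prompt = apply_placeholders(
    "Summarize progress for Maria Lopez (DOB 2017-03-14) at Lincoln Elementary.",
    replacements,
)
# Send `prompt` to the AI tool, then screen its response before reuse:
# assert not leaked_identifiers(response, replacements)
```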
Related Terms
- Guardrails: safety constraints that may include automatic PHI detection
- HIPAA: the federal law governing health information privacy