LLM Abliteration
The Domain-Specific Abliteration Paradox
How abliterating cybersecurity refusal collapsed nearly every safety domain despite near-zero vector similarity, with the one exception being the domain that shared the most overlap
Vadym Hadetskyi
- AI Security
- AI Safety
- LLM Abliteration
- Agentic AI
- Mechanistic Interpretability