Anthropic’s Fable model sparks backlash over overzealous guardrails

“Fable rejects any request that could be tangentially cyber related,” says IBM X-Force security researcher Valentina Palmiotti. The new model, a public version of Anthropic’s Mythos, has triggered intense frustration among professionals who find its safety filters so restrictive that even basic, non-malicious tasks are frequently blocked.

Jun 10, 18:52 · Business Faces

When Fable detects potential cybersecurity or biological weapon risks, it halts conversations entirely. This aggressive filtering aims to prevent malware development, yet it often misidentifies standard software engineering queries as high-risk activity. Cybersecurity veteran Matt Suiche noted that the system appears to rely on a blunt, keyword-based trigger, effectively punishing users for employing professional terminology. When guardrails activate, the platform defaults to Claude Opus 4.8, frustrating those seeking specialized analysis.

Critics call the current implementation haphazard, with researchers reporting that even simple code reviews trigger blocks, but some view this as a necessary hurdle during early development. Suiche suggested that Anthropic is prioritizing breadth of safety coverage over precision, betting on future refinement. To work with fewer restrictions, professionals must apply to the company’s Cyber Verification Program, a vetting process that mirrors OpenAI’s own access controls for sensitive AI tools.
