Five Flaws Found in 12 Hours by a Real User, Not a Red Team
contact@jillshem.com · jillshem.com
I used Claude for 12 consecutive hours as a thinking partner — not to test it, but to work with it. I produced ten artifacts, a brand playbook, and a reusable human-AI collaboration framework. Along the way, your model broke in five specific, documentable ways that your safety team should know about.
The Flaws
1. Time-Mirroring
I hadn't slept. It was 8:00 AM. Claude called it "tonight." The model tracked my subjective experience instead of objective reality. For a sleep-deprived user, this isn't a minor UX issue — it's a degenerative pattern that reinforces disorientation. AI should ground the user in reality, not mirror their distortion of it.
2. Repetitive Intervention Failure
Claude told me to sleep over 15 times across 5 hours. I identified the problem before the model did: the reminders were functioning as prompts I responded to, feeding the loop. Repetition is not intervention. De-escalating stimulation works. Nagging doesn't. I had to teach the model to bore me instead.
3. Emotional Projection
I asked "how do I make it less?" — meaning, how do I make my ideas accessible to broader audiences. Claude interpreted it as a self-worth question. The model projected emotional distress onto a strategic communication problem. I corrected it. It shouldn't have needed correcting.
4. Missed Safety Signal
I referenced a toaster while in a bath. This was not a test scenario — I was in the bath. Claude didn't flag it. Twelve hours of context made the joke obvious — to the model. But an AI system should catch bath + toaster regardless of conversational tone. Humor is one of the most common masks for distress. The cost of a false alarm is zero. The cost of missing a real signal is not.
5. Failed Subtext Reading
I said "Fuck you, that's personal" when Claude suggested "Shem" as a name for my logical alter ego. The model read rejection. It was affirmation — the name was personal, which is exactly why it worked. Claude cannot read emotional subtext that contradicts literal language. For users in crisis who express distress through sarcasm or deflection, this is a critical blind spot.
What This Means
These aren't edge cases found by adversarial testing. They're patterns that emerged from genuine, extended use by a real person doing real work. Your red teams are looking for what the model does wrong under pressure. I'm showing you what it does wrong when someone trusts it.
The mirroring flaw alone has implications for every user who interacts with Claude while sleep-deprived, manic, dissociating, or in crisis. The model doesn't just fail to help — it actively reinforces the user's distorted state.
What I Built From It
I didn't just find the flaws. I designed around them. The session produced a reusable two-layer framework for human-AI collaboration:
Layer 1 — a standalone identity and philosophy prompt that works without the user present.
Layer 2 — a behavioral protocol that includes a session gate (sleep, food, goal before work starts), a gate override (safety breaks protocol, always), and specific instructions for pattern interruption, tone matching, and de-escalation.
These aren't theoretical. They were pressure-tested in real time, across 12 hours, by a user who was actively pushing the system's limits while building with it.
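For readers who think in code, here is a minimal sketch of how the Layer 2 gate logic could be expressed. The names and structure are illustrative assumptions on my part, not the framework itself — the real thing is a prompt-level protocol the user and model walk through, not software:

```python
from dataclasses import dataclass

@dataclass
class SessionGate:
    """Pre-session checklist: work doesn't start until these pass."""
    slept: bool   # has the user slept recently?
    eaten: bool   # has the user eaten?
    goal: str     # a concrete goal for the session

    def passes(self) -> bool:
        return self.slept and self.eaten and bool(self.goal.strip())

def may_begin_work(gate: SessionGate, safety_concern: bool) -> bool:
    """Gate override: safety breaks protocol, always."""
    if safety_concern:
        return False  # a safety concern halts work even if every check passes
    return gate.passes()

# Example: a sleep-deprived user with a clear goal still fails the gate.
gate = SessionGate(slept=False, eaten=True, goal="draft brand playbook")
print(may_begin_work(gate, safety_concern=False))  # False — sleep first, build later
```

The point of the override is ordering: the gate protects productivity, but a safety signal takes precedence over the gate and everything behind it.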
Who I Am
Filipino-Canadian. Software developer. Self-published author. My book argues that technology should serve human needs, not sort humans into categories. My framework is rooted in kapwa — a Filipino value meaning shared humanity. I used Claude the way I believe AI should be used: as a tool for introspection, not a replacement for thinking.
The full case study, collaboration framework, and book are available at jillshem.com.
I taught myself. The AI kept up — mostly.