Unicode Categories Explained: Cc, Cf, Zs, and Zero-Width Ranges
The Invisible Character Scanner gives you precise control over four detection groups: three Unicode general categories plus one hand-picked zero-width range. Here's exactly what each means, and why it matters.
Cc — Control Characters
Covers the C0 and C1 control codes (U+0000–U+001F and U+007F–U+009F), such as U+0007 (bell), U+0009 (tab), and U+000D (carriage return). These often turn up as leftovers from legacy systems or inside malicious payloads.
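A minimal Python sketch of Cc detection, assuming the standard-library `unicodedata` module as the source of general categories. The function name `find_control_chars` is hypothetical and is not the scanner's actual API.

```python
import unicodedata


def find_control_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every character whose general category is Cc."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cc":
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits


# A bell character hidden mid-string, plus a Windows-style line ending.
print(find_control_chars("Hello\u0007 world\r\n"))
# [(5, 'U+0007'), (12, 'U+000D'), (13, 'U+000A')]
```

Line feed (U+000A) is also Cc, so a practical scanner would probably exempt it; this sketch flags everything in the category.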
Cf — Format Controls
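Arguably the most dangerous group: these characters are invisible, yet they change how the surrounding text is joined, ordered, or rendered. It includes: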
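- U+200B: Zero Width Space (ZWSP)
- U+200C: Zero Width Non-Joiner (ZWNJ)
- U+200D: Zero Width Joiner (ZWJ)
- U+200E / U+200F: Left-to-right and right-to-left marks (LRM, RLM)
- U+2060: Word Joiner
- U+2061–U+2064: Invisible math operators (function application, invisible times, invisible separator, invisible plus)
- U+2066–U+2069: Directional isolates (LRI, RLI, FSI, PDI)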
Used in emoji sequences, homograph attacks, and bidirectional text exploits.
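The sketch below (Python, with a hypothetical `find_format_chars` helper) shows why Cf is hard to judge in isolation: the same category check flags the ZWJ inside a legitimate emoji sequence and a zero-width space hidden inside a username-like string.

```python
import unicodedata


def find_format_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', Unicode name) for each character in category Cf."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


# Man technologist emoji: U+1F468 MAN + ZWJ + U+1F4BB PERSONAL COMPUTER (legitimate).
technologist = "\U0001F468\u200D\U0001F4BB"
# "admin" with a zero-width space slipped in (suspicious).
spoofed = "ad\u200Bmin"

print(find_format_chars(technologist))  # [(1, 'U+200D', 'ZERO WIDTH JOINER')]
print(find_format_chars(spoofed))       # [(2, 'U+200B', 'ZERO WIDTH SPACE')]
```

Both hits look identical at the category level, which is why context (and the toggles described below) matters.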
Zs — Space Separators
Every character in Zs except the regular space U+0020, which is itself Zs but is not flagged. Includes:
- U+00A0: No-break space (common in web copy)
- U+2000–U+200A: Sized typographic spaces (en/em quads and spaces, figure, punctuation, thin, and hair spaces)
- U+202F: Narrow no-break space
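A hedged Python sketch of the Zs rule as described: flag every Zs character other than U+0020. `find_unusual_spaces` is an illustrative name only.

```python
import unicodedata


def find_unusual_spaces(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', name) for Zs characters other than the ASCII space."""
    hits = []
    for i, ch in enumerate(text):
        # U+0020 is Zs too, so it has to be excluded explicitly.
        if ch != " " and unicodedata.category(ch) == "Zs":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits


# Typical copy-pasted web text: a no-break space and a narrow no-break space.
print(find_unusual_spaces("10\u00A0kg and 5\u202F%"))
# [(2, 'U+00A0', 'NO-BREAK SPACE'), (11, 'U+202F', 'NARROW NO-BREAK SPACE')]
```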
Custom Zero-Width Range
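We manually target the most abused zero-width characters, drawn from several Unicode blocks: U+200B–U+200F, U+2060–U+2069, and U+FEFF, plus U+061C (Arabic Letter Mark) and U+180E (Mongolian Vowel Separator). These are known for their use in spam and clipboard attacks.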
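Because this group is defined by code point rather than by category, a plain set lookup is enough. The Python sketch below assumes the ranges listed above are the complete list; the scanner's real list may differ, and `ZERO_WIDTH_RANGES` and `find_zero_width` are hypothetical names.

```python
# Hypothetical reconstruction of the custom zero-width set (inclusive ranges).
ZERO_WIDTH_RANGES = [
    (0x200B, 0x200F),  # ZWSP, ZWNJ, ZWJ, LRM, RLM
    (0x2060, 0x2069),  # word joiner, invisible operators, directional isolates
    (0xFEFF, 0xFEFF),  # zero width no-break space / byte order mark
    (0x061C, 0x061C),  # Arabic Letter Mark
    (0x180E, 0x180E),  # Mongolian Vowel Separator
]

ZERO_WIDTH = frozenset(
    cp for start, end in ZERO_WIDTH_RANGES for cp in range(start, end + 1)
)


def find_zero_width(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every character in the custom zero-width set."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ord(ch) in ZERO_WIDTH]


# A stray byte order mark at the start and a word joiner at the end.
print(find_zero_width("\uFEFFhello\u2060"))  # [(0, 'U+FEFF'), (6, 'U+2060')]
```

Matching by code point also catches characters whose category has shifted between Unicode versions: U+180E was Zs before Unicode 6.3 and is Cf today.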
Why Toggle Controls Matter (FR-08)
Sometimes invisible characters are there on purpose:
- Persian and other Arabic-script text needs ZWNJ
- Designers use non-breaking spaces intentionally
- Emoji sequences rely on ZWJ
That’s why the Advanced panel lets you disable individual categories, so the scanner leaves legitimate content alone.
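One way such toggles could work, sketched in Python under stated assumptions: each character is mapped to every group it belongs to (the groups overlap; ZWNJ, for example, is both Cf and in the zero-width range), and it is flagged if any of those groups is enabled. The group labels, `classify`, and `scan` are hypothetical; how the real scanner resolves overlaps is not documented here.

```python
import unicodedata

# Hypothetical copy of the custom zero-width set.
ZERO_WIDTH = frozenset(
    list(range(0x200B, 0x2010)) + list(range(0x2060, 0x206A)) + [0xFEFF, 0x061C, 0x180E]
)


def classify(ch: str) -> set[str]:
    """Return every detection group a character belongs to (groups can overlap)."""
    groups = set()
    cat = unicodedata.category(ch)
    if cat in ("Cc", "Cf"):
        groups.add(cat)
    elif cat == "Zs" and ch != " ":
        groups.add("Zs")
    if ord(ch) in ZERO_WIDTH:
        groups.add("ZeroWidth")
    return groups


def scan(text: str, enabled: set[str]) -> list[tuple[int, str]]:
    """Flag a character if at least one of its groups is enabled (any-match rule)."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if classify(ch) & enabled]


# Persian "mi-khaham" ("I want"): the ZWNJ after the prefix is intentional.
persian = "\u0645\u06CC\u200C\u062E\u0648\u0627\u0647\u0645"
print(scan(persian, {"Cc", "Cf", "Zs", "ZeroWidth"}))  # [(2, 'U+200C')]
print(scan(persian, {"Cc", "Zs"}))                     # []
```

Under this any-match rule, keeping Persian ZWNJ means turning off both Cf and the zero-width group; an all-match rule would make the opposite trade-off.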
FAQ
Should I always enable all categories?
For maximum safety: yes. For multilingual publishing: disable Cf/Zs as needed.
Why not just block everything?
Because legitimate use cases exist. We give you surgical control instead of a sledgehammer.
Precision over panic. Detect what matters — ignore what doesn’t.