Unicode Categories Explained: Cc, Cf, Zs, and Zero-Width Ranges

The Invisible Character Scanner gives you precise control over four Unicode detection groups. Here’s exactly what each means — and why it matters.

Cc — Control Characters

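Covers the C0 controls U+0000–U+001F, U+007F (delete), and the C1 controls U+0080–U+009F. Examples: U+0007 (bell), U+0009 (tab), U+000D (carriage return). Stray ones are often leftovers from old systems or part of malicious payloads. Tab, line feed, and carriage return are Cc as well, so ordinary indentation and line breaks fall in this group.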
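As a rough illustration of what a Cc check looks like, the sketch below uses Python's standard unicodedata module. The function name find_control_chars is made up for this example and is not the scanner's own code.

```python
import unicodedata

def find_control_chars(text):
    """Return (index, code point) pairs for every character in category Cc."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cc"
    ]

# A tab, a bell, and a CR/LF pair hidden in an otherwise plain string.
sample = "name\tvalue\x07\r\n"
print(find_control_chars(sample))
```

The line break at the end of the sample is reported too, which is why a real scan usually needs a way to tolerate expected whitespace controls.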

Cf — Format Controls

The group most often abused, because these characters change how text joins or reorders without drawing a glyph. Includes:

  • U+200C Zero Width Non-Joiner (ZWNJ)
  • U+200D Zero Width Joiner (ZWJ)
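  • U+200E/U+200F Left-to-Right Mark (LRM) / Right-to-Left Mark (RLM)
  • U+2060 Word Joiner
  • U+2061–U+2064 Invisible math operators
  • U+2066–U+2069 Bidirectional isolate controls (U+2065 is currently unassigned)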

Used in emoji sequences, homograph attacks, and bidirectional text exploits.
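A listing like the one below is a quick way to check what a range actually contains before deciding whether to flag it. This is a minimal sketch using Python's unicodedata module. describe_range is a hypothetical helper, and the result depends on the Unicode version bundled with your Python build.

```python
import unicodedata

def describe_range(start, end):
    """List (code point, category, name) for every code point in start..end inclusive."""
    rows = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        # Unassigned code points have no name, so supply a placeholder.
        name = unicodedata.name(ch, "(no name)")
        rows.append((f"U+{cp:04X}", unicodedata.category(ch), name))
    return rows

for row in describe_range(0x200C, 0x200F) + describe_range(0x2060, 0x2069):
    print(*row)
```

Printing the names makes it easier to tell joiners, directional marks, and math operators apart, since none of them are visible in the text itself.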

Zs — Space Separators

All space characters in category Zs other than the regular space U+0020, which is also Zs but is left alone. Includes:

  • U+00A0 Non-breaking space (common in web copy)
  • U+2000–U+200A Fixed-width spaces (en, em, thin, hair, and others)
  • U+202F Narrow no-break space
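To see every space separator rather than just the common ones, you can enumerate the category directly. The sketch below walks all code points with Python's unicodedata module. The function name space_separators and its include_regular_space parameter are assumptions for illustration.

```python
import sys
import unicodedata

def space_separators(include_regular_space=False):
    """Return all code points in category Zs, skipping U+0020 unless asked for."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
        and (include_regular_space or cp != 0x20)
    ]

for cp in space_separators():
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}")
```

The exact list follows the Unicode data shipped with your Python version, so it is worth running once rather than hard-coding.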

Custom Zero-Width Range

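We manually target the most abused zero-width and invisible code points: U+200B–U+200F, U+2060–U+2069, and U+FEFF (zero width no-break space, also used as the byte order mark), plus U+061C (Arabic Letter Mark) and U+180E (Mongolian Vowel Separator). These are known for spam and clipboard attacks.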
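One way to express a custom range like this is a single regular-expression character class. The sketch below is an assumption about how such a check could be written, not the scanner's actual implementation. ZERO_WIDTH and find_zero_width are hypothetical names.

```python
import re

# Character class built from the ranges and single code points listed above.
ZERO_WIDTH = re.compile("[\u200B-\u200F\u2060-\u2069\uFEFF\u061C\u180E]")

def find_zero_width(text):
    """Return (index, code point) pairs for every match of the custom range."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in ZERO_WIDTH.finditer(text)
    ]

# A zero width space inside a word and a stray BOM at the end.
print(find_zero_width("pass\u200Bword\uFEFF"))
```

A fixed pattern like this does not depend on the Unicode version of the runtime, which is one reason to keep it alongside the category checks.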

Why Toggle Controls Matter (FR-08)

Sometimes you want certain characters:

  • Persian and other Arabic-script text needs ZWNJ to control letter joining
  • Designers use non-breaking spaces intentionally
  • Emoji sequences rely on ZWJ

That’s why the Advanced panel lets you disable any category without breaking legitimate content.
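The toggles can be pictured as a set of boolean options checked per character. The following is a minimal sketch under that assumption. ScanOptions, classify, and scan are hypothetical names, and the Advanced panel's real option names are not shown in this article.

```python
import re
import unicodedata
from dataclasses import dataclass

ZERO_WIDTH = re.compile("[\u200B-\u200F\u2060-\u2069\uFEFF\u061C\u180E]")

@dataclass
class ScanOptions:
    # One switch per detection group; everything is on by default.
    cc: bool = True
    cf: bool = True
    zs: bool = True
    zero_width: bool = True

def classify(ch, opts):
    """Return the first enabled group that matches ch, or None."""
    cat = unicodedata.category(ch)
    if opts.zero_width and ZERO_WIDTH.match(ch):
        return "zero-width"
    if opts.cc and cat == "Cc":
        return "Cc"
    if opts.cf and cat == "Cf":
        return "Cf"
    if opts.zs and cat == "Zs" and ch != " ":
        return "Zs"
    return None

def scan(text, opts=None):
    """Return (index, code point, group) for every flagged character."""
    opts = opts or ScanOptions()
    hits = []
    for i, ch in enumerate(text):
        group = classify(ch, opts)
        if group is not None:
            hits.append((i, f"U+{ord(ch):04X}", group))
    return hits

# Persian word containing a legitimate ZWNJ at index 2.
persian = "\u0645\u06CC\u200C\u062E\u0648\u0627\u0647\u0645"
print(scan(persian))                                         # ZWNJ is flagged
print(scan(persian, ScanOptions(cf=False, zero_width=False)))  # ZWNJ is allowed
```

In this sketch ZWNJ and ZWJ belong to both Cf and the custom zero-width range, so both switches have to be off before they pass through. Whether the real panel behaves the same way is worth checking against your own text.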

FAQ

Should I always enable all categories?

For maximum safety: yes. For multilingual publishing: disable Cf/Zs as needed.

Why not just block everything?

Because legitimate use cases exist. We give you surgical control instead of a sledgehammer.

Precision over panic. Detect what matters — ignore what doesn’t.