When I feed the rule "NFD; [:Nonspacing Mark:] Remove; NFC" into the ICU Transliterator demo, the character Ø (\u00d8 == LATIN CAPITAL LETTER O WITH STROKE) remains as-is (i.e. the STROKE is not stripped).
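The behaviour can be reproduced outside the demo. The following is a minimal sketch that approximates the rule with Python's standard `unicodedata` module rather than ICU itself; the helper name `strip_nonspacing_marks` is made up for illustration.

```python
import unicodedata

def strip_nonspacing_marks(text: str) -> str:
    """Approximate "NFD; [:Nonspacing Mark:] Remove; NFC" (not ICU itself)."""
    # Step 1: canonical decomposition (NFD).
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: drop every character whose general category is Mn.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Step 3: recompose whatever is left (NFC).
    return unicodedata.normalize("NFC", kept)

# é decomposes to e + U+0301, so the accent can be removed.
print(strip_nonspacing_marks("\u00e9"))
# Ø has no decomposition, so there is no mark to remove.
print(strip_nonspacing_marks("\u00d8"))
```

The stroke survives because Ø never splits into a base letter plus a combining mark, so the removal step has nothing to act on.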
In the list of nonspacing marks (Category Mn), I cannot find anything named COMBINING DIAGONAL STROKE akin to the COMBINING SHORT STROKE OVERLAY (\u0335) or COMBINING LONG STROKE OVERLAY (\u0336).
However, I do find COMBINING SHORT SOLIDUS OVERLAY (\u0337) and COMBINING LONG SOLIDUS OVERLAY (\u0338). They appear similar, but render as much thicker lines in my browser when combined with o and O.
The Unicode data I accessed for \u00d8 does not provide a decomposition for that character.
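For reference, the same properties can be inspected programmatically. This sketch again uses Python's `unicodedata` as a stand-in for the Unicode data files; `describe` is a hypothetical helper, and the results reflect whichever Unicode version ships with the interpreter.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, str]:
    """Return (name, general category, decomposition mapping) for one character."""
    return (
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),  # empty string when no mapping is defined
    )

# Ø and ø, one letter that does decompose (Å), and the four overlay marks.
for ch in ["\u00d8", "\u00f8", "\u00c5", "\u0335", "\u0336", "\u0337", "\u0338"]:
    name, category, decomposition = describe(ch)
    print(f"U+{ord(ch):04X} {name}: category={category}, decomposition={decomposition!r}")
```

An empty decomposition string for U+00D8 and U+00F8, next to a non-empty one for U+00C5, matches what I saw in the data.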
At the same time, the ICU Collator Demo treats each of ø, o, Ø, O, o\u0337 and O\u0338 as the same base letter (i.e. they compare as equal) using a Primary (Level = 1 = Base Letter) Collator.
Does this mean that the Collator locale used in the Demo has been set up to identify the base character in a case where the Unicode spec is silent?
If so, do I need to write a custom Rule Based Transliterator if I want to strip the STROKE from LATIN [CAPITAL, SMALL] LETTER * characters on transliteration?
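To make concrete what such a custom mapping would have to do, here is a rough sketch in Python rather than in ICU rule syntax. It assumes the stroked letters can be recognised purely by their character names ending in "WITH STROKE"; `strip_stroke` is a hypothetical helper, not an ICU or Unicode facility.

```python
import unicodedata

STROKE_SUFFIX = " WITH STROKE"

def strip_stroke(text: str) -> str:
    """Map 'LATIN ... LETTER X WITH STROKE' to 'LATIN ... LETTER X' by name."""
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN ") and name.endswith(STROKE_SUFFIX):
            try:
                # Look up the character whose name is the same minus the suffix.
                ch = unicodedata.lookup(name[: -len(STROKE_SUFFIX)])
            except KeyError:
                pass  # no such base letter; keep the original character
        out.append(ch)
    return "".join(out)

print(strip_stroke("\u00d8 \u00f8 \u0110 \u0142"))  # Ø ø Đ ł
```

This only covers characters named exactly "... WITH STROKE"; letters with a differently named overlay would need their own entries, which is part of why I am asking whether explicit rules are the only option.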