
I just stumbled over this question about coloring diacritics. The task was to render diacritics in a different color from the base text, for example showing the a of á in blue and the ´ in red. I thought I would give it a try by separating letter and diacritic using Unicode combining marks and wrapping each diacritic in a span so that it can be given its own color, like this:

<p>
p<span>̄ </span>
o<span>̄ </span>
m<span>̃ </span>
o<span>̃ </span>
d<span>̈ </span>
o<span>̈ </span>
r<span>̌ </span>
o<span>̌ </span>
</p>

Now, having defined a simple CSS like this,

p { color:blue; }
span { color:red; }

I get the following, quite unforeseen but beautiful result:

[Screenshot: the marks on p, m and d appear in red, while the accented o and r letters are rendered entirely in blue.]

What is happening here? I naively guessed that the font rendering algorithm prefers precomposed characters like ōõöřǒ, as long as they exist in the font, over dynamically combined ones like p̄m̃d̈, so that each sequence ends up rendered either as one glyph or as two separate ones, and the diacritic coloring takes effect only in the second case. (Please tell me frankly if this interpretation is complete nonsense.) This would also mean that the approach for coloring diacritics does, surprisingly, work under non-standard circumstances. Can anyone explain this behaviour? And is there a way to enforce it for the other (completely blue) letters too? This is a kind of "fun" question, not linked to an application right now, but it might be an interesting case to learn from.

I put up a fiddle so you can play around with it.

friedemann_bach
  • Everything is explained at https://bytes.com/topic/html-css/answers/735480-combining-diacritical-marks-html-css – Gabriele Petrioli Dec 23 '17 at 23:19
  • Also see https://stackoverflow.com/questions/23537441/how-to-display-accents-over-words-with-different-colors-in-html-css for possible workarounds (*not very good ones*). – Gabriele Petrioli Dec 23 '17 at 23:32
  • One thing you could do is to insert the Combining Grapheme Joiner (U+034F) between the base letter and the accent. This way the font renderer won’t try to substitute the precomposed glyph and instead apply the colours to each character separately. – CharlotteBuff Dec 23 '17 at 23:58
  • For people having trouble reading through the opinionated and highly repetitive comment thread on the linked page, the TL;DR is that combined characters are canonically equivalent to their base+diacritic counterparts. Or, an ř U+0159 _IS_ an r with a ̌ U+0072 U+030C - they are the same. And programs are allowed, even encouraged, to print an ř when they encounter an r with a ̌. – Mr Lister Dec 24 '17 at 09:14

2 Answers


One valid solution, as proposed by RandomGuy32, is

to insert the Combining Grapheme Joiner (U+034F) between the base letter and the accent. This way the font renderer won’t try to substitute the precomposed glyph and instead apply the colours to each character separately.

I tried this in the fiddle (version 2 of the one mentioned in my question). I put a U+034F directly after each base letter, and this does indeed work as RandomGuy32 explained. The character is invisible in the code block here, so I inserted a comment to indicate the position of U+034F:

o͏<!--U+034F--><span>̄

However, this would require a renderer on the client or server side to process each letter with a diacritic, separate the two, and insert both the span and U+034F. It might be a solution when you do not want to duplicate your text (as proposed in the CSS-based solutions on the page mentioned above).
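
A minimal server-side sketch of such a preprocessing step, in Python, might look like the following. It is not what I used in the fiddle: the function name `wrap_diacritics` is made up for illustration, and treating every code point of category `Mn` as a diacritic is a simplifying assumption. It decomposes the text, puts U+034F after each base letter that is followed by a mark, and wraps each mark in a plain span so that the CSS from the question applies.

```python
import unicodedata
from html import escape

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER


def wrap_diacritics(text: str) -> str:
    """Hypothetical helper: decompose text and wrap each combining mark in a span.

    A CGJ is inserted between a base character and the first mark after it.
    """
    out = []
    prev_is_base = False
    for ch in unicodedata.normalize("NFD", text):
        # Assumption: nonspacing marks (category Mn) are the diacritics to colour.
        # CGJ itself is also Mn, so it is excluded explicitly.
        is_mark = unicodedata.category(ch) == "Mn" and ch != CGJ
        if is_mark:
            if prev_is_base:
                out.append(CGJ)
            out.append(f"<span>{ch}</span>")
            prev_is_base = False
        else:
            out.append(escape(ch))
            # A CGJ already present in the input must not trigger a second one.
            prev_is_base = ch != CGJ
    return "".join(out)


if __name__ == "__main__":
    # Precomposed input is decomposed first, so both spellings give the same markup.
    print(wrap_diacritics("po\u0159"))
```

Whether the browser then colours the marks separately still depends on the renderer and the font, so the output is worth checking in the target browsers.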

friedemann_bach

As of 2023-02-18 04:00 ACT, this is now possible.

What was happening is that, when a precomposed character is available in Unicode, Unicode-compliant software will replace the decomposed sequence (“character + diacritic”) with the precomposed glyph.
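
As a rough way to see which letter and mark pairs have a precomposed form, here is a small Python sketch using the standard `unicodedata` module. It was not part of my tests, and `has_precomposed` is a name made up for illustration. It only checks whether NFC normalization fuses the pair into one code point; it says nothing about what a particular font or browser actually does.

```python
import unicodedata


def has_precomposed(base: str, mark: str) -> bool:
    """Hypothetical helper: True if NFC fuses base + mark into one code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1


# The eight letter + mark pairs from the question's markup.
pairs = [
    ("p", "\u0304"), ("o", "\u0304"),  # combining macron
    ("m", "\u0303"), ("o", "\u0303"),  # combining tilde
    ("d", "\u0308"), ("o", "\u0308"),  # combining diaeresis
    ("r", "\u030c"), ("o", "\u030c"),  # combining caron
]

if __name__ == "__main__":
    for base, mark in pairs:
        kind = "precomposed" if has_precomposed(base, mark) else "combining only"
        print(f"{base} + U+{ord(mark):04X}: {kind}")
```

For these pairs, only the ones based on p, m and d have no precomposed form, which matches the letters whose marks were coloured in the screenshot.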

In your example screenshot, the characters without a separate colour are the ones replaced with a precomposed character available in Unicode. To prevent this, you have to set either font-family or font-weight on the span around the diacritic (those are the two properties I tested).

Based on my tests, DejaVu Sans currently has the most accurate combining-diacritic positioning available, so font-family: "DejaVu Sans" is a good option. Of course, to ensure the font is actually used, you also have to add an @font-face rule for it.

Here's a repo you can clone for testing: test-repo branch 'noto-diacriticals'.

span.diacritical-mark {
  color: hsla(0deg, 100%, 50%, 1);
  color: hwb(0deg 0% 0% / 100%);
  font-family: "DejaVu Sans"; /* font with the most accurate positioning */
}

<ul>
  <li>bata<span class="diacritical-mark">̀</span></li>
  <li>panibugho<span class="diacritical-mark">̂</span></li>
  <li>ara<span class="diacritical-mark">́</span>w–a<span class="diacritical-mark">́</span>raw</li>
  <li>ke<span class="diacritical-mark">̈</span>tke<span class="diacritical-mark">̈</span>t</li>
  <li>sag<span class="diacritical-mark">̃</span>nay; sagn<span class="diacritical-mark">̃</span>ay</li>
  <li>san<span class="diacritical-mark">͠</span>ga</li>
  <li>a<span class="diacritical-mark">̄</span>so</li>
  <li>h<span class="diacritical-mark">͞</span>oy</li>
  <li>n<span class="diacritical-mark">͞</span>ay</li>
  <li>traba<span class="diacritical-mark">̱</span>ho</li>
  <li>trab<span class="diacritical-mark">͟</span>aho</li>
  <li>trap<span class="diacritical-mark">͟</span>o</li>
</ul>

See the above code via CodePen.