Today I learned about "characters" that are made up of more than one code point in UTF-8. I always believed that one code point in UTF-8 maps to one specific character, but it seems I was wrong. For example, the single glyph "é" consists of 3 bytes making up 2 code points.

I am having trouble rendering this symbol correctly using SDL_ttf. It seems to use the FT_Get_Char_Index function from the FreeType library to find the glyph. It does so by passing each code point separately, so the library treats the symbol as if it were more than one glyph. How would I use the FreeType library to render this glyph correctly?

  TTF_Font *ttf = TTF_OpenFont("C:\\UbuntuMono-Regular.ttf", 24);
  SDL_Color color = {0, 255, 255, 255};
  SDL_Surface *surface = TTF_RenderUTF8_Blended(ttf, "é", color); // u8"é" doesn't work either

Here's how it looks: [image]

Julius
  • I'm afraid I don't know the answer to your question, but -- do you _need_ to use combining characters? The e-acute you use in your example is also available as a single code point 233. Some glyphs can only be had by combining, but most common alphabetic symbols are also single code points. – Kevin Boone Sep 15 '20 at 20:09
  • Thanks for the hint. Unfortunately, I need to support it. – Julius Sep 15 '20 at 20:21
  • Can you post an image you are getting? – n. m. could be an AI Sep 16 '20 at 04:49
  • @n.'pronouns'm. Sure. I added an image. – Julius Sep 16 '20 at 09:23
  • This is definitely a font problem, it doesn't contain the required combining glyphs. Try other fonts. Read my comments under the answer. – n. m. could be an AI Sep 16 '20 at 09:53
  • @n.'pronouns'm. I actually tried other fonts and couldn't find one. When I was debugging into the SDL_ttf code, I found that it searches for one glyph per code point, which I don't think is right. Besides that, Chrome is able to display the glyph with that font. So I currently assume it's just not supported by the library. – Julius Sep 16 '20 at 11:13
  • Have you tried FreeMono.ttf? It definitely works for me. Many proportional fonts work as well. Monospaced fonts are usually pretty bad. Chrome is able to display the glyph because it actually replaces combined glyphs with precomposed ones when it can, and falls back to different fonts if a glyph cannot be found in the specified font. It is pretty complicated. – n. m. could be an AI Sep 16 '20 at 12:05
  • @n.'pronouns'm. I'm sorry, you are obviously right. It works with FreeMono.ttf! Do you know how Chrome is able to render it even with the other mono font? – Julius Sep 16 '20 at 13:17
  • Chrome uses a far more sophisticated text renderer, with glyph substitution and fallbacks and whatnot. My guess is that it simply substitutes a precomposed é for rendering because it knows precomposed characters are going to work better. It may also use a different font if your specified font lacks needed glyphs. Freetype doesn't do this out of the box, it renders exactly what you specified. – n. m. could be an AI Sep 16 '20 at 14:29
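
The substitution guessed at in the last comment can be sketched in plain C. This is only an illustration, not what Chrome or FreeType actually does, and it is not real Unicode normalization: the helper name `compose_acute` and its five-entry table are hypothetical, and it only handles a lowercase vowel followed by U+0301 (bytes 0xCC 0x81).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table: base letter -> UTF-8 bytes of the
   precomposed letter with an acute accent. */
struct compose_pair { char base; const char *precomposed; };

static const struct compose_pair acute_table[] = {
    { 'a', "\xC3\xA1" }, /* U+00E1 */
    { 'e', "\xC3\xA9" }, /* U+00E9 */
    { 'i', "\xC3\xAD" }, /* U+00ED */
    { 'o', "\xC3\xB3" }, /* U+00F3 */
    { 'u', "\xC3\xBA" }, /* U+00FA */
};

/* Copy src to dst, replacing "vowel + U+0301" with the precomposed
   letter. dst must hold at least strlen(src) + 1 bytes; the output is
   never longer than the input (3 bytes become 2). */
void compose_acute(const char *src, char *dst)
{
    while (*src != '\0') {
        const char *replacement = NULL;

        /* U+0301 COMBINING ACUTE ACCENT is encoded as 0xCC 0x81. */
        if ((unsigned char)src[1] == 0xCC && (unsigned char)src[2] == 0x81) {
            for (size_t i = 0; i < sizeof acute_table / sizeof acute_table[0]; ++i) {
                if (acute_table[i].base == src[0])
                    replacement = acute_table[i].precomposed;
            }
        }

        if (replacement != NULL) {
            memcpy(dst, replacement, 2);
            dst += 2;
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

The composed buffer could then be passed to TTF_RenderUTF8_Blended in place of the original string. For anything beyond a handful of Latin letters, a normalization library that implements NFC (ICU or utf8proc, for example) would be the more realistic choice.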

1 Answer

I wasn't able to reproduce your problem. Could it be a problem in your font?

This is a minimal example I made (just change fontFile to the path to your font):

#include <SDL2/SDL.h>
#include <SDL2/SDL_ttf.h>

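int main(int argc, char *argv[])
{
    (void)argc;
    (void)argv;

    const SDL_Color white = { 0xFF, 0xFF, 0xFF, 0xFF };

    /* Initialize SDL video and SDL_ttf before using them. */
    if (SDL_Init(SDL_INIT_VIDEO) != 0 || TTF_Init() != 0) {
        SDL_Log("Initialization failed: %s", SDL_GetError());
        return 1;
    }

    SDL_Window* window;
    SDL_Renderer* renderer;
    SDL_CreateWindowAndRenderer(200, 200, 0, &window, &renderer);

    const char* fontFile = "/usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf";
    TTF_Font *font = TTF_OpenFont(fontFile, 32);
    if (font == NULL) {
        SDL_Log("Could not open font: %s", TTF_GetError());
        return 1;
    }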

    SDL_Surface* surface = TTF_RenderUTF8_Blended(font, "é", white);
    SDL_Texture* texture = SDL_CreateTextureFromSurface(renderer, surface);
    SDL_Rect rect = {10, 10, surface->w, surface->h};

    while(1)
    {
        SDL_Event event;
        SDL_WaitEvent(&event);
        if(event.type == SDL_QUIT)
            break;

        SDL_SetRenderDrawColor(renderer, 0, 0, 0, 0);
        SDL_RenderClear(renderer);

        SDL_RenderCopy(renderer, texture, 0, &rect);
        SDL_RenderPresent(renderer);
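    }

    /* Release resources in reverse order of creation. */
    SDL_DestroyTexture(texture);
    SDL_FreeSurface(surface);
    TTF_CloseFont(font);
    TTF_Quit();
    SDL_DestroyRenderer(renderer);
    SDL_DestroyWindow(window);
    SDL_Quit();
    return 0;
}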

This is the output I get:

[image]

I'm using C.

I always believed that one code point in UTF-8 maps to one specific character, but it seems I was wrong. For example, the single glyph "é" consists of 3 bytes making up 2 code points.

It's true that several code points can be grouped to make one "character" (the correct name is grapheme). But I don't think that's the case for é, although I might be wrong about that.
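
One way to check which form a particular string contains is to compare its byte length with its code point count. Here is a minimal sketch, assuming the input is valid UTF-8; the helper name `utf8_count_code_points` is made up for illustration.

```c
#include <stddef.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string.
   Assumes valid UTF-8: every byte that is not a continuation byte
   (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the decomposed form `"e\xCC\x81"` (e followed by U+0301), `strlen` gives 3 and the function returns 2. For the precomposed form `"\xC3\xA9"` (U+00E9), `strlen` gives 2 and the function returns 1. Both look the same on screen, so copying and pasting the string is the only reliable way to keep the form the question uses.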

tuket
  • The failure mode certainly depends on the font, but to reproduce the problem you need to use the same characters. You could copy and paste OP's "é", which is two codepoints, or you could type it in C as `u8"e\u0301"`. The font you chose doesn't seem to have accent combining characters, so the accent renders as a replacement character (an empty box). But liberation2 does have them, so if you change the directory to liberation2 (and install `fonts-liberation2` if necessary), you'll see that the accent renders in the wrong place. (Put \u00e9 in the string as well to see the correct placement.) – rici Sep 16 '20 at 04:42
  • It is most certainly the case for all the usual accented characters. – n. m. could be an AI Sep 16 '20 at 04:52
  • @rici This is still a liberation font problem. I tested this code with real combining characters and many fonts. Most render correctly, including droid, freefont, opensans, and noto. Liberation is all ugly and incorrect, though it still sort of works, that is, it places the correct accent somewhere above the letter, however, the combined grapheme doesn't match the precomposed one. – n. m. could be an AI Sep 16 '20 at 05:39
  • @rici addendum: I initially tested with proportional fonts. Unfortunately most monospaced fonts render combined accents incorrectly. One exception I was able to find is FreeMono.ttf (but that's one ugly font). – n. m. could be an AI Sep 16 '20 at 06:00
  • @n.'pronouns'm.: And yet, Chrome and OpenOffice both manage to render acceptably using Liberation Serif. So it's at least partly a problem with Liberation (and I agree with your criticisms of that font), but in general there is a lot going on behind the scenes to get correct rendering, and it's not *just* dependent on the font. (Yes, monospaced fonts are particularly awful; I've wasted an awful lot of time trying to get them to look ok.) – rici Sep 16 '20 at 06:00
  • @rici Chrome and OpenOffice use a more sophisticated text rendering engine (Pango if I'm not mistaken). – n. m. could be an AI Sep 16 '20 at 06:02
  • @n.'pronouns'm.: I'm completely aware of that fact. And even so, they get lots of things wrong. But they do a better job. (I was actually contemplating writing a "use Pango if you want a fighting chance" answer, but I'm not sure that I have the stamina to write it. If you write it, I'll upvote.) – rici Sep 16 '20 at 06:02
  • @rici I tried LibreOffice, it does an ok job with Latin characters, probably because it replaces them with precomposed glyphs for rendering. – n. m. could be an AI Sep 16 '20 at 06:15
  • @n.'pronouns'm. probably. Although it's bloody difficult to know what's actually going on in any rendering problem because there are so many variables. You don't necessarily even know which font the glyphs are coming from, because the rendering engine might be assembling the glyphs from multiple fonts. And if there is a renderer which is prepared to give you a useful log of what it's doing , I haven't found it yet. – rici Sep 16 '20 at 06:23
  • @rici Pango is not for the faint of heart. There was a Pango renderer for SDL1 (0.1 version, not updated since 2013) but nothing for SDL2. This looks like a project to undertake... – n. m. could be an AI Sep 16 '20 at 06:40
  • Thanks for your answer! As @rici has pointed out, this is not the same character as the one I used. The é must be a two-code-point character. I thought it would not be a font problem because Chrome is able to render the same character with the same font. – Julius Sep 16 '20 at 09:00
  • @Julius Ah, sorry. I copied your character and then I could repro the issue. As @n.'pronouns'm suggested, `FreeMono.ttf` also works for me. It's interesting that some fonts wouldn't have the glyph for `é` (two codepoints) but they do have the glyph for `é` (one codepoint), when they actually look the same. – tuket Sep 16 '20 at 17:49