
I have to write German text to a PDF created with Libharu. I assign the German text to a string variable (e.g. std::string TestString = "VariableGesamtlänge";) and then draw that text on the PDF. My simple code is as follows:

        //-----UTF8 Encoding
        HPDF_UseUTFEncodings(pdf);
        HPDF_SetCurrentEncoder(pdf, "UTF-8"); 
        const char *fontname = HPDF_LoadTTFontFromFile(pdf, "FreeSans.ttf", HPDF_TRUE);
        HPDF_Font font = HPDF_GetFont(pdf, fontname, "UTF-8");
        HPDF_Page_SetFontAndSize(page, font, 24);

        std::string TestString = "VariableGesamtlänge";
        DrawText(page, font, TestString.c_str(), y);

Problem: I get two square boxes instead of ä. I am using VS2010.

skm

1 Answer


'ä' is not an ASCII character. Depending on the encoding, it may be stored as a single byte (in which case, which one?), or as multiple bytes (in which case, which ones?).

You have told the HPDF functions that you are going to pass text around as UTF-8 (which is an entirely sensible choice). This means 'ä' is represented by 0xC3 0xA4.

The source file is almost certainly saved in an 8-bit encoding, probably code page 1252, so 'ä' is stored as the single byte 0xE4, which is not valid UTF-8 on its own. You either need to tell the compiler to store string literals as UTF-8, or re-encode the source files in UTF-8, if that is possible.

Your final option is to store the text in a (UTF-8) file, and read it from there.
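If changing the source file encoding is not practical, two workarounds can be sketched in plain C++03, which VS2010 handles. The helper names `Latin1ToUtf8` and `MakeTestString` below are hypothetical and not part of Libharu. The conversion assumes the input is ISO-8859-1; code page 1252 agrees with it for 'ä' (0xE4) but differs in the range 0x80 to 0x9F, so treat this as an illustration rather than a full CP1252 converter.

```cpp
#include <string>

// Hypothetical helper (not a Libharu function): converts ISO-8859-1 bytes to UTF-8.
// Assumption: the input really is Latin-1. Code page 1252 differs for 0x80-0x9F.
std::string Latin1ToUtf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // ASCII is encoded identically in UTF-8.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF become two bytes: 110000xx 10xxxxxx.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Alternative: spell out the UTF-8 bytes of 'ä' (0xC3 0xA4) with escapes, so the
// literal does not depend on how the source file is saved. The literal is split
// into pieces so that a following letter is never read as part of a hex escape.
std::string MakeTestString()
{
    return std::string("VariableGesamtl" "\xC3\xA4" "nge");
}
```

With either approach, the result can be passed on as before, for example `DrawText(page, font, MakeTestString().c_str(), y);`.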

  • In this case, I am not reading the variable from any file. What should I do to tell `DrawText()` that I am sending a char* that points to a string which may contain characters like `ä`? – skm Dec 13 '16 at 16:15
  • `You either need to tell the compiler to store strings as UTF-8,` ... how can I do that? In C++11 it is possible, but I am not using it. – skm Dec 13 '16 at 16:16
  • Your suggestion to convert the string to a UTF-8 string worked, using the code given at `http://stackoverflow.com/questions/23264818/storing-unicode-utf-8-string-in-stdstring`. But how do I convert a variable to UTF-8 if I read it from an XML file? – skm Dec 13 '16 at 16:24
  • Ah, there life is easier. If the XML file has UTF-8 in it already, you don't need to do any conversion. If it doesn't have UTF-8 in it, then fix it so it does! – Martin Bonner supports Monica Dec 13 '16 at 16:44
  • And that's my main problem actually. I have an XML file in UTF-8 encoding (cross-checked in Notepad++). I parse the variables using the `pugixml` library and store them in a vector of strings. But when I put these variables on the PDF as mentioned above, strange characters appear. I think that the problem is in the parsing itself. – skm Dec 13 '16 at 17:46
  • So you need to check what's coming out of pugixml - is it still UTF-8 encoded? If not, a minimal example showing the problem would be a good question here. – Martin Bonner supports Monica Dec 13 '16 at 18:25
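One way to carry out the check suggested in the last comment is to dump the raw bytes of a parsed string and test whether they form structurally valid UTF-8. The sketch below is only an illustration: `HexDump` and `LooksLikeUtf8` are hypothetical helpers, not pugixml or Libharu functions, and the validity check covers lead and continuation bytes only (it does not reject overlong forms or surrogates).

```cpp
#include <string>

// Hypothetical debugging helper: returns the bytes of s as hex, e.g. "C3 A4".
std::string HexDump(const std::string& s)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (i != 0) out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// Hypothetical structural check: every lead byte must be followed by the
// right number of continuation bytes (10xxxxxx).
bool LooksLikeUtf8(const std::string& s)
{
    std::string::size_type i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        int extra;
        if (c < 0x80)                extra = 0;  // ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 4-byte sequence
        else return false;                       // stray continuation or invalid byte
        ++i;
        for (; extra > 0; --extra, ++i) {
            if (i >= s.size()) return false;     // truncated sequence
            if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) return false;
        }
    }
    return true;
}
```

If a string that should contain 'ä' dumps as `C3 A4` at that position, it is still UTF-8 and the problem lies later in the pipeline; a lone `E4` would point to a single-byte encoding instead.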