
I'm trying to print a sequence of bytes with the function below, but I'm seeing something strange: the character 0xED, for example, which should be LATIN SMALL LETTER I WITH ACUTE (í), is printed as � (a strange character with a question mark inside, as though it can't be printed). Is this due to my code, or to the console I print it in?

Also, is the code correct, or what would you have done differently to improve it?

Edit: output example

1F 8B 08 00 00 00 00 00  97 86 22 0D 89 72 EC 04    ........ .."..r�.

Thanks

```cpp
#include <QChar>
#include <QDebug>
#include <cstdint>
#include <cstdio>
#include <string>

void printBytes(std::string string) {
    QDebug qD = qDebug().nospace();
    qD << "Printing string of size " << string.size() << "\n";
    char buffer [3];
    int j = 0;
    std::string printable = "";
    for (uint32_t i = 0; i != string.size(); ++i) {
        snprintf(buffer, sizeof(buffer), "%02X", (unsigned char) string.at(i));
        qD << buffer;
        printable += QChar(string.at(i)).isPrint() ? string.at(i) : '.';
        printable += j == 7 ? " "  : "";
        if (j == 15) {
            qD << "\t" << printable.c_str() << "\n";
            printable = "";
            j = 0;
        } else {
            qD << (j == 7 ? "  " : " ");
            j++;
        }
    }
    if (j != 0) {
        qD << std::string((16-j) * 3, ' ').c_str() << "\t" << printable.c_str();
    }
}
```
jido51
  • Character set mismatch. You're trying to print a character from charset X in a display environment using charset Y, and charset Y doesn't have a glyph for that particular character, or it's an invalid character in that charset. – Marc B Jul 12 '16 at 21:26
  • Maybe you should consider printing the numeric value instead of the character? – nosbor Jul 12 '16 at 21:29
  • related: http://stackoverflow.com/a/217269/12711 – Michael Burr Jul 12 '16 at 22:01
  • Qt assumes UTF-8 for byte-wide representation. You assume Latin-1. There's your problem. If you know that your input is Latin-1, you must transcode to UTF-8 first before you display it. – Kuba hasn't forgotten Monica Jul 12 '16 at 22:03

1 Answer


Whether or not a particular octet (a byte) displays a particular character depends entirely upon your system environment's locale.

Octet 0xED is indeed the character í in the ISO-8859-1 (or ISO-8859-15) encoding. But if your system environment's locale is UTF-8 (the usual default on modern Linux and macOS systems), the character í is encoded as the two-byte sequence 0xC3 0xAD, and a lone 0xED byte is not a valid UTF-8 sequence, so the console shows the replacement character � instead.
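The 0xC3 0xAD sequence can be checked by hand. UTF-8 encodes code points U+0080 to U+07FF as two bytes: the high 5 of the code point's 11 bits go into a `110xxxxx` lead byte and the low 6 bits into a `10xxxxxx` continuation byte. For í (U+00ED):

$$
\mathtt{0xED} = 000\,1110\,1101_2 \;\Rightarrow\; \underbrace{00011}_{\text{high 5 bits}}\;\underbrace{101101}_{\text{low 6 bits}}
$$

$$
110\,00011_2 = \mathtt{0xC3}, \qquad 10\,101101_2 = \mathtt{0xAD}
$$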

The most likely explanation is that your operating system (which you did not specify) does not use an ISO-8859-n locale, but probably UTF-8 (or some other encoding). Either reconfigure your system environment, or have your program convert its output to UTF-8 (or whatever encoding the console actually uses).
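A minimal standard-C++ sketch of a Wireshark-style dump along those lines follows. It assumes, as the question does, that the input bytes should be read as ISO-8859-1 and that the console expects UTF-8. The names `hexDump`, `isPrintableLatin1` and `appendLatin1AsUtf8` are hypothetical, and the printability rule is hand-written (ASCII 0x20 to 0x7E, plus 0xA0 to 0xFF except the soft hyphen 0xAD), not Qt's `QChar::isPrint()`. Printable bytes of 0x80 and above are expanded to their two-byte UTF-8 form instead of being copied raw, which is the step the question's code is missing; the input is also taken by const reference rather than by value.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Treat byte `b` as an ISO-8859-1 (Latin-1) character and append its UTF-8
// encoding to `out`. Latin-1 maps 1:1 onto U+0000..U+00FF, so bytes below
// 0x80 are copied as-is and the rest become two bytes (0xED -> 0xC3 0xAD).
void appendLatin1AsUtf8(std::string& out, unsigned char b) {
    if (b < 0x80) {
        out += static_cast<char>(b);
    } else {
        out += static_cast<char>(0xC0 | (b >> 6));    // 110xxxxx lead byte
        out += static_cast<char>(0x80 | (b & 0x3F));  // 10xxxxxx continuation
    }
}

// Assumed printability rule (not Qt's QChar::isPrint()): ASCII 0x20..0x7E and
// 0xA0..0xFF except the invisible soft hyphen 0xAD. C0 controls, 0x7F and the
// C1 controls 0x80..0x9F are shown as '.'.
bool isPrintableLatin1(unsigned char b) {
    return (b >= 0x20 && b <= 0x7E) || (b >= 0xA0 && b != 0xAD);
}

// 16 bytes per line: hex column with an extra gap after the 8th byte, then
// the text column, which is UTF-8 encoded.
std::string hexDump(const std::string& data) {
    std::ostringstream out;
    for (std::size_t start = 0; start < data.size(); start += 16) {
        std::string hex;
        std::string text;
        for (std::size_t k = 0; k < 16; ++k) {
            const bool present = start + k < data.size();
            if (k == 8) {
                hex += ' ';  // gap between the two 8-byte halves
                if (present) text += ' ';
            }
            if (!present) {
                hex += "   ";  // pad a short last line so columns stay aligned
                continue;
            }
            const auto b = static_cast<unsigned char>(data[start + k]);
            char buf[4];  // "XX " plus the terminating NUL
            std::snprintf(buf, sizeof buf, "%02X ", b);
            hex += buf;
            if (isPrintableLatin1(b)) {
                appendLatin1AsUtf8(text, b);
            } else {
                text += '.';
            }
        }
        out << hex << "  " << text << '\n';
    }
    return out.str();
}
```

With the 16 bytes from the question's output example, the text column ends in `r`, then `ì` (0xEC, emitted as 0xC3 0xAC), then `.`, instead of `r�.`. The result is a UTF-8 `std::string`, so it can be written to `std::cout` directly; since Qt (per the comment above) treats byte strings as UTF-8, passing `hexDump(s).c_str()` to `qDebug()` should display correctly as well.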

Sam Varshavchik
  • Yes, my console is using UTF-8, and I think it would make more sense for my own code to display the characters as UTF-8. How would I do that? – jido51 Jul 12 '16 at 21:34
  • Well, by writing the `UTF-8` octet sequences, that correspond to the characters you want to show, to `std::cout`. Same exact process as for displaying `ISO-8859-1`-encoded characters. – Sam Varshavchik Jul 12 '16 at 21:40
  • I'm sorry, I don't follow. I just want to print the printable characters in the right-hand part of the line, like Wireshark does, but even though isPrint() returns true, the character isn't displayed. So what should I do to make it display, or what am I doing wrong? – jido51 Jul 12 '16 at 21:44
  • I've added an output example to my original post. – jido51 Jul 12 '16 at 21:47
  • I'm not sure what you're asking. In order to print the í (i with acute) character, your code simply needs to emit 0xC3 0xAD instead of 0xED. Seems fairly clear. If you're asking how to convert from ISO-8859-n to UTF-8, use the iconv library. Google it. – Sam Varshavchik Jul 12 '16 at 22:02
  • @jido51: if your console is set for UTF8 then you will need to convert the ISO-8859-1 encoding to UTF8 for the character to display. The docs for the `QTextCodec` class might help: http://doc.qt.io/qt-5/qtextcodec.html – Michael Burr Jul 12 '16 at 22:05
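For the iconv route suggested in the comments, here is a minimal sketch using the POSIX `iconv` API (`iconv_open`, `iconv`, `iconv_close`). It assumes a platform where the `"ISO-8859-1"` and `"UTF-8"` encoding names are supported (glibc supports both; on macOS you may need to link with `-liconv`), and `latin1ToUtf8Iconv` is a hypothetical helper name. Because every ISO-8859-1 byte maps to at most two UTF-8 bytes, the output buffer can be sized once up front and a single `iconv` call converts the whole input.

```cpp
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert ISO-8859-1 (Latin-1) bytes to UTF-8 with POSIX iconv.
std::string latin1ToUtf8Iconv(const std::string& in) {
    // iconv_open takes (tocode, fromcode) and returns (iconv_t)-1 on failure.
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open: conversion not supported");
    }
    // Each Latin-1 byte becomes at most 2 UTF-8 bytes, so this buffer is
    // always large enough for one iconv() call to convert everything.
    std::vector<char> out(in.size() * 2 + 1);
    char* inPtr = const_cast<char*>(in.data());  // iconv only advances this pointer
    std::size_t inLeft = in.size();
    char* outPtr = out.data();
    std::size_t outLeft = out.size();
    const std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("iconv: conversion failed");
    }
    // outLeft is how much of the buffer iconv did not use.
    return std::string(out.data(), out.size() - outLeft);
}
```

In a Qt program, `QString::fromLatin1(data, size)` performs the same Latin-1 decoding without an extra library, and `QTextCodec` covers other source encodings.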