2

I have a minimal test project built with MSVC 2015 and Qt 5.9.3. The file main.cpp is saved as UTF-8 with signature (BOM):

[screenshot of the file's encoding setting]

I am trying to show a message box with some Russian text. The complete code:

#include <QtWidgets/QApplication>
#include <QMessageBox>

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);

    QString ttl = QString::fromUtf8("russian_word_1");
    QString txt = QString::fromUtf8("russian_word_2");

    QMessageBox::information(nullptr, ttl, txt);

    return a.exec();
}

This is what I get:

[screenshot of the resulting message box]

How is this possible?


Update 1: I specifically want to use UTF-8 with a BOM, in line with the statement by one of Stack Overflow's founders:

...It does not make sense to have a string without knowing what encoding it uses...


Update 2: In this particular case, most likely it is a bug in the compiler.
Vladimir Bershov
  • Btw, be sure that you are compiling your application as Unicode and/or that your OS is configured to use Russian as the default language for non-Unicode programs. – Dmitry Sazonov Feb 09 '18 at 12:44
  • I use VS2012, Qt 5.6, code saved as 'UTF-8 without BOM' and I can display german umlauts from code in QMessageBoxes: `QMessageBox::information(this, tr("Süß!"), tr("{ÄÊì&78╬aúeKã´YﭛᵴҺϸ̚Ƚ"));` - works out of the box. – Martin Hennings Feb 09 '18 at 14:51
  • @MartinHennings yes, but I talk about UTF-8 WITH Bom – Vladimir Bershov Feb 09 '18 at 14:55
  • Should make no difference. Qt5 interprets all string literals as UTF-8 (except if explicitly told not to do so), and I trust VisualStudio to encode the same, regardless of BOM. The real reason must lie somewhere else. – Martin Hennings Feb 09 '18 at 15:04
  • @MartinHennings , I have always worked `WITHOUT BOM`, and there were never problems. But today I decide to try `WITH` :) – Vladimir Bershov Feb 09 '18 at 15:11
  • It's not a bug. There is nothing in CPP standard about supporting non-ASCII (>127) characters in source code. – Dmitry Sazonov Feb 13 '18 at 19:08

4 Answers

2

Don't use non-ASCII text in your code, because the result depends on the compiler, the source file encoding, and so on. Write only English text, wrapped in tr(""). Create translation files and load them. Read about internationalization in Qt.

Another useful link.

Dmitry Sazonov
  • Are you sure that this is the best solution for only-one language program? – Vladimir Bershov Feb 09 '18 at 13:30
  • @VladimirBershov yes, I think that is the only way. If you are using IDE-s like Visual Studio or QtCreator - then it should be easy to use. – Dmitry Sazonov Feb 09 '18 at 13:55
  • @VladimirBershov It is good advice to use plain English strings and tr() them into your native language, but /in practice/ this should work, too. So no, this is not necessarily the best solution for a single-language program, for multiple reasons (you need to lupdate / lrelease / distribute, it is harder to find strings seen in production, harder to convey exact meaning in a non-native language, etc.). – Martin Hennings Feb 09 '18 at 15:08
  • @DmitrySazonov , thanks, but I am sure that *Unicode* is exactly what should solve this problem once and for all. Every compiler should provide full Unicode support (*especially* with a BOM). Using only ASCII in code and using translation files is very good advice, but not for this reason. – Vladimir Bershov Feb 09 '18 at 15:21
  • @VladimirBershov you need to look for an answer, how to force MSVC to correctly interpret your source files. It is not related to Qt then. – Dmitry Sazonov Feb 13 '18 at 17:53
  • @VladimirBershov, see this post anyway [Non English in source code](https://stackoverflow.com/questions/16436278/arabic-in-qt-with-qstring) – Mohammad Kanan Feb 13 '18 at 18:42
  • @VladimirBershov, I just wrote a different answer of the same problem, may be not your specific case, but generally without the cost of _internationalization_ these are the options we have for _Arabic_ text in source files. – Mohammad Kanan Feb 13 '18 at 18:44
1

If the compiler produces garbage strings for UTF-8 source files that have a BOM, then it's a bug in the compiler. However, the use of a BOM with UTF-8 is not recommended in the first place. You shouldn't use it unless you actually have a reason to.

Furthermore, you don't need to do explicit fromUtf8() conversions. You can just do:

QString ttl = "russian_word_1";
QString txt = "russian_word_2";

QString assumes string literals are UTF-8. From the documentation:

In all of the QString functions that take const char * parameters, the const char * is interpreted as a classic C-style '\0'-terminated string encoded in UTF-8.
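To see whether the compiler really stored a literal as UTF-8, it can help to look at the literal's bytes. The sketch below is plain standard C++ with a hypothetical helper, `hexDump`; the sample word "Привет" is only an illustration (the question's actual strings are not shown), written with hex escapes so that the source file's encoding plays no part:

```cpp
#include <string>

// Hypothetical helper: formats each byte of a narrow string as two hex
// digits, separated by spaces.
std::string hexDump(const std::string &s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : s) {
        if (!out.empty()) out += ' ';
        out += digits[byte >> 4];    // high nibble
        out += digits[byte & 0x0F];  // low nibble
    }
    return out;
}

// Illustrative sample: "Привет" spelled out as explicit UTF-8 bytes, so this
// constant is the same regardless of source or execution character set.
const char kPrivetUtf8[] = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";
```

Passing a literal typed directly in Cyrillic to `hexDump` should give the same bytes as `kPrivetUtf8` when the compiler keeps it as UTF-8. Single bytes such as `CF F0 E8 E2 E5 F2` would instead suggest the literal was converted to Windows-1251, which `QString::fromUtf8()` cannot decode.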

You may use QStringLiteral to wrap string literals as an optimization, but this is not required.

Lastly, you can use tr() to wrap the string literals if you at some point want to translate the application from Russian to other languages. It is generally a good idea to use tr() in case you later decide to do a translation.

Note that having non-English strings in source code is generally fine. It's what UTF-8 (and Unicode in general) is there for. All modern compilers support it. What most people frown upon however, is non-English code:

auto индекс = 0; // Please don't.

But non-English strings are fine.

Nikos C.
  • But what about this famous statement of the Stackoverflow author: The Single Most Important Fact About Encodings.. [**It does not make sense to have a string without knowing what encoding it uses**](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) – Vladimir Bershov Feb 11 '18 at 22:15
  • @VladimirBershov In our case, we know what encoding it uses. It's UTF-8. – Nikos C. Feb 11 '18 at 22:45
  • yes, but I'm talking about global correctness, and I want to use my project in another IDE too – Vladimir Bershov Feb 11 '18 at 22:49
  • @VladimirBershov Are there any modern IDEs that don't support UTF-8? – Nikos C. Feb 11 '18 at 22:53
  • As we see, Microsoft Visual Studio 2015 at least – Vladimir Bershov Feb 11 '18 at 22:55
  • Now I am sure that it is absolutely wrong to save some text without an information of encoding – Vladimir Bershov Feb 11 '18 at 23:07
  • @VladimirBershov What do you mean with "information of encoding?" – Nikos C. Feb 11 '18 at 23:33
  • @VladimirBershov Also, previously, you said " I have always worked WITHOUT BOM, and there were never problems. But today I decide to try WITH". So what is your problem exactly? You said it works fine without a BOM. And using a BOM with UTF-8 is NOT recommended to begin with. – Nikos C. Feb 11 '18 at 23:34
  • Information from which any text editor can determine which encoding is used. Let's say I want to save text files in a universal way, so that I can forget about encoding problems in general. – Vladimir Bershov Feb 11 '18 at 23:40
  • There were never problems while using msvc + Qt – Vladimir Bershov Feb 11 '18 at 23:42
  • @VladimirBershov A BOM is a **B**yte **O**rder **M**ark. It specifies byte order. Byte order is irrelevant for UTF-8 and using a BOM causes issues. Just like you don't need a BOM for ASCII files, you don't need one for UTF-8 either. Just use UTF-8 without a BOM and tell your IDE that your files are in UTF-8 instead of ASCII. That's the simplest solution. – Nikos C. Feb 11 '18 at 23:43
  • There is a byte order mark specific to UTF-8, so it serves as an encoding flag too. The issues are caused by faulty applications, not by the BOM. – Vladimir Bershov Feb 11 '18 at 23:46
  • @VladimirBershov Well, you're free to ignore advice. I'm just telling you what the recommended approach is. – Nikos C. Feb 11 '18 at 23:57
1

If you're using Qt Creator with the MSVC compiler, this may help you.

TLDR:

  1. save all of your source files as UTF-8 without BOM
  2. add this line in your .pro file: QMAKE_CXXFLAGS += /utf-8

Done!

refs:

  1. MSVC compiler flag to Set Source and Executable character sets to UTF-8
  2. Add compiler flag in Qt Creator
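Whether the flag took effect can be checked at compile time. A sketch, assuming a compiler with C++11 `constexpr` support; `narrowLiteralsAreUtf8` is a hypothetical name, and the check is a heuristic based on a single Cyrillic letter, not an official MSVC or Qt facility:

```cpp
// Hypothetical compile-time guard. The universal character name below stands
// for Cyrillic "П" (U+041F) using only ASCII source characters, so the source
// file encoding is irrelevant; the compiler encodes it in the execution
// character set. In UTF-8 that is the two bytes D0 9F.
constexpr bool narrowLiteralsAreUtf8() {
    return sizeof("\u041F") == 3  // two payload bytes plus the terminating '\0'
        && static_cast<unsigned char>("\u041F"[0]) == 0xD0
        && static_cast<unsigned char>("\u041F"[1]) == 0x9F;
}

// Stops the build early if narrow literals are emitted in another encoding,
// for example a single-byte code page.
static_assert(narrowLiteralsAreUtf8(),
              "narrow string literals are not UTF-8; check the compiler's charset flags");
```

Placed in any translation unit of the project, this turns a silent encoding mismatch into a build error instead of garbled text at run time.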
cangyin
0

Use a QByteArray for your message text, then convert it to a Unicode QString for display:

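#include <QApplication>
#include <QMessageBox>
#include <QTextCodec>

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);

    // Converted text: decode the literal's bytes as Windows-1256 (Arabic).
    // This is only correct if the compiler emitted the literal in CP1256.
    QTextCodec *codec1 = QTextCodec::codecForName("CP1256");
    QByteArray myLanguage = "لا لا لا لا لا لا لا ";
    QString myLanguage2unicode = codec1->toUnicode(myLanguage);

    // Non-converted text: the literal's bytes are assumed to be UTF-8 already.
    QString txt = QString::fromUtf8("لا لا لا لا لا لا لا  ");

    QMessageBox::information(nullptr, myLanguage2unicode, txt);

    return a.exec();
}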

Result of the above code:

[screenshot of the resulting message box]

Mohammad Kanan
  • It depends on encoding of source files and compiler. Text codecs are not for that. – Dmitry Sazonov Feb 13 '18 at 17:54
  • @DmitrySazonov Usually, for a simple single-language program, if you find it costly to go for _internationalization_, then text `codecs` are a good solution when writing _non-English_ text inside the source code. I do use them. – Mohammad Kanan Feb 13 '18 at 18:32
  • @DmitrySazonov, Just in case we are not on the same page, see my code and the screenshot of its output. As I noted in my updated answer, the point is that it is valid, although of course it is not the best choice for portable code. – Mohammad Kanan Feb 13 '18 at 18:34
  • @DmitrySazonov, check this [Post](https://stackoverflow.com/questions/16436278/arabic-in-qt-with-qstring) – Mohammad Kanan Feb 13 '18 at 18:40
  • I believe that it is a very bad practice to rely on local PC encoding, because it may be changed at any moment. At least, source files should be in utf8/16/32. – Dmitry Sazonov Feb 13 '18 at 19:07
  • @DmitrySazonov It is. but once your code is compiled, alas its utf8 string regardless of what you change _after_ that on local PC. – Mohammad Kanan Feb 13 '18 at 21:18
  • @DmitrySazonov This works for short to live, small applications .. while I spent a couple of weeks internationalizing a VS project to 4 languages .. hundreds of strings! – Mohammad Kanan Feb 13 '18 at 21:20
  • It is very cool if you can do a correct translation into 4 languages. But I'm sure that the time you spent on it is nothing compared with the time you spent on coding. – Dmitry Sazonov Feb 14 '18 at 09:30
  • @DmitrySazonov, No! that was real internationalization with resource strings. – Mohammad Kanan Feb 14 '18 at 09:31