5

I am using Qt Creator and have created a console app. Everything is up to date. The OS is Windows XP.

I have created a QString that holds some Hungarian chars. Most of the Hungarian chars do not require Unicode, but the chars with double acute accents (ű and ő) do.

I try to write the QString contents to a file, but my Unicode chars lose their accents in the file. In other words, the Unicode information is lost along the way.

My code is below.

#include <QtCore/QCoreApplication>
#include <QString>
#include <QTextStream>
#include <QDate>
#include <QFile>
using namespace std;


int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QString szqLine = "NON-UnicodeIsOK: áéüúöóí NEED-Unicode: űő";
    //These are Hungarian chars and require Unicode. Actually, only the u & o with double
    //acute accents require Unicode encoding.

    //Open file for writing unicode chars to.
    QFile file("out.txt");
    if ( !file.open(QIODevice::WriteOnly | QIODevice::Text) ){
        return 1;
    }

    //Stream the QString text to the file.
    QTextStream out(&file);
    out.setCodec("UTF-8");

    out << szqLine << endl;     //Use endl for flush.  Does not properly write ű and ő chars.
                                //Accents missing in file.

    file.close();               //Done with file.

    return app.exec();
}
user440297

2 Answers

6
  1. What is the encoding of your source file? Using non-ASCII encodings in source files often causes problems, at least when working cross-platform. I think MSVC has some problems there.

  2. QString foo = "unicode string" uses the implicit conversion from ASCII to Unicode, which will also cause problems. Always explicitly specify what encoding the literal uses, e.g. by wrapping the literal in QLatin1String() if it's Latin-1:

    QString foo = QLatin1String("some latin1 string");

or, for UTF-8, as it should be in your case, QString::fromUtf8():

    QString foo = QString::fromUtf8( "funny characters" );

  3. Before writing the string to a file (which is another possible source of errors, although your code looks correct), check whether a Qt widget (QLineEdit, for example) displays it correctly.
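Regarding item 2: one way to combine QString::fromUtf8() with a pure-ASCII source file is to spell the non-ASCII characters as UTF-8 byte escapes. The following is only a minimal sketch in standard C++ (no Qt needed) that shows which bytes those are for ű (U+0171) and ő (U+0151). The helper names encodeUtf8 and needUnicodeSample are made up for this illustration and are not part of Qt.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not a Qt function): encode one Unicode code point as UTF-8.
// This sketch does not validate surrogates or values above U+10FFFF.
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {                 // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: covers all the Hungarian accented letters
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The two double-acute letters (U+0171, U+0151) written as UTF-8 byte
// escapes, so the source file itself stays pure ASCII.
std::string needUnicodeSample()
{
    return "\xC5\xB1\xC5\x91";
}
```

With Qt available, a literal like the one in needUnicodeSample() could be passed straight to QString::fromUtf8(), which removes any dependence on the encoding the editor saved the source file in.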

To avoid such errors, I prefer to keep source files pure ASCII, with English strings, and then translate them using Qt's internationalization tools.

Edit: Also see the accepted answer to this question about UTF-8 literals in MSVC 2008.

Frank Osterfeld
  • It worked. I don't know why UTF-8 or UTF-16 is not the default encoding in Qt. Are there any drawbacks to specifying UTF-8 as the file encoding in Qt Creator and explicitly specifying the encoding for string assignments? – user440297 Nov 29 '10 at 15:59
  • 1
    @user440297: It's not a Qt issue, it's a C++ (compiler) issue. Some compilers (MSVC) don't work well with Unicode source files. As C/C++ has neither an encoding line (like e.g. Python) nor a default encoding, the compiler would have to guess (a sure recipe for pain). See the first answer at http://stackoverflow.com/questions/688760/how-to-create-a-utf-8-string-literal-in-visual-c-2008 So to be on the safe side, use ASCII and tr()anslate. – Frank Osterfeld Nov 29 '10 at 18:57
1

Are you sure that szqLine really contains the correct characters? Try this:

    QString line = QString::fromStdWString(L"NON-UnicodeIsOK: \x00E1\x00E9... NEED-Unicode: \x0171\x0151");
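As a rough illustration of the escaped wide literal, here is a small sketch in standard C++ without Qt. The function names are hypothetical. It also shows universal character names (\uXXXX) as an alternative spelling, and one pitfall of hex escapes: a \x escape absorbs every hex digit that follows it, so text beginning with a letter a-f or a digit has to go into a separate, adjacent literal.

```cpp
#include <string>

// Hypothetical helper: u and o with double acute (U+0171, U+0151) as hex escapes.
std::wstring doubleAcuteHex()
{
    return L"\x0171\x0151";
}

// The same two characters written as universal character names.
std::wstring doubleAcuteUcn()
{
    return L"\u0171\u0151";
}

// Pitfall: L"\x0151be" would be read as one oversized escape, because
// 'b' and 'e' are hex digits. Splitting the literal ends the escape early;
// adjacent literals are concatenated by the compiler.
std::wstring escapeFollowedByHexLetters()
{
    return L"\x0151" L"be";
}
```

A std::wstring built this way is what QString::fromStdWString() takes as its argument.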

... and don't use Hungarian notation ;-)

hmuelner
  • Thanks. I take it the hard-coding of the chars is for debugging purposes only? Smart. For dynamic apps that take Unicode strings as input from the user, files, XML or databases, I don't think such a technique will cut it. I'm only saying this because I Googled a lot before posting and saw that many advocate hard-coding chars like you did, but I only see that as a possibility when the only thing you do with Unicode strings is write out literals. And of course, for debugging. – user440297 Nov 29 '10 at 15:44
  • Your problem was the encoding of literal strings in the program source. For portability reasons the source code should be pure 7-bit ASCII. You can embed characters outside this range with hexadecimal escapes. – hmuelner Nov 30 '10 at 10:42