
I want to validate QLineEdit's text with a regular expression. It should allow the characters a to z, A to Z, the Turkish characters (ğüşöçİĞÜŞÖÇ) and the digits 0 to 9. I searched for my problem and found two solutions, but neither worked for me. One says "include the Turkish characters in the regexp" and the other says "use the Unicode code points of the Turkish characters".

Below are the two regular expressions:

QRegExp exp = QRegExp("^[a-zA-Z0-9ğüşöçİĞÜŞÖÇ]+$");

QRegExp exp = QRegExp("^[a-zA-Z0-9\u00E7\u011F\u0131\u015F\u00F6\u00FC\u00C7\u011E\u0130\u015E\u00D6\u00DC]+$");

Neither of the regular expressions above validates the name 'İSMAİL'. I also tried a text containing only Turkish characters ('ğüşöçİĞÜŞÖÇ'), and it cannot be validated either. When I remove the 'İ' character from both texts, they can be validated. I suspect the problem is related to the 'İ' character.

How can I solve the problem?

Note: We are using Qt 4.6.3 in our project.

onurozcelik
  • Looking at your original suggestion, `^[a-zA-Z0-9ğüşöçİĞÜŞÖÇ]+$` works fine for me in all regex matchers I tried (e.g. http://www.regex101.com/r/gR2xB2). Are you sure the problem isn't elsewhere? – mart1n Jun 05 '13 at 08:20
  • Your regex is missing "ı"; make sure to add it. – numan Aug 09 '21 at 12:45

4 Answers


I think this is an encoding problem. You rely on the implicit conversion from const char* to QString, which uses QString::fromAscii. Unless a codec has been set, fromAscii treats the bytes as Latin-1. If you want to use a non-Latin-1 encoding here, you need to call QTextCodec::setCodecForCStrings with the encoding your source files are saved in. I'd use UTF-8, so the initialization of the app should look like this:

QTextCodec::setCodecForCStrings(QTextCodec::codecForName("utf-8"));
QRegExp exp = QRegExp("^[a-zA-Z0-9ğüşöçİĞÜŞÖÇ]+$");
qDebug() << exp.exactMatch("İSMAİL"); // <= true

A clearer way to check whether this is your problem: save your code in UTF-8 and use QString::fromUtf8 to convert your string literals to QString explicitly:

QRegExp exp = QRegExp(QString::fromUtf8("^[a-zA-Z0-9ğüşöçİĞÜŞÖÇ]+$"));
qDebug() << exp.exactMatch(QString::fromUtf8("İSMAİL")); // <= true
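To see why the codec matters, here is a Qt-free sketch in standard C++; all names (`decodeLatin1`, `decodeUtf8`, `allAllowed`, `kTurkishUtf8`) are hypothetical, not Qt API. It assumes the source is saved as UTF-8 and mimics the Latin-1 fallback of Qt 4's QString::fromAscii: 'İ' (bytes C4 B0) becomes two characters, 'Ä' and '°', so the character class never contains U+0130. QLineEdit::text() returns the real U+0130, so the widget text fails, while a literal "İSMAİL", mis-decoded the same way as the pattern, still passes. `allAllowed` only approximates `exactMatch` on `^[a-zA-Z0-9...]+$`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: one byte -> one code point, i.e. the Latin-1 reading that
// Qt 4's QString::fromAscii() uses when setCodecForCStrings() was never called.
std::u32string decodeLatin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// Hypothetical helper: minimal UTF-8 decoder for well-formed input only
// (no validation of malformed or overlong sequences).
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)             { cp = b;        len = 1; }
        else if ((b >> 5) == 0x6) { cp = b & 0x1F; len = 2; }
        else if ((b >> 4) == 0xE) { cp = b & 0x0F; len = 3; }
        else                      { cp = b & 0x07; len = 4; }
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical stand-in for exactMatch() on "^[a-zA-Z0-9<extra>]+$":
// non-empty, and every code point is ASCII alphanumeric or listed in `extra`.
bool allAllowed(const std::u32string& text, const std::u32string& extra) {
    if (text.empty()) return false;
    for (char32_t c : text) {
        bool ascii = (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
                     (c >= U'0' && c <= U'9');
        if (!ascii && extra.find(c) == std::u32string::npos) return false;
    }
    return true;
}

// UTF-8 bytes of the question's extra letters "ğüşöçİĞÜŞÖÇ", written as escapes
// so the example does not depend on the encoding this file is saved in.
const std::string kTurkishUtf8 =
    "\xC4\x9F\xC3\xBC\xC5\x9F\xC3\xB6\xC3\xA7\xC4\xB0"
    "\xC4\x9E\xC3\x9C\xC5\x9E\xC3\x96\xC3\x87";
```

Decoding the pattern with the same encoding the file was saved in (here `decodeUtf8`) is the whole fix; the regex itself was fine.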
Pavel Strakhov
  • When I try your first solution like this, it returns true: `QTextCodec::setCodecForCStrings(QTextCodec::codecForName("utf-8")); exp.exactMatch("İSMAİL"); ... ` But when I try it like this, it returns false: `QTextCodec::setCodecForCStrings(QTextCodec::codecForName("utf-8")); QString name = ui.txtName->text(); /* text() returns "İSMAİL", of course */ exp.exactMatch(name);` Why does this happen? – onurozcelik Jun 05 '13 at 13:10
  • I can't reproduce this. Are you sure that everything is identical except for `"İSMAİL"` being replaced with `ui.txtName->text()`? – Pavel Strakhov Jun 05 '13 at 14:12
  • Here is the link to my test code: [Test code](http://db.tt/gwsWbbbD). I tested it with the words "İSMAİL", "şule" and "ışık", and none of them passes. Tell me what I am doing wrong. – onurozcelik Jun 05 '13 at 18:36
  • Your `window.cpp` file is not saved in UTF-8. I suppose it's saved in Windows-1254. Save your files in UTF-8 to make this work. Also, it occurs to me that the Visual Studio compiler has some issues with UTF-8 files saved with a BOM (see [this answer](http://stackoverflow.com/a/15897742/344347)), so you should use UTF-8 **without BOM**. The other option is to keep your files in Windows-1254 (or whatever encoding you choose) and pass that encoding to `setCodecForCStrings`. – Pavel Strakhov Jun 05 '13 at 20:02
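The Windows-1254 case may also help explain why only some Turkish letters misbehave. In Windows-1254, ü, ö, ç, Ü, Ö and Ç have the same byte values as in Latin-1, so reading them as Latin-1 is harmless. ğ, ş, ı, Ğ, Ş and İ occupy positions that Latin-1 assigns to other letters; for example, 0xDD is 'Ý' in Latin-1 but 'İ' in Windows-1254. This is a Qt-free sketch with hypothetical helper names: only those six letter positions are mapped, and the 0x80-0x9F range of the code page is deliberately ignored.

```cpp
#include <string>

// Hypothetical helper: decode one Windows-1254 byte. Only the six positions where
// Turkish letters replace Latin-1 letters are mapped; every other byte is
// returned as its Latin-1 value. NOTE: partial sketch, 0x80-0x9F is not handled.
char32_t decodeCp1254Letter(unsigned char b) {
    switch (b) {
        case 0xD0: return U'\u011E';  // G with breve   (Latin-1: Eth)
        case 0xDD: return U'\u0130';  // dotted capital I (Latin-1: Y acute)
        case 0xDE: return U'\u015E';  // S with cedilla (Latin-1: Thorn)
        case 0xF0: return U'\u011F';  // g with breve   (Latin-1: eth)
        case 0xFD: return U'\u0131';  // dotless i      (Latin-1: y acute)
        case 0xFE: return U'\u015F';  // s with cedilla (Latin-1: thorn)
        default:   return b;          // e.g. u/o umlaut, c cedilla: same as Latin-1
    }
}

// Hypothetical helper: the letters of a Windows-1254 byte string that come out
// wrong when the bytes are read as Latin-1 (Qt 4's default without a codec).
std::u32string lettersMisreadAsLatin1(const std::string& cp1254Bytes) {
    std::u32string out;
    for (unsigned char b : cp1254Bytes) {
        char32_t real = decodeCp1254Letter(b);
        if (real != static_cast<char32_t>(b)) out.push_back(real);
    }
    return out;
}
```

For the question's letters "ğüşöçİĞÜŞÖÇ", saved in Windows-1254, only ğ, ş, İ, Ğ and Ş are misread.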

You could try matching ^\p{L}+$, which is the shorthand for any letter.

mart1n

Most probably you need \w. It matches any letter or digit (in any language) as well as the underscore character.
You can exclude the underscore like this: (?!_)\w.
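The underscore-exclusion trick can be tried with std::regex, whose ECMAScript grammar also supports (?!...) lookahead. This only sketches the lookahead technique, with a hypothetical helper name. With plain char strings and the default "C" locale, std::regex's \w covers only ASCII letters, digits and '_', so it will not accept Turkish letters. QRegExp's \w is Unicode-aware and covers letters, digits, marks and '_'. Anchoring the repetition with ^...$ turns the per-character check into a whole-string validator.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is non-empty and every character is a \w
// character other than '_'. The (?!_) lookahead rejects an underscore at the
// current position before \w is allowed to consume the character.
// NOTE: for char strings in the default "C" locale, std::regex's \w is ASCII-only
// ([A-Za-z0-9_]), so this shows the lookahead trick, not Turkish support.
bool isWordWithoutUnderscore(const std::string& s) {
    static const std::regex re(R"(^(?:(?!_)\w)+$)");
    return std::regex_match(s, re);
}
```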

Marek R

You might want to try this:

QRegExp exp = QRegExp(QString::fromUtf8("[^ -~çğıöşüÇĞİÖŞÜ]"));

It matches any character that is not in the range from space to tilde (almost all printable ASCII characters).

It also leaves out the non-ASCII letters of the Turkish alphabet (ç, ğ, ı, ö, ş, ü, Ç, Ğ, İ, Ö, Ş, Ü).

So whatever this expression captures is a character that does not belong in printable Turkish text; you can replace such characters with, say, "?".
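Replacing the offending characters with "?" amounts to the loop below. This Qt-free sketch works on UTF-32 text and uses hypothetical helper names, not Qt API. It allows the same characters as the negated class: printable ASCII from space to tilde plus the twelve non-ASCII Turkish letters ç, ğ, ı, ö, ş, ü, Ç, Ğ, İ, Ö, Ş, Ü. In Qt itself, QString::replace(const QRegExp &, const QString &) does the same job directly with the regex.

```cpp
#include <string>

// Hypothetical helper: true for the characters the negated class leaves alone,
// i.e. printable ASCII from ' ' to '~' plus the non-ASCII Turkish letters.
bool isAllowedTurkishChar(char32_t c) {
    static const std::u32string turkish =
        U"\u00E7\u011F\u0131\u00F6\u015F\u00FC"   // lowercase: c g i o s u variants
        U"\u00C7\u011E\u0130\u00D6\u015E\u00DC";  // uppercase counterparts
    return (c >= U' ' && c <= U'~') || turkish.find(c) != std::u32string::npos;
}

// Replace every character the negated class would match with '?'.
std::u32string maskNonTurkish(std::u32string text) {
    for (char32_t& c : text)
        if (!isAllowedTurkishChar(c)) c = U'?';
    return text;
}
```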

I hope it helps!

Good luck.

Aykut Saribiyik