
Going through a bunch of code, looking to improve it.

I came across this bit:

if (c == '<' || c == '>') {
    pattern.append("\\b");
} else if (c == 'a') {
    pattern.append("[a-zA-Z]");
} else if (c == 'A') {
    pattern.append("[^a-zA-Z]");
} else if (c == 'h') {
    pattern.append("[A-Za-z_]");
} else if (c == 'H') {
    pattern.append("[^A-Za-z_]");
} else if (c == 'c' || c == 'C') {
    ignorecase = (c == 'c');
} else if (c == 'l') {
    pattern.append("[a-z]");
} else if (c == 'L') {
    pattern.append("[^a-z]");
} else if (c == 'o') {
    pattern.append("[0-7]");
} else if (c == 'O') {
    pattern.append("[^0-7]");
} else if (c == 'u') {
    pattern.append("[A-Z]");
} else if (c == 'U') {
    pattern.append("[^A-Z]");
} else if (c == 'x') {
    pattern.append("[0-9A-Fa-f]");
} else if (c == 'X') {
    pattern.append("[^0-9A-Fa-f]");
} else if (c == '=') {
    pattern.append("?");
} else {
    pattern.append('\\');
    pattern.append(c);
}

If c were a char, this would be easy to turn into a switch. However, c is a QChar. How should I turn a QChar into an integer and reliably compare it against the various cases ('>', '=', etc.)?

ΦXocę 웃 Пepeúpa ツ
Anon
  • Does this answer your question? [How to cast a QChar to int](https://stackoverflow.com/questions/18364482/how-to-cast-a-qchar-to-int) – NutCracker Jan 21 '20 at 09:46
  • @NutCracker Nope, because it lacks context on how to reliably compare it in a switch against cases such as `'='` or `'⌘'`. – Anon Jan 21 '20 at 10:04

2 Answers


A QChar is a wrapper for a 16-bit UTF-16 code unit.

You can retrieve the value using QChar::unicode(), which returns an unsigned short.

You can then write your switch like this:

QChar c;
switch (c.unicode()) {
    case u'a':
    ...
}

Be careful with your case statements as if you use 8-bit char literals, it might not work as expected.

For instance, é might be 0xE9 (Latin-1, UTF-16), 0x82 (CP437), or even 0xC3 0xA9 (UTF-8, which needs 2 bytes, so it does not fit in a single char literal and may fail to compile or be treated as a multi-character constant).

The solution is to use UTF-16 literals, which have been part of C++ since C++11. For example, u'é' will always be compiled as a char16_t (~unsigned short) of value 0x00E9.
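To make this concrete, here is a minimal sketch of the if/else chain from the question rewritten as a switch over UTF-16 literals. It deliberately avoids Qt so that it compiles on its own: the `char16_t unit` parameter stands in for the value you would get from `c.unicode()`, and `translateEscape` is a hypothetical name, not something from the original code. In real code you would probably append to the existing QString instead of returning a `std::u16string`.

```cpp
#include <cassert>
#include <string>

// u'...' literals are char16_t, so the value does not depend on the
// execution character set of the compiler.
static_assert(u'\u00E9' == 0x00E9, "u'é' is always the code unit 0x00E9");

// Hypothetical helper: `unit` stands in for QChar::unicode().
// Returns the regex fragment for one pattern character and updates
// `ignoreCase` for the 'c' / 'C' flags (which append nothing).
std::u16string translateEscape(char16_t unit, bool &ignoreCase)
{
    switch (unit) {
    case u'<':
    case u'>': return u"\\b";
    case u'a': return u"[a-zA-Z]";
    case u'A': return u"[^a-zA-Z]";
    case u'h': return u"[A-Za-z_]";
    case u'H': return u"[^A-Za-z_]";
    case u'c': ignoreCase = true;  return u"";
    case u'C': ignoreCase = false; return u"";
    case u'l': return u"[a-z]";
    case u'L': return u"[^a-z]";
    case u'o': return u"[0-7]";
    case u'O': return u"[^0-7]";
    case u'u': return u"[A-Z]";
    case u'U': return u"[^A-Z]";
    case u'x': return u"[0-9A-Fa-f]";
    case u'X': return u"[^0-9A-Fa-f]";
    case u'=': return u"?";
    default:
        // Same fallback as the original else branch: a backslash
        // followed by the character itself.
        return std::u16string{u'\\', unit};
    }
}

// Spot checks that double as usage examples.
void checkTranslateEscape()
{
    bool ignoreCase = false;
    assert(translateEscape(u'a', ignoreCase) == u"[a-zA-Z]");
    assert(translateEscape(u'c', ignoreCase).empty() && ignoreCase);
    assert(translateEscape(u'.', ignoreCase) == u"\\.");
}
```

With a QChar at hand, the call would look like `translateEscape(c.unicode(), ignorecase)`. A case such as `u'⌘'` works the same way, as long as the character fits in a single UTF-16 code unit.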

Benjamin T
  • `Be careful with your case statements as if you use 8-bit char literals, it might not work as expected.` << that is what I was looking for. Excellent answer. – Anon Jan 21 '20 at 12:12

You can define something like a dictionary, i.e. a map:

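#include <QDebug>
#include <QMap>
#include <QString>

int main(int argc, char* argv[])
{
    // Map each pattern character to its regex fragment.
    QMap<QChar, QString> myMap{{'a', "[a-zA-Z]"}, {'X', "[^0-9A-Fa-f]"}, {'h', "[A-Za-z_]"}};

    QString regex{};
    regex.append(myMap.value('a', ""));
    regex.append(myMap.value('5', "")); // '5' is not in the map, so the default "" is appended
    regex.append(myMap.value('X', ""));
    qDebug() << "myRegex: " << regex;
    return 0;
}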
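The second argument of `QMap::value()` already acts as a fallback, so the `else` branch from the question (a backslash followed by the character) can be expressed as the default. Below is a rough sketch of that idea. To keep it compilable without Qt, `std::map` stands in for QMap and `char16_t` for QChar; `lookupEscape` is a hypothetical name, and the table only lists a few of the entries.

```cpp
#include <map>
#include <string>

// Hypothetical helper: looks up the regex fragment for one pattern
// character and falls back to "\<character>" when there is no entry.
// std::map stands in for QMap, char16_t for QChar (assumption: in Qt
// code the same shape would use QMap::value(key, defaultValue)).
std::u16string lookupEscape(char16_t unit)
{
    static const std::map<char16_t, std::u16string> table = {
        {u'a', u"[a-zA-Z]"},
        {u'h', u"[A-Za-z_]"},
        {u'X', u"[^0-9A-Fa-f]"},
        {u'=', u"?"},
    };

    const auto it = table.find(unit);
    if (it != table.end())
        return it->second;

    // No entry: behave like the original else branch.
    return std::u16string{u'\\', unit};
}
```

Note that a plain table like this does not cover cases with side effects, such as the `c` / `C` flags that set `ignorecase`; those would still need separate handling.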
ΦXocę 웃 Пepeúpa ツ
  • Interesting approach; I'm going to benchmark it to see what ends up being faster, a small map or a switch. Still, I would like a switch solution because it applies to a lot of code in the project I'm working in, not just this regex stuff. That other code would need something like function pointers instead. Also, you have to add a bit more complexity than this, given that I do not think a QMap has a `default:` value. – Anon Jan 21 '20 at 10:19