
I check name validity with this regular expression, which allows letters, digits, spaces and a set of symbols, as suggested here:

#include <QRegularExpression>
#include <QString>

// Allow letters, digits, space and a set of symbols
const QString validNameMatcher = QStringLiteral("^[a-zA-Z0-9 _.,!()+=`,\"@$#%*-]+$");

bool Class::isNameValid(const QString fileName)
{
    QRegularExpression re(validNameMatcher);

    // True only if the whole name consists of allowed characters (anchored by ^ and $)
    return re.match(fileName).hasMatch();
}

For a file name like 1111 Rick (wow) L50-57.stl, the function above returns true. So far, so good.


To allow letters with diacritical marks, I just added the range À-ž to the name matcher, as suggested here:

// [À-ž] is for diacritical marks
const QString validNameMatcher = QStringLiteral("^[a-zA-Z0-9À-ž _.,!()+=`,\"@$#%*-]+$");

Surprisingly, after adding À-ž, the function returns false for the same file name, 1111 Rick (wow) L50-57.stl. Am I missing something?


UPDATE

As suggested by @WiktorStribiżew, I used UseUnicodePropertiesOption:

QRegularExpression re(validNameMatcher, QRegularExpression::PatternOption::UseUnicodePropertiesOption);

But it didn't help: the function still returns false.

Also (*UTF) doesn't work:

const QString validNameMatcher = QStringLiteral("(*UTF)^[a-zA-Z0-9À-ž _.,!()+=`,\"@$#%*-]+$");
Megidd
  • What if you compile the regex with `QRegularExpression::UseUnicodePropertiesOption` option? – Wiktor Stribiżew Mar 16 '20 at 08:28
  • @WiktorStribiżew Thanks =) Looks like it doesn't work for me :( – Megidd Mar 16 '20 at 08:41
  • So, the documentation is misleading saying *This option corresponds to the `/u` modifier in Perl regular expressions.* It only acts as `(*UCP)` and not also `(*UTF)` PCRE verb. Try ``QStringLiteral("(*UTF)^[a-zA-Z0-9À-ž _.,!()+=`,\"@$#%*-]+$")`` – Wiktor Stribiżew Mar 16 '20 at 08:44
  • @WiktorStribiżew Thanks =) Looks like `(*UTF)` doesn't work for me :( – Megidd Mar 16 '20 at 08:53
  • `(*UTF)` is pointless in QRegularExpression: it works on QStrings, so it only does Unicode matching. `(*UCP)` is indeed controlled by that option; and it's equivalent to `/u` (see [perlre](https://perldoc.perl.org/perlre.html)), certainly not to `use feature 'unicode_strings'`... – peppe Jul 07 '20 at 02:01

1 Answer


The key point is @WiktorStribiżew's suggestion to use the QRegularExpression::UseUnicodePropertiesOption option:

QRegularExpression re(validNameMatcher, QRegularExpression::PatternOption::UseUnicodePropertiesOption);

But, as its documentation says:

QRegularExpression::UseUnicodePropertiesOption

The meaning of the \w, \d, etc., character classes, as well as the meaning of their counterparts (\W, \D, etc.), is changed from matching ASCII characters only to matching any character with the corresponding Unicode property.

So it occurred to me to replace a-zA-Z0-9À-ž and _ in my character class with just \w:

// Bad:
const QString validNameMatcher = QStringLiteral("^[a-zA-Z0-9À-ž _.,!()+=`,\"@$#%*-]+$");

// Good (with UseUnicodePropertiesOption, \w also matches non-ASCII letters such as À or ž):
const QString validNameMatcher = QStringLiteral("^[\\w .,!()+=`,\"@$#%*-]+$");

Now the isNameValid() function returns the expected results.

Megidd