
In Cocoa, regular expressions presumably follow the ICU Unicode rules for character matching, and the ICU standard includes character properties such as \p{L} for matching any kind of Unicode letter. However, this:

NSString* str = @"A";
NSPredicate* pred = [NSPredicate predicateWithFormat:@"SELF MATCHES '\\p{L}'"];
NSLog(@"%d", [pred evaluateWithObject:str]);

fails at runtime because the regex pattern doesn't seem to compile:

Can't do regex matching, reason: Can't open pattern U_REGEX_BAD_INTERVAL (string A, pattern p{L}, case 0, canon 0)

If character properties are not supported (are they?), how else could I check if a string contains a Unicode letter in my iOS app?

Desmond Hume
  • You need to double the backslash - `\\p{L}`, though `MATCHES` requires a full string match. Try `.*\\p{L}.*` or even `(?s).*\\p{L}.*` – Wiktor Stribiżew May 23 '16 at 08:36
  • @WiktorStribiżew Please take a look at the update. – Desmond Hume May 23 '16 at 08:42
  • Well, I think it works like this: `NSString * rx = @"(?s).*\\p{L}.*"; NSPredicate * predicat = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", rx];`. Also, just tested `NSPredicate * predicat = [NSPredicate predicateWithFormat:@"SELF MATCHES '(?s).*\\p{L}.*'"];` - [it works](https://ideone.com/MBmFea) – Wiktor Stribiżew May 23 '16 at 08:43
  • Also, the code you show [works OK](https://ideone.com/Y5NvO4), just `MATCHES` requires a full string match, and thus, the `h` letter was not enough to return true. – Wiktor Stribiżew May 23 '16 at 08:49
  • @WiktorStribiżew Or `"SELF MATCHES '(?s).*\\\\p{L}.*'"`. Thanks a bunch! – Desmond Hume May 23 '16 at 08:52
  • Duplicate of [this](http://stackoverflow.com/questions/1706633/detect-unicode-characters-in-nsstring-on-iphone). – Aown Raza May 23 '16 at 09:08
  • @AownRaza: I think it is not a dupe of *that* post because here, *any* letter must be detected, regardless of it being from the ASCII range or non-ASCII one. – Wiktor Stribiżew May 23 '16 at 09:22

1 Answer


The main point here is that MATCHES requires a full-string match, and that the backslash must reach the regex engine as a literal \ character.

The regex can thus be:

(?s).*\p{L}.*

It means:

  • (?s) - enable DOTALL mode, so that . also matches line break characters
  • .* - match zero or more characters
  • \p{L} - match a Unicode letter
  • .* - match zero or more characters
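A minimal sketch of the full-match behaviour described above, assuming a Foundation environment (macOS or iOS) where NSPredicate's MATCHES is backed by the ICU regex engine. It passes the pattern through the %@ format argument, as suggested in the comments, so the pattern should reach ICU without going through the quoted-string escaping layer. `FullyMatches` and `ContainsLetter` are hypothetical helper names.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: evaluates `pattern` against `str` with MATCHES.
// The pattern is substituted via %@, so it is not re-parsed as a
// quoted string literal inside the predicate format.
static BOOL FullyMatches(NSString *str, NSString *pattern) {
    NSPredicate *pred = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [pred evaluateWithObject:str];
}

// Hypothetical helper: YES if `str` contains at least one Unicode letter.
// The Objective-C literal "\\p{L}" yields the regex text \p{L}.
static BOOL ContainsLetter(NSString *str) {
    return FullyMatches(str, @"(?s).*\\p{L}.*");
}

int main(void) {
    @autoreleasepool {
        // MATCHES anchors the pattern to the whole string:
        NSLog(@"%d", FullyMatches(@"A", @"\\p{L}"));      // the single letter is the whole string
        NSLog(@"%d", FullyMatches(@"123 h", @"\\p{L}"));  // a letter is present, but not a full match
        NSLog(@"%d", ContainsLetter(@"123 h"));           // .* on both sides absorbs the other text
        NSLog(@"%d", ContainsLetter(@"123\nh"));          // (?s) lets .* cross the line break
        NSLog(@"%d", ContainsLetter(@"123 ж"));           // non-ASCII letters count too
        NSLog(@"%d", ContainsLetter(@"123 !?"));          // no letter at all
    }
    return 0;
}
```

Without the leading (?s), the "123\nh" case would fail, because . does not match the line break by default.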

In iOS, just double the backslashes:

NSPredicate * predicat = [NSPredicate predicateWithFormat:@"SELF MATCHES '(?s).*\\p{L}.*'"];

See IDEONE demo

If NSPredicate's format parsing strips the backslashes inside the quoted pattern (as in the question, where the regex engine received p{L}), quadruple them:

NSPredicate * predicat = [NSPredicate predicateWithFormat:@"SELF MATCHES '(?s).*\\\\p{L}.*'"];
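As an alternative that sidesteps both the full-match requirement and the predicate's escaping layer, here is a sketch using NSString's regex search. This is a swapped-in technique, not the approach of the answer above; it assumes Foundation on macOS or iOS, and `ContainsLetterBySearch` is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `str` contains at least one Unicode letter.
// NSRegularExpressionSearch performs an unanchored search, so no .* padding
// (and no (?s)) is needed. No predicate format string is parsed either,
// so only the Objective-C literal escaping applies: "\\p{L}" -> \p{L}.
static BOOL ContainsLetterBySearch(NSString *str) {
    NSRange r = [str rangeOfString:@"\\p{L}" options:NSRegularExpressionSearch];
    return r.location != NSNotFound;
}

int main(void) {
    @autoreleasepool {
        NSArray<NSString *> *samples = @[ @"A", @"123 h", @"123\nж", @"123 !?" ];
        for (NSString *s in samples) {
            // Logs each sample followed by 1 (contains a letter) or 0 (does not).
            NSLog(@"%@ -> %d", s, ContainsLetterBySearch(s));
        }
    }
    return 0;
}
```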
Wiktor Stribiżew