
I'm trying to parse "@mentions" out of a user-provided string. The regular expression itself seems to find them, but the range it provides is incorrect when emoji are present.

let text = " @joe "
let tagExpr = try? NSRegularExpression(pattern: "@\\S+")
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.characters.count)) { tag, flags, pointer in
    guard let tag = tag?.range else { return }

    if let newRange = Range(tag, in: text) {
        let replaced = text.replacingCharacters(in: newRange, with: "[email]")
        print(replaced)
    }
}

When I run this, tag = (location: 7, length: 2)

and it prints [email]oe

The expected result is [email]

InkGolem

1 Answer


NSRegularExpression (and anything involving NSRange) operates on UTF-16 counts and indexes. For that matter, NSString's length property is the UTF-16 count as well.

But in your code, you're telling NSRegularExpression to use a length of text.characters.count. This is the number of composed characters, not the UTF-16 count. Your string " @joe " has 9 composed characters, but 12 UTF-16 code units. So you're actually telling NSRegularExpression to look only at the first 9 UTF-16 code units, which means it's ignoring the trailing "oe ".
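To make the difference between the two counts concrete, here is a minimal sketch. The sample string is an assumption for illustration only (it is not the string from the question), and it uses `count`, which is the Swift 4+ spelling of `characters.count`.

```swift
import Foundation

// Hypothetical sample: one emoji (U+1F44D), a space, then "@joe".
let sample = "\u{1F44D} @joe"

// Number of Characters (composed characters / grapheme clusters).
let characterCount = sample.count

// Number of UTF-16 code units, which is the unit NSRange is measured in.
// The emoji lies outside the Basic Multilingual Plane, so it takes two units.
let utf16Count = sample.utf16.count

print(characterCount) // 6
print(utf16Count)     // 7
```

Each emoji of this kind widens the gap by one, so a range whose length comes from the character count falls short of the end of the string by that many code units.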

The fix is to pass length: text.utf16.count.

let text = " @joe "
let tagExpr = try? NSRegularExpression(pattern: "@\\S+")
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.utf16.count)) { tag, flags, pointer in
    guard let tag = tag?.range else { return }

    if let newRange = Range(tag, in: text) {
        let replaced = text.replacingCharacters(in: newRange, with: "[email]")
        print(replaced)
    }
}
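
As an alternative sketch (not part of the original answer), the UTF-16 bookkeeping can be avoided by letting Foundation do both conversions: `NSRange(_:in:)` to build the search range and `Range(_:in:)` to map each match back to `String.Index` bounds. The helper name `mentions(in:)` and the sample input are hypothetical, and the sketch assumes Swift 4 or later.

```swift
import Foundation

// Hypothetical helper: returns every "@mention" found in `text`.
func mentions(in text: String) -> [Substring] {
    guard let regex = try? NSRegularExpression(pattern: "@\\S+") else { return [] }

    // NSRange(_:in:) converts a Range<String.Index> into UTF-16 offsets,
    // so no manual count is needed.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

    return regex.matches(in: text, range: fullRange).compactMap { match in
        // Range(_:in:) maps the UTF-16 based NSRange back to String.Index bounds;
        // subscripting the String with that range yields the matched text.
        Range(match.range, in: text).map { text[$0] }
    }
}

// Assumed sample input containing emoji before each mention.
let found = mentions(in: "\u{1F44D} @joe \u{1F389} @ann")
print(found.map(String.init)) // expected to contain "@joe" and "@ann"
```

The point of the design is that the matched `NSRange` is never used directly as a character offset into the `String`; it is always converted with `Range(_:in:)` first.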
Lily Ballard
  • That is what was answered at https://stackoverflow.com/questions/46293204/swift-regex-doesnt-work , and therefore I closed as a duplicate of *that one*. Not sure why that should be a "bad dupe" and reopened. – Martin R Sep 29 '17 at 19:18
  • @MartinR Did the dupe change? I clicked through to the duplicate and it was https://stackoverflow.com/questions/39701316/use-regex-to-match-emojis-as-well-as-text-in-string/39701370#39701370, which was an answer recommending going through `NSString`. – Lily Ballard Sep 29 '17 at 19:19
  • Hmm, that link I provided is actually in the comments on the post. Did I simply misclick? Argh. – Lily Ballard Sep 29 '17 at 19:20
  • That was a comment from Lou Franco, but not what I chose as a duplicate. – Martin R Sep 29 '17 at 19:20
  • I've re-duped it. Sorry about that. – Lily Ballard Sep 29 '17 at 19:21
  • This doesn't work for me when I try to retrieve the matched substring. If an emoji precedes it `text[range]` will be one character offset. – Rivera Nov 18 '19 at 21:19
  • it doesn't work either. Do we know a good solution? thanks – Pablo Martinez Jun 09 '21 at 15:51