
I'm confused about how to use the NSRegularExpression class in Swift, especially the length: parameter of NSRange.

Some tutorials say that NSRegularExpression should only be applied to NSString instances, while others say it's OK to apply it to (Swift) String instances as long as you pass utf8.count or utf16.count as the length: parameter of NSRange:

var str : String = "#tweak #wow #gaming" 
if let regex = try? NSRegularExpression(pattern: "#[a-z0-9]+", options: .caseInsensitive) {
    regex.matches(in: str, options: [], range: NSRange(location: 0, length: str.utf8.count)).map {
        print(str.substring(with: $0.range))
    }
}

The following are quotes from this source:

Due to the way strings are handled differently in Swift and Objective-C, you will need to provide the NSRange instance with a string length from NSString, and not from String.

This is, roughly speaking, because NSString uses fixed-width encoding and String uses variable-width encoding.

Furthermore, is the following documentation really the best Apple can do with respect to documenting the NSRegularExpression class in Swift?

https://developer.apple.com/documentation/foundation/nsregularexpression

I'd at least expect a list of the properties and methods of the class, but it only shows some examples. Is there more detailed documentation anywhere?

rmaddy
Shuzheng
  • Related: [Swift extract regex matches](https://stackoverflow.com/questions/27880650/swift-extract-regex-matches). – Martin R Aug 16 '19 at 13:55

1 Answer


The utf16 count is correct, not the utf8 count. Or, better still, use the convenience initializers, which convert a Range of String.Index to an NSRange:

let range = NSRange(str.startIndex..., in: str)

And to convert an NSRange back to a Range of String.Index:

let range = Range(nsRange, in: str)
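As a rough illustration of the two conversions above, here is a minimal sketch of a round trip on a string that contains non-ASCII characters. The sample text and the variable names are made up for this example; only `NSRange(_:in:)` and `Range(_:in:)` come from the answer.

```swift
import Foundation

// Hypothetical sample text: "\u{E9}" is "é" (1 UTF-16 unit),
// "\u{1F3AE}" is an emoji outside the BMP (2 UTF-16 units).
let text = "caf\u{E9} \u{1F3AE} #wow"

// Range<String.Index> -> NSRange
let hashIndex = text.firstIndex(of: "#") ?? text.startIndex
let swiftRange = hashIndex..<text.endIndex
let nsRange = NSRange(swiftRange, in: text)

// NSRange -> Range<String.Index>; the initializer is failable
let roundTripped = Range(nsRange, in: text)

// An NSRange that lies outside the string does not convert
let outOfBounds = Range(NSRange(location: 100, length: 1), in: text)

if let roundTripped = roundTripped {
    print(nsRange.location, nsRange.length, text[roundTripped])
}
```

Because `Range(_:in:)` returns an optional, an `NSRange` that does not fit the string yields `nil` rather than a bad index.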

Thus, putting that together:

let str = "#tweak #wow #gaming" 
if let regex = try? NSRegularExpression(pattern: "#[a-z0-9]+", options: .caseInsensitive) {
    let nsRange = NSRange(str.startIndex..., in: str)
    let strings = regex.matches(in: str, range: nsRange).compactMap {
        Range($0.range, in: str).map { str[$0] }
    }
    print(strings)
}
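To see why the utf8 count is the wrong length, it may help to compare the counts on a string that is not pure ASCII. The following is a sketch only; the sample string and variable names are assumptions for illustration, and the pattern is the one from the question.

```swift
import Foundation

// Hypothetical sample: the emoji needs 4 UTF-8 bytes but 2 UTF-16 units,
// and "\u{E9}" ("é") needs 2 UTF-8 bytes but 1 UTF-16 unit.
let tweet = "#tweak \u{1F3AE} #wow caf\u{E9}"

let utf8Length = tweet.utf8.count
let utf16Length = tweet.utf16.count
let nsStringLength = (tweet as NSString).length   // NSString counts UTF-16 units
let fullRange = NSRange(tweet.startIndex..., in: tweet)

var tags: [String] = []
if let regex = try? NSRegularExpression(pattern: "#[a-z0-9]+", options: .caseInsensitive) {
    tags = regex.matches(in: tweet, range: fullRange).compactMap {
        // Convert each NSRange back to a Swift range before slicing
        Range($0.range, in: tweet).map { String(tweet[$0]) }
    }
}

print(utf8Length, utf16Length, nsStringLength, fullRange.length, tags)
```

Here the utf16 count, the `NSString` length and the length of the converted `NSRange` all agree, while the utf8 count is larger. An `NSRange` built from `utf8.count` would therefore extend past the end of the string as `NSString` sees it. For pure-ASCII input such as the string in the question the two counts happen to coincide, which is why the mistake can go unnoticed.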

See the WWDC 2017 session “Efficient Interactions with Frameworks”, which talks about (a) our historical use of UTF16 when dealing with ranges; and (b) the fact that we don’t have to do that any more.

Rob
  • I'm new to iOS dev. Could you make your answer more detailed, and also comment on whether better documentation exists? Also, is Apple using UTF16 instead of UTF8, which is the standard? – Shuzheng Aug 16 '19 at 13:49
  • Both UTF16 and UTF8 are standards, but UTF8 is more common (e.g. for web services and the like). But it doesn’t matter which is the “better” standard; the question is which is the correct way to convert `NSRange` to `String.Range` and back, and for that, you must use UTF16, regardless. Or, as I’ve said, to rid your code of cryptic UTF choices, use these convenience initializers, which eliminate the guesswork. I’ll dig around and see if I can find references for you. – Rob Aug 16 '19 at 14:05
  • @Shuzheng - FYI [the UTF16View documentation](https://developer.apple.com/documentation/swift/string/utf16view) describes how UTF16 is used for indexing within `NSString` (and `NSRange` is basically a range of indexes within a `NSString`). – Rob Aug 16 '19 at 14:23
  • Thank you. I can't find any official reference for NSRegularExpression in Swift. I guess Apple assumes that we are all coming from Objective-C. Last question: are Swift strings encoded as UTF16, while Objective-C strings are encoded as ASCII? – Shuzheng Aug 16 '19 at 14:23
  • @Shuzheng: From [NSString](https://developer.apple.com/documentation/foundation/nsstring) (which is originally an Objective-C class and predates Swift): *“A string object presents itself as a sequence of UTF–16 code units.”* – Swift strings used UTF-16 (or ASCII) originally, but switched to UTF-8 in Swift 5. That should be considered an implementation detail. – Martin R Aug 16 '19 at 14:24
  • @MartinR - I've also read something like that. But why is UTF16 then provided to NSRange length:, when we use UTF8? – Shuzheng Aug 16 '19 at 14:26
  • @Shuzheng: NSRange describes a range of characters in an NSString, and therefore counts UTF-16 code units. – You could of course ask: “Why does NSRegularExpression take a (Swift) String argument for the string, but a (Foundation) NSRange for the range?” That is a good question. String/NSString are bridged/converted more or less automatically, but the corresponding ranges are not. – Martin R Aug 16 '19 at 14:32
  • @MartinR - I guess UTF8 must be compatible with UTF16, like ASCII is compatible with UTF8. Otherwise, I cannot see how it makes sense to provide a UTF16 range (NSRange) to NSRegularExpression.matches(), which may operate on a UTF8-encoded Swift string? – Shuzheng Aug 16 '19 at 14:48
  • @Shuzheng: No. `NSRegularExpression` is an Objective-C Foundation class and works with NSString and NSRange. The Swift string is automatically converted to NSString when passed to an Objective-C method. – Martin R Aug 16 '19 at 14:51
  • @Rob - Don't you mean `Range` when you refer to `String.Range`? The latter type is not recognized on my system. – Shuzheng Aug 18 '19 at 10:46
  • @Rob - Also, why don't you use `NSRange(region: RangeExpression)` instead of the form `NSRange(region: RangeExpression, in: StringProtocol)`? – Shuzheng Aug 18 '19 at 10:56
  • Yes, I meant `Range` of `String.Index`. But, no, I meant `NSRange(_:in:)`, not `NSRange(_:)`. I quote from documentation/headers: “Although the Swift overlay updates many Objective-C methods to return native Swift indices and index ranges, some still return instances of `NSRange`. To convert an `NSRange` instance to a range of `String.Index`, use the `Range(_:in:)` initializer, which takes an `NSRange` and a string as arguments.” Also see the examples in that video. – Rob Aug 18 '19 at 14:34
  • The `NSRange(_:)` rendition is only used when building from a range of fixed width integers (e.g. code like `let nsRange = NSRange(1..<3)`). But that’s of no utility when converting a `Range` to `NSRange`... – Rob Aug 18 '19 at 14:43
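The distinction drawn in the last two comments can be sketched as follows. This is an illustrative example only; the sample string and variable names are not from the thread.

```swift
import Foundation

// Hypothetical sample: the leading emoji is 1 Character but 2 UTF-16 units.
let text = "\u{1F3AE} #wow"

// NSRange(_:) takes a range of plain integers and uses them as-is.
let integerRange = NSRange(1..<3)

// NSRange(_:in:) takes a range of String.Index and measures it
// in UTF-16 code units of the given string.
let hashIndex = text.firstIndex(of: "#") ?? text.startIndex
let stringRange = NSRange(hashIndex..., in: text)

// Position of "#" counted in Characters, for comparison.
let characterOffset = text.distance(from: text.startIndex, to: hashIndex)

print(integerRange.location, integerRange.length)
print(stringRange.location, stringRange.length, characterOffset)
```

The "#" sits at Character offset 2 but at UTF-16 offset 3, so an integer range built from Character counts would not line up with what `NSRange` expects; `NSRange(_:in:)` does that bookkeeping for you.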