
I am using Swift 3 and trying to access captured groups.

let regexp = "((ALREADY PAID | NOT ALR | PROVIDER MAY | READY | MAY BILL | BILL YOU | PAID)((.|\\n)*))(( \\d+)(\\.+|-+)(\\d\\d))"

// check if some substring is in the recognized text
if let range = stringText.range(of:regexp, options: .regularExpression) {
    let result = tesseract.recognizedText.substring(with:range)
}

I want to extract the last two digits captured by (\d\d), so if the text were "ALREADY PAID asfasdfadsfasdf 39.15", the result would be 15. Here is a regex builder that shows what I want. Normally I would use $8 to get the 8th captured group, but I don't know how to do that in Swift 3.

http://regexr.com/3fh1e

rmaddy
noblerare
  • Never use `(.|\\n)*`, just use `.*` and add a `(?s)` at the pattern start (or use the corresponding flag). – Wiktor Stribiżew Mar 14 '17 at 15:23
  • Use `rangeAt(...)`. Examples here: http://stackoverflow.com/a/40952603/1187415 and here: http://stackoverflow.com/a/40040472/1187415 and here: http://stackoverflow.com/a/31817292/1187415 – Martin R Mar 14 '17 at 15:29

4 Answers


Swift 4, Swift 5

import Foundation

extension String {
    // Returns one array per match. Element 0 is the whole match and
    // element n is capture group n ("" if that group did not participate).
    func groups(for regexPattern: String) -> [[String]] {
        do {
            let text = self
            let regex = try NSRegularExpression(pattern: regexPattern)
            let matches = regex.matches(in: text,
                                        range: NSRange(text.startIndex..., in: text))
            return matches.map { match in
                return (0..<match.numberOfRanges).map {
                    let rangeBounds = match.range(at: $0)
                    guard let range = Range(rangeBounds, in: text) else {
                        return ""
                    }
                    return String(text[range])
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}

example:

let res = "1my 2own 3string".groups(for:"(([0-9]+)[a-z]+) ")

(lldb) po res
▿ 2 elements
  ▿ 0 : 3 elements
    - 0 : "1my "
    - 1 : "1my"
    - 2 : "1"
  ▿ 1 : 3 elements
    - 0 : "2own "
    - 1 : "2own"
    - 2 : "2"
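
Applied to the text from the question, the same idea can be sketched as follows. This is a minimal, self-contained illustration rather than part of the answer's extension: `captureGroup(_:of:in:)` is a hypothetical helper name, and the pattern is copied unchanged from the question, so group 8 is the final `(\d\d)`.

```swift
import Foundation

// Hypothetical helper (not part of the answer above): returns capture group
// `index` of the first match, or nil if the pattern is invalid, nothing
// matches, or the group did not participate in the match.
func captureGroup(_ index: Int, of pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let fullRange = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, range: fullRange),
          index >= 0, index < match.numberOfRanges,
          let range = Range(match.range(at: index), in: text) else { return nil }
    return String(text[range])
}

// Pattern taken verbatim from the question.
let pattern = "((ALREADY PAID | NOT ALR | PROVIDER MAY | READY | MAY BILL | BILL YOU | PAID)((.|\\n)*))(( \\d+)(\\.+|-+)(\\d\\d))"
let text = "ALREADY PAID asfasdfadsfasdf 39.15"

// Group 8 plays the role of $8 in the regex builder.
let cents = captureGroup(8, of: pattern, in: text)
print(cents ?? "no match")  // prints "15"
```

Returning an optional here, instead of an empty string, lets the caller tell "no match" apart from an empty capture.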
Vyacheslav
  • Wow, nice to see the "modern language" Swift doing something that needs 2 lines of code in 25 years old Javascript – Jonas Sourlier Apr 22 '22 at 10:01

but I don't know how to do that in Swift 3.

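When you receive a match from NSRegularExpression, what you get is an NSTextCheckingResult. You call rangeAt(_:) on it to get the range of a specific capture group; index 0 is the whole match, and the groups are numbered from 1. (In Swift 4 and later the method is spelled range(at:).)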

Example:

let s = "hey ho ha"
let pattern = "(h).*(h).*(h)"
// our goal is capture group 3, "h" in "ha"
let regex = try! NSRegularExpression(pattern: pattern)
let result = regex.matches(in:s, range:NSMakeRange(0, s.utf16.count))
let third = result[0].rangeAt(3) // <-- !!
third.location // 7
third.length // 1
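
To get from the `NSRange` to the matched text itself, Swift 4 and later offer the failable `Range(_:in:)` initializer. The following is a small sketch of that conversion using the same string and pattern as above; the variable names are only illustrative.

```swift
import Foundation

let s = "hey ho ha"
let regex = try! NSRegularExpression(pattern: "(h).*(h).*(h)")
let matches = regex.matches(in: s, range: NSRange(s.startIndex..., in: s))

// range(at:) is the Swift 4+ spelling of rangeAt(_:).
let third = matches[0].range(at: 3)

// Range(_:in:) converts the UTF-16 based NSRange into a Range<String.Index>.
// It returns nil if the NSRange does not map onto the string, for example
// when the group did not participate in the match.
if let swiftRange = Range(third, in: s) {
    print(s[swiftRange])  // prints "h"
}
```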
matt
  • third is an `NSRange`, how do you convert it to the `Range` type required to use it in `s.substring`? or otherwise, where is the "h" result here? Is the only way to convert s to NSString? Is there a simpler way to use regex? This looks excessive. – Efren Jul 31 '17 at 04:20
  • @Efren NSRange to Range conversion for strings is a new Swift 4 feature. – matt Jul 31 '17 at 16:28
  • @Efren It's annoying that regular expressions are a Cocoa feature, not a Swift feature, but that's how it is. And Cocoa thinks in NSString and NSRange, obviously. But in Swift 4 Range and NSRange are mutually coercible even for strings, so it's really no problem. – matt Aug 01 '17 at 01:44
  • @Efren: With respect to NSRange/Range conversion in connection with NSRegularExpression, [this Q&A](https://stackoverflow.com/questions/27880650/swift-extract-regex-matches) might be of interest. – Martin R Oct 07 '17 at 17:42

As ever, a simple extension seems to be the way around Swift's bizarre overcomplication...

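extension NSTextCheckingResult {
    // Returns the text of every capture group (index 0 is the whole match).
    // A group that did not participate in the match yields "".
    func groups(testedString: String) -> [String] {
        var groups = [String]()
        for i in 0 ..< self.numberOfRanges {
            // Range(_:in:) is nil when the group's NSRange is {NSNotFound, 0},
            // so avoid force-unwrapping it.
            if let range = Range(self.range(at: i), in: testedString) {
                groups.append(String(testedString[range]))
            } else {
                groups.append("")
            }
        }
        return groups
    }
}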

Use it like this:

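// Build the search range in UTF-16 units; someString.count counts Characters
// and can be too short for strings containing emoji or combining marks.
if let match = myRegex.firstMatch(in: someString, range: NSRange(someString.startIndex..., in: someString)) {
    let groups = match.groups(testedString: someString)
    //... do something with groups
}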
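
Two details are easy to trip over with `NSRange`-based APIs: the search range is measured in UTF-16 code units rather than `Character`s, and a capture group that did not take part in the match comes back with location `NSNotFound`. The sketch below illustrates both with made-up sample strings and patterns that are not from the answer.

```swift
import Foundation

let text = "😀😀😀 id=42"
let regex = try! NSRegularExpression(pattern: "id=(\\d+)")

// String.count counts Characters (9 here); NSRange counts UTF-16 units (12 here).
let tooShort = NSRange(location: 0, length: text.count)
let full = NSRange(text.startIndex..., in: text)

// With the short range the search area ends before "=", so nothing matches.
let missed = regex.firstMatch(in: text, range: tooShort)
let found = regex.firstMatch(in: text, range: full)

var id: String? = nil
if let match = found, let r = Range(match.range(at: 1), in: text) {
    id = String(text[r])
}
print(missed == nil, id ?? "none")

// In "(a)|(b)" matched against "b", group 1 does not participate:
// its NSRange has location NSNotFound and Range(_:in:) returns nil.
let alternation = try! NSRegularExpression(pattern: "(a)|(b)")
let m = alternation.firstMatch(in: "b", range: NSRange(location: 0, length: 1))!
let unmatched = Range(m.range(at: 1), in: "b")
print(unmatched == nil)
```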
Confused Vorlon

A slightly altered version based on @Vyacheslav's answer, with a different error-handling approach:

enum ParsingError: Error {
    // You can pass more info here with parameter(s) if you want, e.g. `case let invalidRange(originalString, failedAtRange)`
    case invalidRange 
}

protocol StringUtilityRequired {
    var stringUtility: StringUtility { get }
}

extension StringUtilityRequired {
    var stringUtility: StringUtility { StringUtility() }
}

enum StringUtility {
    func groups(_ str: String, pattern: String) throws -> [[String]] {
        let regex = try NSRegularExpression(pattern: pattern)
        let matches = regex.matches(in: str, range: NSRange(str.startIndex..., in: str))
        return try matches.map { match throws in
            return try (0 ..< match.numberOfRanges).map { range throws in
                let rangeBounds = match.range(at: range)
                guard let range = Range(rangeBounds, in: str) else {
                    throw ParsingError.invalidRange
                }
                return String(str[range])
            }
        }
    }

    // This component is stateless; it doesn't have any side effect
    case pure
    init() { self = .pure }
}

Usage:

struct MyComponent: StringUtilityRequired {
    func myFunc() throws {
        let groups = try stringUtility.groups("Test 123", pattern: "(.+)\\s(.+)")
        print(groups)
    }
}
superarts.org