I'm just getting started with regular expressions and Swift Regex, so a heads up that my terminology may be incorrect. I have boiled this problem down to a very simple task:

I have input lines that have either just one word (a name) or start with the word "Test" followed by one space and then a name. I want to extract the name and also be able to access - without using match indices - the match to "Test " (which may be nil). Here is code that better describes the problem:

import RegexBuilder

let line1 = "Test John"
let line2 = "Robert"

let nameReference = Reference(String.self)
let testReference = Reference(String.self)

let regex = Regex {
    Optionally {
        Capture(as:testReference) {
            "Test "
        } transform : { text in
            String(text)
        }
    }
    Capture(as:nameReference) {
        OneOrMore(.any)
    } transform : { text in
        String(text)
    }
}

if let matches = try? regex.wholeMatch(in: line1) { // USE line1 OR line2 HERE
    let theName = matches[nameReference]
    print("Name is \(theName)")
    // using index to access the test flag works fine for both line1 and line2:
    if let flag = matches.1, flag == "Test " {
        print("Using index: This is a test line")
    } else {
        print("Using index: Not a test line")
    }
    // but for line2, attempting to access with testReference crashes:
    if matches[testReference] == "Test " { // crashes for line2 (not surprisingly)
        print("Using reference: This is a test line")
    } else {
        print("Using reference: Not a test line")
    }
}

When regex.wholeMatch() is called with line1 things work as expected with output:

Name is John
Using index: This is a test line
Using reference: This is a test line

but when called with line2 it crashes with a SIGABRT and output:

Name is Robert
Using index: Not a test line
Could not cast value of type 'Swift.Optional<Swift.Substring>' (0x7ff84bf06f20) to 'Swift.String' (0x7ff84ba6e918).

The crash is not surprising, because the Capture(as:testReference) was never matched.

My question is: is there a way to do this without using match indices (matches.1)? An answer using Regex Builder would be much appreciated:-)

The documentation says Regex.Match has a subscript(String) method which "returns nil if there's no capture with that name". That would be ideal, but it works only when the match output is type AnyRegexOutput.

rene

2 Answers

I don't think you can get away with not using indexes, or at least code that knows the index but might hide it. Regular expression parsing works like that in any language, because it's always assumed that you know the order of elements in the expression.

For something like this, your example could be simplified to:

let nameRegex = Regex {
    Optionally("Test ")  // the prefix appears at most once
    Capture { OneOrMore(.anyNonNewline) }
}

if let matches = try? nameRegex.wholeMatch(in: line2) {
    let (_, name) = matches.output
    print("Name: \(name)")
}

That works for both of your sample lines. The let (_, name) destructuring doesn't spell out a numeric index, but it's effectively the same thing: it binds the tuple element at position 1 to name.

If your data is as straightforward as these examples, a regular expression may be overkill. You could work with if line1.hasPrefix("Test ") to detect lines with Test and then drop the first 5 characters, for example.
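A minimal sketch of that prefix-based approach, assuming the only optional part is a leading "Test ". The function name parseLine and its tuple labels are made up for illustration:

```swift
// Hypothetical helper: splits a line into a "test" flag and the name.
// Assumes the optional marker is exactly the prefix "Test ".
func parseLine(_ line: String) -> (isTest: Bool, name: String) {
    let prefix = "Test "
    if line.hasPrefix(prefix) {
        // Drop as many characters as the prefix has (5 here).
        return (true, String(line.dropFirst(prefix.count)))
    }
    return (false, line)
}

let first = parseLine("Test John")
let second = parseLine("Robert")
print(first.isTest, first.name)
print(second.isTest, second.name)
```

Using prefix.count instead of a literal 5 keeps the two in sync if the marker text ever changes.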

Tom Harrington
  • Thanks for the answer! My actual code is much more complicated, and the optional part is not at the beginning; for future changes and maintainability I was hoping to avoid relying on index positions. – rene Dec 29 '22 at 20:11
  • If you can match the desired info based on some kind of pattern then you might be able to zero in on it. In most cases your regular expression needs to reflect the format of the data, which means that if a match exists you’ll know the index. – Tom Harrington Dec 30 '22 at 16:49

While I would prefer Tom Harrington's solution for this particular use case, the API supports optional references by setting the type of the reference to an Optional itself:

let nameReference = Reference(String.self)
let testReference = Reference(String?.self)  // The String? is crucial here

let regex = Regex {
    Optionally {
        Capture(as:testReference) {
            "Test "
        } transform : { text in
            String(text)
        }
    }
    Capture(as:nameReference) {
        OneOrMore(.any)
    } transform : { text in
        String(text)
    }
}

if let matches = try? regex.wholeMatch(in: line1) {
    if matches[testReference] == "Test " { // this does not crash, but returns a String?
        print("Using reference: This is a test line")
    } else {
        print("Using reference: Not a test line")
    }
}

Note: if you want to have a reference to an optional Substring (Reference(Substring?.self)), then you must use Capture(as:_:transform:), because otherwise the compiler complains that Substring? and Substring are not equivalent.
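As a rough sketch of that Substring? variant: the names testFlag, personName, lineRegex and parsedLine are made up for illustration, and I'm assuming the reference subscript behaves for Substring? the same way it does for String? above.

```swift
import RegexBuilder

// Hypothetical names; the optional reference uses Substring? instead of String?.
let testFlag = Reference(Substring?.self)
let personName = Reference(Substring.self)

let lineRegex = Regex {
    Optionally {
        // The transform closure is what lets the capture type be Substring?.
        Capture(as: testFlag) {
            "Test "
        } transform: { text in
            Optional(text)
        }
    }
    Capture(as: personName) {
        OneOrMore(.any)
    }
}

// Returns nil if the line doesn't match at all.
func parsedLine(_ line: String) -> (isTest: Bool, name: String)? {
    guard let match = try? lineRegex.wholeMatch(in: line) else { return nil }
    // match[testFlag] is a Substring?, expected to be nil when "Test " was skipped.
    return (match[testFlag] != nil, String(match[personName]))
}

for line in ["Test John", "Robert"] {
    if let result = parsedLine(line) {
        print(result.name, result.isTest ? "(test line)" : "(not a test line)")
    }
}
```

Neither capture is read by position here, so inserting another capture before or between them should not require touching parsedLine.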

nd.
  • Thank you! This was exactly the kind of solution I was hoping for, allowing me to work with more complex situations without having to keep track of indices. – rene Mar 03 '23 at 18:25