4

I have been trying to extract a piece of text from a string using regular expressions in Swift. The text I want is inside double quotes, so I'm trying to target those double quotes and get the text between them.

This is the RegExp that I'm using: (?<=")(?:\\.|[^"\\])*(?=")

It works pretty well with any kind of text, and it could be even simpler since I'm looking for anything that could be inside those double quotes.

When I try to use this RegExp in Swift I have to escape the double quotes in it, but for some reason the RegExp doesn't work with escaped double quotes, e.g. (?<=\")(?:\\.|[^\"\\])*(?=\").

Even if I try something as simple as \", the RegExp doesn't match any double quote in the string.

Code Example

func extractText(sentence: String?) -> String {
    let pattern = "(?<=\")(?:\\.|[^\"\\])*(?=\")"
    let source = sentence!

    if let range = source.range(of: pattern, options: .regularExpression) {
        return "Text: \(source[range])"
    }

    return ""
}

extractText("Hello \"this is\" a test") -> "this is"

Things to keep in mind:

  • All these RegExps must be inside double quotes to create the string literal that is going to be used as a pattern.
  • I'm using the String's range method with the .regularExpression option to match the content.
  • I'm using Swift 4 with an Xcode 9 Playground

How can I escape double quotes in Swift to successfully match them in a string?

Solution

Thanks to @Atlas_Gondal and @vadian, I noticed that part of the problem is not the RegExp but the string I'm receiving, which uses a different type of double quotes (“ ... ”). I therefore had to change my pattern to something like "(?<=“).*(?=”)" to match it.

The resulting code looks like this:

func extractText(sentence: String?) -> String {
    let pattern = "(?<=“).*(?=”)"
    let source = sentence!

    if let range = source.range(of: pattern, options: .regularExpression) {
        return "\(source[range])"
    }

    return ""
}
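
If the input may contain either straight or curly quotes, a character class can cover both. The following is only a sketch under that assumption: `extractQuoted(from:)` is a hypothetical helper, not part of the original code. It also replaces the greedy `.*` with a negated character class, so that a sentence with several quoted parts does not produce one match running from the first opening quote to the last closing quote.

```swift
import Foundation

// Hypothetical helper (not from the original post): returns the first quoted
// substring, accepting straight (") or curly (“ ”) double quotes.
func extractQuoted(from sentence: String) -> String? {
    // Lookbehind: an opening " or “
    // Body:       any run of characters that are not double quotes
    // Lookahead:  a closing " or ”
    let pattern = "(?<=[\"“])[^\"“”]*(?=[\"”])"

    guard let range = sentence.range(of: pattern, options: .regularExpression) else {
        return nil
    }
    return String(sentence[range])
}

print(extractQuoted(from: "Hello “this is” a test") ?? "no match")
print(extractQuoted(from: "Hello \"this is\" a test") ?? "no match")
```

Returning an optional instead of an empty string lets the caller tell "no quotes found" apart from an empty quoted string; that is a design choice of this sketch, not something the original code does.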
David Gomez
  • There is a possibility of an escape before the opening quote. This `(?<!\\")` prevents it from matching. So, in total it would now be `(?<!\\")(?<=")` –  Jul 07 '17 at 19:40
  • Btw, all regex engines interpret `(?<=\")` as this `(?<=")` so if it's not now working it's something when the language parses the string. –  Jul 07 '17 at 19:45
  • This right now extracts single quoted word in the entire string. If the string has multiple quoted words it does not work. Any help ? – nr5 Sep 10 '19 at 09:28

3 Answers

4

range(of:options:) with the .regularExpression option returns only the range of the whole match; it cannot return capture groups.

To capture just the text between the quotes, you need a real NSRegularExpression:

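import Foundation

func extractText(sentence: String) -> String {
    // Opening quote, one or more non-quote characters (group 1), closing quote.
    let pattern = "\"([^\"]+)\""
    // try! is tolerable here only because the pattern is a known-valid constant.
    let regex = try! NSRegularExpression(pattern: pattern)

    if let match = regex.firstMatch(in: sentence, range: NSRange(sentence.startIndex..., in: sentence)) {
        // Convert the NSRange of capture group 1 back to a Range<String.Index>.
        let range = Range(match.range(at: 1), in: sentence)!
        return String(sentence[range])
    }

    return ""
}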

extractText(sentence:"Hello \"this is\" a test")

The pattern is much simpler: Search for a double quote followed by one or more non-double-quote characters followed by a closing double quote. Capture the characters between the double quotes.

Note that escaped double quotes in a string literal are only virtually escaped: the Swift compiler consumes the backslash, so the regex engine sees just a plain double quote.
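
A small sketch of what this means for the pattern in the question. The reading of the failure is my own, not something stated in the thread: the literal `"[^\"\\]"` reaches the regex engine as `[^"\]`, where `\]` escapes the bracket and leaves the character class unterminated, so the pattern is invalid and `range(of:options:)` returns nil. Doubling the regex backslashes once more gives the engine the pattern that was intended.

```swift
import Foundation

// The compiler consumes the backslash: this literal holds one character, a plain ".
let quote = "\""
print(quote.count)

// Intended regex:  (?<=")(?:\\.|[^"\\])*(?=")
// In a normal string literal every regex backslash must itself be escaped,
// so the regex's \\ becomes \\\\ in Swift source.
let pattern = "(?<=\")(?:\\\\.|[^\"\\\\])*(?=\")"

let source = "Hello \"this is\" a test"
let extracted = source
    .range(of: pattern, options: .regularExpression)
    .map { String(source[$0]) }

print(extracted ?? "no match")
```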

PS: Your code doesn't compile without the parameter label in either Swift 3 or Swift 4.

vadian
  • This right now extracts single quoted word in the entire string. If the string has multiple quoted words it does not work. Any help ? – nr5 Sep 10 '19 at 09:29
  • @nr5 Rather than `firstMatch` use `matches` and use a loop to iterate the array. – vadian Sep 10 '19 at 09:34
  • Can you please check this: https://stackoverflow.com/questions/57852915/find-multiple-quoted-words-in-a-string-with-regex – nr5 Sep 10 '19 at 10:43
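
The suggestion in the comments (use `matches` instead of `firstMatch` and loop over the results) could look roughly like this. It is a minimal sketch; `extractAllQuoted(from:)` is a hypothetical name, and it only handles straight double quotes, like the pattern in the answer.

```swift
import Foundation

// Hypothetical helper: returns every substring enclosed in straight double quotes.
func extractAllQuoted(from sentence: String) -> [String] {
    let pattern = "\"([^\"]+)\""
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }

    var results = [String]()
    let fullRange = NSRange(sentence.startIndex..., in: sentence)

    // matches(in:range:) returns one NSTextCheckingResult per non-overlapping match.
    for match in regex.matches(in: sentence, range: fullRange) {
        // Capture group 1 is the text between the quotes.
        if let range = Range(match.range(at: 1), in: sentence) {
            results.append(String(sentence[range]))
        }
    }
    return results
}

print(extractAllQuoted(from: "Say \"one\" and then \"two\""))
```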
0

Try this code:

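import Foundation

extension String {
    /// Returns the capture groups of the first match of `pattern`, or an empty array.
    func capturedGroups(withRegex pattern: String) -> [String] {
        var results = [String]()

        // An invalid pattern yields no results instead of crashing.
        guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
            return results
        }

        // NSRange counts UTF-16 code units, so build it from the string's own indices
        // rather than from the character count.
        let fullRange = NSRange(self.startIndex..., in: self)
        guard let match = regex.firstMatch(in: self, options: [], range: fullRange) else {
            return results
        }

        let lastRangeIndex = match.numberOfRanges - 1
        guard lastRangeIndex >= 1 else { return results }

        for i in 1...lastRangeIndex {
            // range(at:) replaced rangeAt(_:) in Swift 4.
            let groupRange = match.range(at: i)
            // Skip optional groups that did not take part in the match.
            guard groupRange.location != NSNotFound else { continue }
            results.append((self as NSString).substring(with: groupRange))
        }

        return results
    }
}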

Use extension like this:

print("This is \"My String \"".capturedGroups(withRegex: "\"(.*)\""))


Atlas_Gondal
0

Even though it's a bit late, I've fixed it by using a raw string.

Since Swift 5 you can do this:

let pattern = #"(?<=“).*(?=”)"# // <- Note the # in front and after.
// ...

And you are good to go. By far the simplest solution in my opinion!

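⚠️ Note: Inside a raw string, backslashes and double quotes are taken literally, so string interpolation ("\(variable)") and escapes such as \n no longer work as written. If you need them, add the same number of # after the backslash: \#(variable) or \#n.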
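
As an illustration (a sketch that assumes Swift 5 or later), a raw string lets the escape-aware pattern from the question be written exactly as it would appear in a regex tester, with no extra backslashes. The variable names here are only for demonstration.

```swift
import Foundation

// Raw string: the regex is written verbatim, quotes and backslashes included.
let rawPattern = #"(?<=")(?:\\.|[^"\\])*(?=")"#

// The same pattern as a normal literal needs every " and \ escaped.
let escapedPattern = "(?<=\")(?:\\\\.|[^\"\\\\])*(?=\")"

// Both literals produce the same string.
print(rawPattern == escapedPattern)

let sample = #"Hello "this is" a test"#
let found = sample
    .range(of: rawPattern, options: .regularExpression)
    .map { String(sample[$0]) }

print(found ?? "no match")
```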

Here is a great article about raw strings.

Throvn