
The pattern "\w+" works fine with plain text. However, the goal is to keep the emoji characters in the results rather than having them skipped as if they were whitespace.

Example:

"This is some text ".regex("\\w+")

Desired output:

["This","is","some","text",""]

Code:

extension String {
  func regex (pattern: String) -> [String] {
    do {
      let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions(rawValue: 0))
      let nsstr = self as NSString
      let all = NSRange(location: 0, length: nsstr.length)
      var matches : [String] = [String]()
      regex.enumerateMatchesInString(self, options: NSMatchingOptions(rawValue: 0), range: all) {
        (result : NSTextCheckingResult?, _, _) in
        if let r = result {
          let result = nsstr.substringWithRange(r.range) as String
          matches.append(result)
        }
      }
      return matches
    } catch {
      return [String]()
    }
  }
}

The code above gives the following output:

"This is some text ".regex("\\w+")

// Yields:  ["This", "is", "some", "text"]
// Note that the emoji are missing.

Is it a coding issue, regex issue, or both? Other answers seem to show the same problem.

func matchesForRegexInText(regex: String!, text: String!) -> [String] {
  do {
    let regex = try NSRegularExpression(pattern: regex, options: [])
    let nsString = text as NSString
    let results = regex.matchesInString(text,
      options: [], range: NSMakeRange(0, nsString.length))
    return results.map { nsString.substringWithRange($0.range) }
  } catch let error as NSError {
    print("invalid regex: \(error.localizedDescription)")
    return []
  }
}


let string = "This is some text "
let matches = matchesForRegexInText("\\w+", text: string)

// Also yields ["This", "is", "some", "text"]

My Mistake

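\w+ matches only word characters (letters, digits, and underscore), so emoji are never matched. It is not a coding issue; a pattern that matches everything except spaces and tabs does what I want.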
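To check this, here is a minimal sketch written against the current Swift Foundation API names (not the Swift 2 spellings used in the code above). The helper `hasMatch` is a hypothetical name introduced only for this illustration, and the emoji U+1F600 is an arbitrary choice, not necessarily one from the example string.

```swift
import Foundation

// Hypothetical helper (not from the code above): reports whether
// `pattern` matches anywhere in `text`.
func hasMatch(_ pattern: String, in text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return false }
    let whole = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: [], range: whole) != nil
}

let emoji = "\u{1F600}"  // arbitrary emoji, chosen only for illustration

print(hasMatch("\\w", in: "text"))  // true: letters are word characters
print(hasMatch("\\w", in: emoji))   // false: this emoji is a symbol, not a word character
print(hasMatch("\\S", in: emoji))   // true: it is still a non-whitespace character
```

Some emoji sequences contain code points that \w does match (digits in keycap emoji, for example), so this shows the general behavior rather than a rule for every emoji.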

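 "This is some text \t ".regex("[^ \t]+")

// Gives the correct answer:  ["This", "is", "some", "text", ""]
// Note: inside [...], | and a non-leading ^ are literal characters, so they are left out of the class.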
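For anyone on a newer Swift version, the same idea can be sketched with the renamed Foundation API and the shorthand \S+ (one or more non-whitespace characters). This is only a sketch under assumptions: `tokens(matching:)` is a hypothetical name, the two emoji in the sample are arbitrary stand-ins, and \S+ also splits on newlines and other whitespace, which goes slightly beyond "space or tab".

```swift
import Foundation

extension String {
    // Hypothetical name, introduced for illustration only.
    // Returns every substring of `self` matched by `pattern`,
    // or an empty array if the pattern is invalid.
    func tokens(matching pattern: String) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
        let whole = NSRange(startIndex..., in: self)
        return regex.matches(in: self, options: [], range: whole).compactMap { match in
            // Convert the NSRange back to a String range before slicing.
            Range(match.range, in: self).map { String(self[$0]) }
        }
    }
}

// Sample input with two arbitrary emoji written as Unicode escapes.
let sample = "This is some text \u{1F600}\u{1F389} \t "
let words = sample.tokens(matching: "\\S+")
print(words.count)  // 5: four words plus the run of two emoji as one token
```

Converting through `Range(_:in:)` avoids the `NSString` cast and keeps surrogate pairs intact, since the match ranges are expressed in UTF-16 units.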
Mike Chirico
