Is a regex of the following form legit in Obj C?
"<(img|a|div).*?>.*?</$1>"
I know it's valid in JS with a \1 instead of $1, but I'm having little luck in Obj C.
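NSRegularExpression uses ICU regular expressions, which use \n syntax for back references, where n is the number of the capture group. ($1 is the syntax for replacement templates, not for patterns.) Inside an Objective-C string literal the backslash itself has to be escaped, so the pattern becomes:
<(img|a|div).*?>.*?</\\1>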
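As a minimal sketch of that back reference in use (the helper name FirstBalancedTag and the sample inputs are made up for illustration, not taken from the question), the pattern can be passed straight to NSRegularExpression:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first substring whose closing tag name
// matches its opening tag name, or nil when there is no such substring.
static NSString *FirstBalancedTag(NSString *html) {
    // The backslash is doubled in source, so the regex engine sees \1.
    NSString *pattern = @"<(img|a|div).*?>.*?</\\1>";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return nil;
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:html
                          options:0
                            range:NSMakeRange(0, html.length)];
    return match ? [html substringWithRange:match.range] : nil;
}
```

With this sketch, a div that is closed by </div> matches as a whole, while an <a ...> that is only followed by </div> produces no match, because \1 has to repeat whatever the first group captured.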
Yes, I do believe you can work with capture groups. I had to work with them a little while ago, and here is an example:
- (NSString *)extractMediaLink:(NSString *)link withRegex:(NSString *)regex {
    NSString *utf8Link = [link stringByRemovingPercentEncoding];
    NSError *regexError = nil;
    NSRegularExpression *regexParser = [NSRegularExpression regularExpressionWithPattern:regex
            options:NSRegularExpressionCaseInsensitive | NSRegularExpressionUseUnixLineSeparators
            error:&regexError];
    NSTextCheckingResult *regexResults = [regexParser firstMatchInString:utf8Link
            options:0
            range:NSMakeRange(0, [utf8Link length])];
    // Range 0 is the whole match; range 1 is the first capture group, which holds the ID.
    NSRange idRange = [regexResults rangeAtIndex:1];
    // No match, or a group that did not take part in the match, leaves nothing to extract.
    if (regexResults == nil || idRange.location == NSNotFound) {
        return @"";
    }
    return [utf8Link substringWithRange:idRange];
}
When you use an instance of NSRegularExpression to generate an NSTextCheckingResult, the NSTextCheckingResult has a numberOfRanges property, which is documented with:
A result must have at least one range, but may optionally have more (for example, to represent regular expression capture groups).
In my example above, I use -[NSRegularExpression firstMatchInString:options:range:] to find the first instance of the tag that matches my regex pattern. From that NSTextCheckingResult I pull out the range of the capture group I'm interested in (in this case, [regexResults rangeAtIndex:1]). (Note: I happen to be parsing HTML, but I also use an additional pod, TFHpple, that traverses HTML by XPath queries. It's a lifesaver if you absolutely have to parse HTML.)
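To make the numberOfRanges / rangeAtIndex: relationship concrete, here is a rough sketch that collects every range of the first match. The helper name CaptureGroups and the sample strings are assumptions for illustration only and are not part of the project mentioned here.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of every range in the first match.
// Index 0 is the whole match; indexes 1..n are the capture groups.
static NSArray<NSString *> *CaptureGroups(NSString *text, NSString *pattern) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:text
                          options:0
                            range:NSMakeRange(0, text.length)];
    NSMutableArray<NSString *> *groups = [NSMutableArray array];
    if (match == nil) {
        return groups; // invalid pattern or no match: nothing to collect
    }
    for (NSUInteger i = 0; i < match.numberOfRanges; i++) {
        NSRange range = [match rangeAtIndex:i];
        // A group that did not take part in the match has location NSNotFound.
        if (range.location == NSNotFound) {
            [groups addObject:@""];
        } else {
            [groups addObject:[text substringWithRange:range]];
        }
    }
    return groups;
}
```

Checking for NSNotFound before calling substringWithRange: matters for patterns with alternation, such as (a)|(b), where one of the groups is always left unmatched.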
Getting to this point was a huge pain in the ass, though. To make sure you're getting the right expressions, I would highly recommend refining them in Regex101 with the Python setting, and then passing the refined regex into Patterns (Mac App Store).
If you want the full look, I have a fairly detailed project here, but keep in mind it's still a WIP.