First, the more usual approach to processing like this is to tokenise the input; this makes each kind of token easier to handle and is probably more efficient for large inputs. That said, here is how to solve your problem using regular expressions.
Consider: matchesInString:options:range: returns all the non-overlapping matches for a regular expression.
Regular expressions are built from smaller regular expressions and can contain alternatives. So if you have REemphasis which matches strings to emphasise and REurl which matches URLs, then (REemphasis)|(REurl) matches both.
NSTextCheckingResult, instances of which are returned by matchesInString:options:range:, reports the range of each group in the match. If a group does not take part in the match because of the alternatives in the pattern, then the group's NSRange.location is set to NSNotFound. So for the above pattern, (REemphasis)|(REurl), if group 1 is NSNotFound the match is for the REurl alternative; otherwise it is for the REemphasis alternative.
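To make the group test concrete, here is a minimal sketch. The function name ClassifyMatches and both sub-patterns are illustrative assumptions (simplified stand-ins for REemphasis and REurl), not patterns you should ship:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: labels each match "emphasis" or "url" depending on
// which alternative of the combined pattern produced it.
static NSArray<NSString *> *ClassifyMatches(NSString *input)
{
    // group 1 = simplified emphasis pattern, group 2 = simplified URL pattern
    NSString *pattern = @"(_[^_\\s]+_)|(https?://\\S+)";
    NSError *error = nil;
    NSRegularExpression *re = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                        options:0
                                                                          error:&error];
    if (re == nil)
        return @[]; // pattern failed to compile

    NSMutableArray<NSString *> *kinds = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [re matchesInString:input options:0 range:NSMakeRange(0, input.length)];
    for (NSTextCheckingResult *match in matches)
    {
        // group 1 has location NSNotFound when the URL alternative matched
        BOOL isEmphasis = [match rangeAtIndex:1].location != NSNotFound;
        [kinds addObject:(isEmphasis ? @"emphasis" : @"url")];
    }
    return kinds;
}
```

Because matches do not overlap, underscores that fall inside a URL are consumed by the URL match and are never offered to the emphasis alternative.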
The method replacementStringForResult:inString:offset:template: returns the replacement string for a single match, based on the template (aka the replacement pattern).
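As a small illustration of that method on its own, the sketch below builds the replacement for the first match only. The function name FirstEmphasisReplacement and the one-group pattern are assumptions made for the example:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the replacement text for the first _..._ run,
// or nil if there is none. Only the replacement is returned, not the whole string.
static NSString *FirstEmphasisReplacement(NSString *input)
{
    NSRegularExpression *re = [NSRegularExpression regularExpressionWithPattern:@"_([^_]+)_"
                                                                        options:0
                                                                          error:NULL];
    NSTextCheckingResult *match = [re firstMatchInString:input
                                                 options:0
                                                   range:NSMakeRange(0, input.length)];
    if (match == nil)
        return nil;

    // offset is 0 because input has not been modified since it was matched
    return [re replacementStringForResult:match
                                 inString:input
                                   offset:0
                                 template:@"<em>$1</em>"];
}
```

The offset parameter only matters once the string has changed since matching: it is added to the match's ranges so that they still point at the right characters.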
The above is enough to write an algorithm to do what you want. Here is some sample code:
- (NSString *)convert:(NSString *)input
{
    NSString *emphPat = @"(_([^_]+)_)"; // note this pattern does NOT allow for markdown's \_ escapes - that needs to be addressed
    NSString *emphRepl = @"<em>$2</em>";

    // a pattern for urls - use whatever suits
    // this one is taken from http://stackoverflow.com/questions/6137865/iphone-reg-exp-for-url-validity
    NSString *urlPat = @"([hH][tT][tT][pP][sS]?:\\/\\/[^ ,'\">\\]\\)]*[^\\. ,'\">\\]\\)])";

    // construct a pattern which matches emphPat OR urlPat
    // emphPat is first so its two groups are numbered 1 & 2 in the resulting match
    NSString *comboPat = [NSString stringWithFormat:@"%@|%@", emphPat, urlPat];

    // build the re
    NSError *error = nil;
    NSRegularExpression *re = [NSRegularExpression regularExpressionWithPattern:comboPat options:0 error:&error];
    if (re == nil)
        return input; // the pattern failed to compile (details in error); return the input unchanged

    // get all the matches - includes both urls and text to be emphasised
    NSArray *matches = [re matchesInString:input options:0 range:NSMakeRange(0, input.length)];

    NSInteger offset = 0; // will track the change in size
    NSMutableString *output = [input mutableCopy]; // mutable copy of input to modify to produce output

    for (NSTextCheckingResult *aMatch in matches)
    {
        NSRange first = [aMatch rangeAtIndex:1];
        if (first.location != NSNotFound)
        {
            // the first group has been matched => that is the emphPat (which contains the first two groups)
            // determine the replacement string
            NSString *replacement = [re replacementStringForResult:aMatch inString:output offset:offset template:emphRepl];

            NSRange whole = aMatch.range; // original range of the match
            whole.location += offset;     // add in the offset to allow for previous replacements
            offset += replacement.length - whole.length; // modify the offset to allow for the length change caused by this replacement

            // perform the replacement
            [output replaceCharactersInRange:whole withString:replacement];
        }
        // otherwise the match is a url, which is left untouched
    }

    return output;
}
Note that the above does not allow for Markdown's \_ escape sequence; you will need to address that. You should also review the RE used for URLs: this one was simply taken from SO and has not been tested properly.
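One possible way to approach the escape issue is sketched below; it has had only light testing. The function name EmphasiseAllowingEscapes and the pattern are my assumptions. The pattern refuses to open emphasis on an underscore preceded by a backslash and treats backslash-plus-character as a unit inside the emphasised text. It leaves the backslashes in the output, and it wrongly rejects an opening underscore that follows an escaped backslash (\\_):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: wraps _..._ runs in <em> while skipping \_ escapes.
static NSString *EmphasiseAllowingEscapes(NSString *input)
{
    // As a regex: (?<!\\)_((?:\\.|[^_\\])+)_
    //   (?<!\\)            the opening _ must not follow a backslash
    //   (?:\\.|[^_\\])+    escaped pairs, or anything but _ and backslash
    NSString *pattern = @"(?<!\\\\)_((?:\\\\.|[^_\\\\])+)_";
    NSError *error = nil;
    NSRegularExpression *re = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                        options:0
                                                                          error:&error];
    if (re == nil)
        return input; // pattern failed to compile; return the input unchanged

    return [re stringByReplacingMatchesInString:input
                                        options:0
                                          range:NSMakeRange(0, input.length)
                                   withTemplate:@"<em>$1</em>"];
}
```

This pattern has a single group, whereas emphPat in the sample code has two, so if you substitute it into the combined pattern you need to adjust the group number used in the template accordingly.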
The convert: method above will convert
http://example.com/text_with_underscores _emph_
to
http://example.com/text_with_underscores <em>emph</em>
HTH