Possible Duplicate:
Parsing HTML NSRegularExpression

I have an NSString like this:

NSString *string = @"<a href='http://john.com'>JOHN</a> http://john.com";

I want to use a regex to find the URLs that are not inside an anchor tag, so that I can wrap them in an anchor tag.

I currently have this:

NSRegularExpression *URLRegex = [NSRegularExpression
                                 regularExpressionWithPattern:@"((https?):\\/\\/[-A-Z0-9+&@#\\/%?=~_|!:,.;]*[-A-Z0-9+&@#\\/%=~_|])" options:NSRegularExpressionCaseInsensitive error:nil];

This does detect the URLs, but it also detects the URLs inside an anchor tag, which is problematic.

Does anyone know what I need to do? Thanks.

UPDATE:

@"([^\'](https?):\\/\\/[-A-Z0-9+&@#\\/%?=~_|!:,.;]*[-A-Z0-9+&@#\\/%=~_|][^\'])"

This pattern, supplied by Alex below, is an improvement. But if I have a string like @"http://example.com; john.com"; then example.com is matched. How can I exclude that? Basically, I don't want anything inside an anchor tag to be matched.

Community

1 Answer

In general, given how regexes work, capturing "not" something is much harder than capturing the something itself. You could easily implement the above with a few sed commands, an implementation of strip, and so on.

Given the format you have above, would something like this work, or is it going to exclude too many corner cases for you?

"([^\'](https?):\\/\\/[-A-Z0-9+&@#\\/%?=~_|!:,.;]*[-A-Z0-9+&@#\\/%=~_|][^\'])"

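That is, we're making sure that your URL isn't inside quotation marks. Note that each [^\'] consumes a character, so the match range is wider than the URL itself, and a URL at the very start of the string is not matched. It'll also fail on things like: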

"tom went to 'https://www.google.com' to find the..."

But I dunno if that matters to you.
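If the quote check is too loose, one alternative is to let the regex consume whole anchor elements (and any other tags) first, and only treat what is left as a bare URL. The following is a minimal sketch of that idea, not a tested drop-in. LinkifyBareURLs is a hypothetical helper name, the URL character class is copied from the question, and it assumes the anchors are not nested and the markup is simple enough for a regex to handle.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): wraps bare URLs in anchor
// tags and leaves existing <a>...</a> elements and other tags untouched.
static NSString *LinkifyBareURLs(NSString *input) {
    // Alternatives 1 and 2 swallow whole anchors and any other tag, so the
    // third alternative can never start inside them. Only capture group 1
    // (the third alternative) is a bare URL.
    NSString *pattern =
        @"<a\\b[^>]*>.*?</a>"
        @"|<[^>]+>"
        @"|(https?://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|])";

    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:pattern
                             options:NSRegularExpressionCaseInsensitive |
                                     NSRegularExpressionDotMatchesLineSeparators
                               error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return input;
    }

    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:input
                       options:0
                         range:NSMakeRange(0, input.length)];

    NSMutableString *output = [input mutableCopy];

    // Walk backwards so the ranges of earlier matches stay valid while the
    // output string grows.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSRange urlRange = [match rangeAtIndex:1];
        if (urlRange.location == NSNotFound) {
            continue; // an existing anchor or tag was matched; leave it alone
        }
        NSString *url = [input substringWithRange:urlRange];
        NSString *anchor =
            [NSString stringWithFormat:@"<a href='%@'>%@</a>", url, url];
        [output replaceCharactersInRange:urlRange withString:anchor];
    }
    return output;
}
```

With the string from the question, LinkifyBareURLs(@"<a href='http://john.com'>JOHN</a> http://john.com") should leave the existing anchor as it is and wrap only the trailing http://john.com. Because the anchor alternative consumes everything up to </a>, a URL used as link text is skipped as well.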

Alex Mann
  • This works well. But what if I have a string like @"http://example.com http://john.com";? Then http://example.com is matched. How can I exclude that? Basically, I don't want anything inside an anchor tag to be matched. –  Jan 13 '13 at 20:05