I'm beginning to develop in Objective-C, and I'm having trouble finding the correct regular expression to list the anchors in an HTML document.

Example: I have this HTML code:

<ul>
    <li><a class="class1" href="/document1.html"></li>
    <li><a class="class1" href="/document2.html"></li>
    <li><a class="class1" href="/document3.html"></li>
</ul>

I want to get an NSArray with a result like this:

/document1.html
/document2.html
/document3.html

How can I make a good regular expression for this?

Alan Moore
odelattre
  • [You don't](http://stackoverflow.com/a/1732454/1705725) – Kippie Oct 08 '13 at 14:20
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Cole Tobin Oct 12 '13 at 20:45

1 Answer

It's complicated to do this properly with a regex, given all the variations that HTML permits. It's better to use an HTML parser, such as Hpple. See Ray Wenderlich's How to Parse HTML on iOS.

But if you're interested in just some special cases (e.g. the href is always in double quotes), you can do something like:

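NSError *error = nil;  // receives details if the pattern fails to compile
NSRegularExpression *regex;
// Capture group 1 holds the double-quoted href value of each <a> tag.
regex = [NSRegularExpression regularExpressionWithPattern:@"<a\\s[^>]*(?<=\\s)href\\s*=\\s*\"(.*?)\".*?>"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];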

There are numerous limitations here, but maybe it's a starting point. For something more general, you really should use an HTML parser, not a regex.
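
To get from that pattern to the NSArray the question asks for, the matches still have to be enumerated and capture group 1 pulled out of each one. Here is a minimal sketch of that step, under the same assumption that every href is double-quoted. The helper name HrefsInHTML is hypothetical and not part of any framework; only NSRegularExpression and NSTextCheckingResult from Foundation are used.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (the name is an assumption for illustration):
// returns the double-quoted href value of each <a> tag found in `html`.
static NSArray *HrefsInHTML(NSString *html) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<a\\s[^>]*(?<=\\s)href\\s*=\\s*\"(.*?)\".*?>"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        // The pattern failed to compile; report it and return an empty result.
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray *hrefs = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:html
                                      options:0
                                        range:NSMakeRange(0, html.length)];
    for (NSTextCheckingResult *match in matches) {
        // Range 0 is the whole <a ...> tag; range 1 is the captured href value.
        NSRange hrefRange = [match rangeAtIndex:1];
        if (hrefRange.location != NSNotFound) {
            [hrefs addObject:[html substringWithRange:hrefRange]];
        }
    }
    return [hrefs copy];
}
```

Called with the HTML from the question, this should return an array holding /document1.html, /document2.html and /document3.html. An href in single quotes or without quotes is silently skipped, which is one of the limitations mentioned above.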

Rob