
I'm trying to parse an HTML page using NSRegularExpression. The page is a repetition of this HTML code:

<div class="fact" id="fact66">STRING THAT I WANT</div> <div class="vote">
<a href="index.php?p=detail_fact&fact=106">#106</a> &nbsp; &nbsp; 
<span id="p106">246080 / 8.59  </span> &nbsp; &nbsp;
<span id="f106" class="vote2">
<a href="#" onclick="xajax_voter(106,3); return false;">(+++)</a> 
<a href="#" onclick="xajax_voter(106,2); return false;">(++)</a>  
<a href="#" onclick="xajax_voter(106,1); return false;">(+)</a> 
<a href="#" onclick="xajax_berk(106); return false;">(-)</a></span>
<span id="ve106"></span>
</div>

So I'd like to get the string inside this div:

 <div class="fact" id="fact66">STRING THAT I WANT</div>

So I made a regex that looks like this:

<div class="fact" id="fact[0-9].*\">(.*)</div>

Now, in my code, I implement it like this:

    NSString *htmlString = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://www.myurl.com"] encoding:NSASCIIStringEncoding error:nil];
    NSRegularExpression* myRegex = [[NSRegularExpression alloc] initWithPattern:@"<div class=\"fact\" id=\"fact[0-9].*\">(.*)</div>\n" options:0 error:nil];
    [myRegex enumerateMatchesInString:htmlString options:0 range:NSMakeRange(0, [htmlString length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        NSRange range = [match rangeAtIndex:1];
        NSString *string =[htmlString substringWithRange:range];
        NSLog(@"%@", string);
    }];

But it returns nothing. I tested my regex in Java and PHP and it works great. What am I doing wrong?

Thanks

Abel
  • Just an FYI http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Joe May 09 '12 at 18:42
  • Obligatory, ["Using regular expressions to parse HTML: why not?"](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) – Mike Samuel May 09 '12 at 18:49

1 Answer

Try using this regex:

 @"<div class=\"fact\" id=\"fact[0-9]*\">([^<]*)</div>"

Regex:

fact[0-9].*

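means: fact followed by exactly one digit between 0 and 9, followed by any character repeated any number of times. Since .* is greedy, it can run past the closing quote of the id attribute, whereas [0-9]* matches only the digits of the id.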

I also suggest using:

([^<]*)

instead of

(.*)

to match the text inside the div, which sidesteps regex greediness; or alternatively:

(.*?)

The ? makes the quantifier non-greedy, so the match stops at the first instance of </div>.
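
For reference, here is a minimal sketch of how the suggested pattern could be plugged into the code from the question. The helper name FactStringsInHTML is hypothetical (it is not in the original post), and the error check is an addition. Note that the pattern has no trailing \n: in the sample HTML, </div> is followed by a space rather than a newline, so a pattern ending in \n would not match that line.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): collects the text of every
// <div class="fact" id="factNNN">...</div> element found in the HTML string.
static NSArray<NSString *> *FactStringsInHTML(NSString *html) {
    if (html == nil) {
        return @[];
    }

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<div class=\"fact\" id=\"fact[0-9]*\">([^<]*)</div>"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        // Surface a malformed pattern instead of silently matching nothing.
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray<NSString *> *facts = [NSMutableArray array];
    [regex enumerateMatchesInString:html
                            options:0
                              range:NSMakeRange(0, html.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        // Capture group 1 is the text between the opening and closing div tags.
        NSRange range = [match rangeAtIndex:1];
        if (range.location != NSNotFound) {
            [facts addObject:[html substringWithRange:range]];
        }
    }];
    return facts;
}
```

Calling FactStringsInHTML on the first line of the sample HTML from the question should return an array containing the single string STRING THAT I WANT.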

sergio
  • Thank you so much! Just a question: how should I modify it to get what's in the `246080 / 8.59 ` span (here, I'd like to get `246080 / 8.59`)? – Abel May 09 '12 at 19:47
  • you are welcome; for the span, use: `@"([^<]*)"` – sergio May 09 '12 at 19:52
  • Thank you so much, works great! And thanks for the explanation, it's nice to understand what things do! – Abel May 09 '12 at 19:56
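
For the span case discussed in the comments, a sketch along the same lines might look like the following. The pattern here is an assumption based on the sample HTML in the question (a span whose id is p followed by digits), not necessarily the one sergio posted, and the helper name ScoreStringInHTML is hypothetical. Trimming whitespace is an extra step added because the sample span has trailing spaces before </span>.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the trimmed text of the first <span id="pNNN">
// element in the HTML string, or nil if there is none.
static NSString *ScoreStringInHTML(NSString *html) {
    if (html == nil) {
        return nil;
    }

    // Assumed pattern: id is "p" followed by one or more digits, as in id="p106".
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<span id=\"p[0-9]+\">([^<]*)</span>"
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *match = [regex firstMatchInString:html
                                                    options:0
                                                      range:NSMakeRange(0, html.length)];
    if (match == nil) {
        return nil;
    }

    // Group 1 is the span's text; strip the surrounding whitespace.
    NSString *raw = [html substringWithRange:[match rangeAtIndex:1]];
    return [raw stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
```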