    $subject = "Citation flow <img src='/static/images/icons/help.png'>
                                </span>
                            </p>
                            <p style='font-size: 150%;'><b>11</b></p>";
    $pattern = "/Citation flow[.]+<b>([0-9]+)<\/b>/i";

    preg_match_all($pattern, $subject, $matches, PREG_PATTERN_ORDER);

    print_r($matches);

I want to capture the number 11 inside the bold tags, but my regular expression doesn't work. Why?

UPDATE:

I came up with this, but I am not 100% sure it is the best solution:

$pattern="/Citation flow[\s\S]*<b>([0-9]+)<\/b>/i";
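
One caveat with the `[\s\S]*` version is that `*` is greedy, so if the real page has more than one `<b>number</b>` after the label, the pattern captures the last one, not the first. The sketch below illustrates this on a made-up fragment; the "Trust flow" block and the value 27 are assumptions added only for the demonstration and do not come from the page being scraped.

```php
<?php
// Hypothetical fragment: two bold numbers follow the "Citation flow" label.
$html = "Citation flow <img src='help.png'>\n<p><b>11</b></p>\n"
      . "Trust flow <img src='help.png'>\n<p><b>27</b></p>";

// Greedy: [\s\S]* runs to the end of the string, then backtracks
// to the LAST <b>digits</b> it can find.
preg_match('/Citation flow[\s\S]*<b>([0-9]+)<\/b>/i', $html, $greedy);

// Lazy: [\s\S]*? grows one character at a time and stops at the
// FIRST <b>digits</b> after the label.
preg_match('/Citation flow[\s\S]*?<b>([0-9]+)<\/b>/i', $html, $lazy);

echo $greedy[1], "\n"; // 27
echo $lazy[1], "\n";   // 11
```

If only the number directly after the label is wanted, the lazy form `[\s\S]*?` is probably the safer choice.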
Dmitry Makovetskiyd
  • When you say `[.]+` you actually mean `.+`. `[.]+` matches a string of dots; the dot is not special inside a character class. – lanzz Jun 18 '12 at 12:15
  • I'd strongly recommend using an actual HTML parser, like simpledom - http://simplehtmldom.sourceforge.net/. – Adam Jun 18 '12 at 12:17
  • You seem to be trying to parse HTML with regular expressions. You may want to read http://stackoverflow.com/a/1732454/110707 – Wooble Jun 18 '12 at 12:17
  • I know about simple_html_parse, but the document I am trying to scrape hasn't got many unique tags. – Dmitry Makovetskiyd Jun 18 '12 at 12:22
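
For reference, the parser route suggested in the comments can also work when the markup has few unique tags, because the text label itself can serve as the anchor. The sketch below uses PHP's built-in `DOMDocument` and `DOMXPath` instead of the Simple HTML DOM library mentioned above. The surrounding `<div>` and the exact nesting of `<p>` and `<span>` are assumptions reconstructed from the snippet in the question, so the XPath expression would need adjusting to the real page.

```php
<?php
// Hypothetical, well-formed fragment modelled on the snippet in the question.
$html = "<div><p><span>Citation flow <img src='/static/images/icons/help.png'></span></p>"
      . "<p style='font-size: 150%;'><b>11</b></p></div>";

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Find the <p> that contains the label, step to the next sibling <p>,
// and take the <b> element inside it.
$query = "//p[contains(., 'Citation flow')]/following-sibling::p[1]//b";
$nodes = $xpath->query($query);

$value = $nodes->length > 0 ? (int) trim($nodes->item(0)->textContent) : null;
var_dump($value); // int(11)
```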

2 Answers


It cannot match because `Citation flow` is followed by a space in your subject, not by one or more literal dots, which is what `[.]+` requires. You probably meant

(?si)Citation flow.+<b>(\d+)</b>
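
As a rough check, the two patterns can be run side by side against the subject from the question. The `~` delimiter is a choice made here so that the `/` in `</b>` needs no escaping; the variable names are illustrative only.

```php
<?php
$subject = "Citation flow <img src='/static/images/icons/help.png'>
    </span>
</p>
<p style='font-size: 150%;'><b>11</b></p>";

// Original pattern: [.]+ demands literal dots right after "Citation flow",
// but the subject has a space there, so nothing matches.
$broken = preg_match('/Citation flow[.]+<b>([0-9]+)<\/b>/i', $subject, $m1);

// Pattern from this answer: with the s flag, . also matches newlines,
// so .+ can span the lines between the label and the <b> tag.
$fixed = preg_match('~(?si)Citation flow.+<b>(\d+)</b>~', $subject, $m2);

var_dump($broken);  // int(0)
echo $m2[1], "\n";  // 11
```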
Joey

Don't you want the `s` modifier, so that the dot also matches newlines?

$pattern="/Citation flow.+<b>([0-9]+)<\/b>/si";
BugFinder