
In my application I am trying to get the number of Google-indexed pages, and I found out that the number is available in the following div:

<div id="resultStats"> About 1,960,000 results (0.38 seconds) </div> 

Now my question is: how do I extract the number from the above div in a web page?

3 Answers


Never use a regexp to parse HTML. (See: RegEx match open tags except XHTML self-contained tags)

Use an HTML parser, such as Simple HTML DOM (http://simplehtmldom.sourceforge.net/).

You can then use CSS selectors to select the div:

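include_once 'simple_html_dom.php'; // PHP Simple HTML DOM Parser

// Use the URL of a results page that actually contains div#resultStats
$html = file_get_html('http://www.google.com/');
$div = $html ? $html->find('div#resultStats', 0) : null;

$matches = array();
if ($div && preg_match('/([0-9,]+)/', $div->plaintext, $matches)) {
    echo $matches[1];
}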

Outputs: "1,960,000"
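If you would rather not depend on an external library, PHP's bundled DOM extension can do the same lookup with XPath. A minimal sketch, assuming ext-dom is enabled (it is in most PHP builds) and that the page HTML is already in a string; the sample markup below is only a stand-in for the fetched page:

```php
<?php
// Sketch: locate div#resultStats with the built-in DOM extension instead of Simple HTML DOM.
// $html is sample markup; in practice it would be the fetched results page.
$html = '<html><body><div id="resultStats"> About 1,960,000 results (0.38 seconds) </div></body></html>';

$doc = new DOMDocument();
// Real-world pages are rarely valid HTML, so keep libxml warnings out of the output.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="resultStats"]');

$count = null;
if ($nodes->length > 0) {
    // " About 1,960,000 results (0.38 seconds) "
    $text = $nodes->item(0)->textContent;
    // First run of digits and commas, i.e. the result count.
    if (preg_match('/[0-9][0-9,]*/', $text, $m)) {
        $count = $m[0];
    }
}

echo $count; // 1,960,000
```

The pattern `[0-9][0-9,]*` requires the match to start with a digit, so a stray comma can never be returned on its own.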
netdigger
  • It will print `About 1,960,000 results (0.38 seconds)`, but he wanted the number, so a regex is necessary. – Robert Jun 28 '13 at 08:12
  • Once you've got the string out of the div, extracting the number from the string should be trivial. – GordonM Jun 28 '13 at 08:13
  • Quoting the OP: *now my question is how to extract the number from above div in a web page*. I'm not saying it's a bad solution to use file_get_html(); it's good, but that's not what he wants, at least as I understand it. He wanted to know how to extract that particular number. – Robert Jun 28 '13 at 08:14
  • Yeah, missed that. Added a regexp. – netdigger Jun 28 '13 at 08:20
  • This regexp prints `0.38 seconds`, ehhh ;) Moreover, you overwrite the result of the find() function, which is pointless too. – Robert Jun 28 '13 at 08:21
  • Now you're stealing the regex from my answer, which was **bad**, haha :) My answer can be modified not to use the `<div>` tags and it will work too :> So in the end you use an external library + a regex (which is bad). Brilliant. – Robert Jun 28 '13 at 08:27
  • I'd say our regexps are pretty different. Yeah, I didn't read the full question at first, but I came up with [0-9,]+ myself; it's kind of the go-to answer for anyone trying to match 1,323,232,232 etc. – netdigger Jun 28 '13 at 08:30
  • It helps me move further. Thanks a ton! – Jun 28 '13 at 09:55
$str = '<div id="resultStats"> About 1,960,000 results (0.38 seconds) </div> ';

$matches = array();
preg_match('/<div id="resultStats"> About ([0-9,]+?) results[^<]+<\/div>/', $str, $matches);

print_r($matches);

Output:

Array
(
    [0] => <div id="resultStats"> About 1,960,000 results (0.38 seconds) </div>
    [1] => 1,960,000
)

This is a simple regex with capturing subpatterns:

  • ([0-9,]+?) matches the digits 0-9 and the , character, at least once, lazily (non-greedy).
  • [^<]+ matches any character except <, one or more times.

echo $matches[1]; will print the number you want.
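The captured value is still a string with thousands separators. If you need it as an actual number (to compare or store it), strip the commas before casting. A small sketch reusing the regex above, assuming the English-style comma separators shown in the question:

```php
<?php
// Sketch: turn the captured "1,960,000" into an integer.
// Assumes commas are the only thousands separator, as in the question's div.
$str = '<div id="resultStats"> About 1,960,000 results (0.38 seconds) </div> ';

$count = null; // stays null if the div is not found
if (preg_match('/<div id="resultStats"> About ([0-9,]+?) results[^<]+<\/div>/', $str, $matches)) {
    // Remove the separators first: (int) '1,960,000' would stop at the comma and give 1.
    $count = (int) str_replace(',', '', $matches[1]);
}

var_dump($count); // int(1960000)
```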

Robert

You can use a regex (preg_match) for that:

$div_string = '<div id="resultStats"> About 1,960,000 results (0.38 seconds) </div>';

preg_match('/<div.*>(.*)<\/div>/i', $div_string, $result);

print_r( $result );

The output will be:

Array  (
   [0] => <div id="resultStats"> About 1,960,000 results (0.38 seconds) </div>
   [1] =>  About 1,960,000 results (0.38 seconds) 
)

This way you can get the content inside the div.
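This returns the whole text of the div rather than just the number. One way to finish the job is a second, narrower pattern applied to the captured text. A sketch under two assumptions: the div contains no nested `<div>` (the capture below uses the slightly tighter `<div[^>]*>(.*?)<\/div>` instead of the greedy version), and the results line uses the English word "results":

```php
<?php
// Sketch: capture the div's inner text first, then pull the count out of it.
$div_string = '<div id="resultStats"> About 1,960,000 results (0.38 seconds) </div>';

$number = null;
if (preg_match('/<div[^>]*>(.*?)<\/div>/i', $div_string, $result)) {
    // "About 1,960,000 results (0.38 seconds)"
    $inner = trim($result[1]);
    // Anchor on "results" so the "0.38" timing can never be picked up.
    if (preg_match('/([0-9][0-9,]*)\s+results/i', $inner, $m)) {
        $number = $m[1];
    }
}

echo $number; // 1,960,000
```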

softsdev