2

I need to get the location for a phone number from this site:

lookupexpert.com/search_phone?phone_number=7322691678

I want a regex that matches the text inside this element:

<p class="location">OCEAN GATE, NJ</p>

How do I do that?

This is what I have so far:

<?php

$subject = file_get_contents("http://lookupexpert.com/search_phone?phone_number=7322691678");

$pattern = '#\<p class="location"\>(.+?)\<\/p\>#s';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
print_r($matches);

?>

I also tried XPath, but it didn't work well because the page is not valid markup:

/html/body/div/div[2]/div/ul/li[2]/p[4]
Master345
  • 1
    [You shouldn't parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Jeroen Jun 01 '12 at 14:49

4 Answers

4

Try this:

$subject = file_get_contents( 'http://www.lookupexpert.com/search_phone?phone_number=7322691678');
preg_match_all( '#<p class="location">(.*?)</p>#', $subject, $matches);
var_dump( $matches[1][1]);

This outputs:

string(14) "OCEAN GATE, NJ" 


nickb
  • Yes, but when you try it with `$subject = file_get_contents("http://lookupexpert.com/search_phone?phone_number=7322691678");` it gives you `string(8) "Location"`. Why is that? – Master345 Jun 01 '12 at 15:01
  • @RowMinds - Because it's matching the table header, which is also a `<p class="location">`. Change the `preg_match` call to `preg_match_all`, and throw out the first result. I'll update my post. – nickb Jun 01 '12 at 15:04
  • For your example, regex is easier and simpler. You could use xpath, but I doubt you'll really *need* it. Your use-case is simple enough to use a regex and be done. – nickb Jun 01 '12 at 15:07
  • 1
    What if the markup of that tag changes tomorrow? – anubhava Jun 01 '12 at 15:22
  • `#` are the delimiters for the regex - You chose the same delimiting character in your sample regex. And @anubhava, simple questions yield simple answers. That's all. – nickb Jun 01 '12 at 15:34
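
The comments above explain that the header cell uses the same `<p class="location">` markup as the data row. As a minimal sketch of "throw out the first result" that does not depend on position, the header can be filtered out by its text. The `$subject` markup below is a hypothetical, trimmed-down stand-in for the real page, and the `header` class name is an assumption made only for this illustration.

```php
<?php
// Hypothetical stand-in for the fetched page (not the site's real markup).
$subject = <<<HTML
<ul>
  <li class="header"><p class="location">Location</p></li>
  <li class="recordItem clearfix"><p class="location">OCEAN GATE, NJ</p></li>
</ul>
HTML;

// Collect the body of every <p class="location">, as in the answer above.
preg_match_all('#<p class="location">(.*?)</p>#s', $subject, $matches);

// Drop the header cell by its text instead of by its index.
$locations = array_values(array_filter(
    $matches[1],
    function ($text) {
        return strcasecmp(trim($text), 'Location') !== 0;
    }
));

var_dump($locations[0] ?? null);
```

With this sample input, `$locations` holds a single entry, the data row's text. If a real location could ever literally be "Location", this filter would drop it too, so the index-based approach may still be preferable in that case.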
2

Use this XPath

//p[@class='location']/text()

or this RegEx

(?<=<p class="location">)([^<>]+)(?=</p>)

Code:

preg_match_all('%(?<=<p class="location">)([^<>]+)(?=</p>)%', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[1];
Cylian
  • Warning: simplexml_load_file() [function.simplexml-load-file]: test.txt:12: parser error : EntityRef: expecting ';' in D:\xampp\htdocs\testing\test.php on line 2 Warning: simplexml_load_file() [function.simplexml-load-file]: ps.googleapis.com/maps/api/js?key=AIzaSyC45oW6TD6fwQHjKmK1wIfZjco-YNnQfmU&sensor in D:\xampp\htdocs\testing\test.php on line 2 Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in D:\xampp\htdocs\testing\test.php on line 2 Warning: simplexml_l .................... and so on ... its not properly validated ... – Master345 Jun 01 '12 at 14:58
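
The warnings in the comment above come from a strict XML parser hitting an unescaped `&` in a script URL. One way to still use the XPath from this answer is to load the page with PHP's lenient HTML parser, `DOMDocument::loadHTML()`. This is only a sketch: the `$html` fragment and the `example.com` URL are hypothetical and merely reproduce the same kind of invalid markup.

```php
<?php
// Hypothetical fragment: the unescaped "&" in the script URL is the kind of
// markup that a strict XML parser rejects.
$html = <<<HTML
<html><head>
<script src="http://example.com/js?key=abc&sensor=false"></script>
</head><body>
<p class="location">Location</p>
<p class="location">OCEAN GATE, NJ</p>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);            // HTML parser tolerates the invalid markup
libxml_clear_errors();

// Run the XPath from the answer and gather the text of every match.
$xpath = new DOMXPath($doc);
$texts = [];
foreach ($xpath->query("//p[@class='location']/text()") as $node) {
    $texts[] = trim($node->nodeValue);
}
print_r($texts);
```

Note that this XPath, like the regex, also matches the header cell, so the first entry of `$texts` is the header text here.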
1

Try this one..

$string = '<p class="location">OCEAN GATE, NJ</p>';
$pattern = '/<p class="location">(.*)<\/p>/';

$preg = preg_match_all($pattern, $string, $match);
print_r($match);
Wouter Dorgelo
  • Same results: Array ( [0] => Array ( [0] => <p class="location">Location</p> [1] => 1345 ) [1] => Array ( [0] => Location [1] => 1365 ) ) – Master345 Jun 01 '12 at 14:56
1

It is better not to rely on a regex to parse HTML, since that is unreliable; use a DOM parser instead. Use code like this:

libxml_use_internal_errors(true); // keep warnings about invalid markup out of the output
$doc = new DOMDocument();
// assuming search_phone.html contains your saved HTML source
$doc->loadHTMLFile('search_phone.html'); // loads your HTML
$xpath = new DOMXPath($doc);
// string() yields the text of the first matching node
$value = $xpath->evaluate("string(//li[starts-with(@class, 'recordItem')]/
                           p[@class='location']/text())");
echo "Location Name: [$value]\n"; // prints your location

OUTPUT:

Location Name: [OCEAN GATE, NJ]
anubhava
  • It works, thanks. But what are you doing there with the XPath "string(//li[@class='recordItem clearfix']/ p[@class='location']/text())"? I mean, why is "/html/body/div/div[2]/div/ul/li[2]/p[4]" not correct? It's from Mozilla. And second, why XPath instead of regex? Thank you. – Master345 Jun 01 '12 at 15:04
  • Actually `$xpath->evaluate("string(/html/body/div/div[2]/div/ul/li[2]/p[4]/text())")` also works fine for me, but I would prefer my XPath since it does not depend on index numbers such as `[2]` and `[4]`. So even if their web developer inserts one more tag there in the future, my XPath would still work. There are thousands (literally) of comments here advising people not to use regex for HTML parsing, for the simple reason that regex cannot reliably parse HTML. See here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – anubhava Jun 01 '12 at 15:21
  • i understand, but my method /html/body/div/div[2]/div/ul/li[2]/p[4]/text() is like a fastfood, you don't have to look into your html code, just open Mozilla, copy paste the xpath, and you're done – Master345 Jun 01 '12 at 15:31
  • Agreed and as I said that Mozilla generated XPATH will work fine as well with current HTML source. – anubhava Jun 01 '12 at 15:38
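
To illustrate the trade-off discussed in these comments, here is a small sketch comparing a positional XPath with a class-based one when an extra element is inserted. The markup, the `name` and `badge` class names, and the `locate()` helper are all hypothetical and exist only for this example; the real page is more deeply nested.

```php
<?php
// Hypothetical helper: evaluate an XPath expression against an HTML string
// and return the string value of the first match.
function locate(string $html, string $expr): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    return trim((new DOMXPath($doc))->evaluate("string($expr)"));
}

// Hypothetical markup before and after a developer adds one more <p>.
$before = '<html><body><ul><li class="recordItem clearfix">'
        . '<p class="name">A</p><p class="location">OCEAN GATE, NJ</p>'
        . '</li></ul></body></html>';
$after  = str_replace('<p class="name">', '<p class="badge">new</p><p class="name">', $before);

$positional = '/html/body/ul/li/p[2]';
$byClass    = "//li[starts-with(@class, 'recordItem')]/p[@class='location']";

echo locate($before, $positional), "\n"; // the location, while the layout is unchanged
echo locate($after, $positional), "\n";  // now picks up the shifted "name" paragraph
echo locate($after, $byClass), "\n";     // still the location
```

The positional path is quicker to obtain from the browser, as noted above, but it silently returns a different element once the layout shifts, whereas the class-based path keeps working as long as the class names stay the same.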