
I'm trying to understand how to scrape the decoded phone numbers from a yellow pages website with PHP and cURL.

Here is an example URL: https://www.gelbeseiten.de/test

Normally you can do it with something like this:

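$ch = curl_init('https://www.gelbeseiten.de/test');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$page = curl_exec($ch);
curl_close($ch);

if (preg_match('#example html code (.*) example html code#', $page, $match)) {
    $result = $match[1];
    echo $result;
}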

But on the page mentioned above, the phone number cannot be found directly in the HTML code. There must be a way to get it.

Can you please help me out?

Best regards,

Jennifer

user1219432
  • It might be far easier to use `DOMDocument` and `XPath`: the element containing the phone number is handily assigned a class (`phone`), so you could access them directly rather than trying to use regular expressions – Professor Abronsius Apr 26 '17 at 14:26
  • @RamRaider the phone number is somehow encoded and then appended to the HTML DOM. I think that is what the OP is asking about, so it is not about using regex or DOMDocument, etc. – hassan Apr 26 '17 at 14:29
  • Possible duplicate of [Website Scraping Using PHP](http://stackoverflow.com/questions/26397335/website-scraping-using-php) – LuFFy Apr 26 '17 at 14:31

2 Answers


Don't use regex to parse HTML; use an HTML parser like DOMDocument, e.g.:

$html = file_get_contents("https://www.gelbeseiten.de/test");
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//span[contains(@class,"nummer")]') as $item) {
    print trim($item->textContent);
}

Output:

(0211) 4 08 05(0211) 4 08 05(0211) 4 08 05(0211) 4 08 05(0231) 9 79 76(0231)...
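
The output above runs together because nothing is printed between matches. As a minimal sketch (not tested against the live site), the same DOMDocument and XPath approach can collect the matches into an array and print one per line. The helper name `extractTextByClass` and the sample markup are made up for illustration. The XPath expression matches the class as a whole token, so an unrelated class such as `nummernblock` is not picked up.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the trimmed
// text of every element whose class attribute contains $class as a whole token.
// Assumes $class is a plain class name without quote characters.
function extractTextByClass(string $html, string $class): array
{
    $previous = libxml_use_internal_errors(true); // silence warnings for real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Made-up sample markup for illustration; this is not the real page.
$sample = '<ul><li class="phone"><span class="nummer">(0211) 4 08 05</span></li>'
        . '<li class="phone"><span class="nummer extra">(0231) 9 79 76</span></li>'
        . '<li><span class="nummernblock">ignored</span></li></ul>';

$numbers = extractTextByClass($sample, 'nummer');
echo implode(PHP_EOL, $numbers), PHP_EOL; // one number per line
```

Returning an array rather than printing inside the loop leaves the choice of separator to the caller.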
Pedro Lobito
  • Thank you, but this script doesn't get the last two digits of the number. – user1219432 Apr 26 '17 at 14:53
  • This website has a protection that only reveals the last part of the phone number if JavaScript is enabled, which is not the case with PHP. You may want to use Selenium (www.seleniumhq.org/). – Pedro Lobito Apr 26 '17 at 15:02

As suggested in a comment, using an XPath expression yields the phone numbers as desired.

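$url = 'https://www.gelbeseiten.de/test';

libxml_use_internal_errors(true); // suppress warnings from imperfect real-world HTML
$dom = new DOMDocument;
$dom->loadHTMLFile($url);
$xp = new DOMXPath($dom);

// Select every <li> whose class attribute is exactly "phone"
$query = '//li[@class="phone"]';
$col = $xp->query($query);

if ($col) {
    foreach ($col as $node) {
        echo $node->nodeValue . "<br />";
    }
}
$dom = $xp = $col = null;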
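
If the text of the matched nodes looks incomplete, it can help to see the full markup the server sends for each node, including attributes and nested tags. The following is a rough sketch of that check. The helper name `dumpMatches` is hypothetical, and the sample markup (including the `suffix` class and `data-x` attribute) is invented purely for illustration and says nothing about how the real page is built.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the outer HTML
// of each node matched by $query, so attributes and nested tags are visible
// rather than only the text content.
function dumpMatches(string $html, string $query): array
{
    libxml_use_internal_errors(true); // silence warnings for real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);

    $out = [];
    if ($nodes !== false) {
        foreach ($nodes as $node) {
            $out[] = $dom->saveHTML($node); // serialise just this node
        }
    }
    return $out;
}

// Made-up markup for illustration; the real page may look entirely different.
$sample = '<ul><li class="phone"><span class="nummer">(0211) 4 08</span>'
        . '<span class="suffix" data-x="placeholder"></span></li></ul>';

foreach (dumpMatches($sample, '//li[@class="phone"]') as $markup) {
    echo $markup, PHP_EOL;
}
```

Anything that a script in the browser fills in later will not appear in this dump, which is a quick way to tell whether the missing part is in the static HTML at all.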
Professor Abronsius
  • Thank you, unfortunately this script doesn't get the last two digits of the number. Do you have an idea, how to solve this? – user1219432 Apr 26 '17 at 14:57