
I want to grab the names and coordinates of places from the advisor.travel website, whose content is CC licensed. I only need the first 10 pages, with the name and coordinates from each.

Each attraction has a link of the form http://en.advisor.travel/poi/1, that is 'http://en.advisor.travel/poi/' . i, where i is the number of the attraction.

I want only the first 10 attractions, so i is between 1 and 10. The XPath for the name is:

//h1

and xpath for coordinates is:

//span[@class='latitude']
//span[@class='longitude']

I have now written a scraper, and the code is:

<?php


for ($i=0; $i<=10; $i++)
  {
  $dom2 = new DOMDocument();
  @$dom2->loadHTMLFile('http://en.advisor.travel' . $i);
  $xpath2 = new DOMXPath($dom2);
  $data = array();
  $data[name] = $xpath2->query("//h1");
  $data[latitude] = $xpath2->query("//span[@class='latitude']");
  $data[longitude] = $xpath2->query("//span[@class='longitude']");

  } 
echo '<pre>' . print_r($data, true) . '</pre>';



?>

but the only result this code gives me is:

Array
(
    [name] => DOMNodeList Object
        (
            [length] => 0
        )

    [latitude] => DOMNodeList Object
        (
            [length] => 0
        )

    [longitude] => DOMNodeList Object
        (
            [length] => 0
        )

)       

So how can I fix it? What is the problem here?

dr Code

1 Answer


You're suppressing errors with the @ operator, so you didn't notice that the URL was actually incorrect.

The call should be:

@$dom2->loadHTMLFile('http://en.advisor.travel/poi/' . $i);
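A failed load can also be detected explicitly rather than hidden. The following is only a minimal sketch: the helper name `loadPage` and the missing file path are made up for illustration, and it assumes that checking the return value of `loadHTMLFile()` is enough for this script's purposes.

```php
<?php
// Hypothetical helper (not part of the DOM extension):
// returns the loaded document, or null when the page cannot be loaded.
function loadPage($url)
{
    // Collect libxml's parser messages internally instead of printing them
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTMLFile($url); // false on failure

    // Discard the collected messages and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $loaded ? $dom : null;
}

// Deliberately missing path, standing in for a mistyped URL
$page = loadPage('/no/such/dir/poi-1.html');
if ($page === null) {
    echo "Could not load the page, check the URL\n";
}
```

With a check like this, a mistyped URL shows up as a clear message instead of an empty result.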

Further below, you have the following:

$data[name] = $xpath2->query("//h1");

There are two things wrong with this line (and the two lines below):

  • You're using a constant as key. You should wrap it in single quotes.
  • Even if the above error is corrected, you'll only get the values of the last iteration of your for loop. To correctly push the elements into your $data array, you'll have to use $data['key'][] syntax.
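The difference between assigning and appending can be seen in a small standalone sketch. The place names are made-up sample values, not data from the site:

```php
<?php
// Sample values standing in for scraped names (hypothetical, for illustration only)
$names = array('First place', 'Second place', 'Third place');

$overwritten = array();
$collected = array();

foreach ($names as $name) {
    // Plain assignment replaces the previous value on every pass
    $overwritten['name'] = $name;
    // The [] syntax appends a new element instead
    $collected['name'][] = $name;
}

print_r($overwritten); // holds a single value, from the last pass
print_r($collected);   // holds one element per pass
```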

Instead of simply querying the XPath, you'll have to access the text content of the matched node. For that, you can use the textContent property:

$data['name'][] = $xpath2->query("//h1")->item(0)->textContent;
$data['latitude'][] = $xpath2->query("//span[@class='latitude']")
                                                    ->item(0)->textContent;
$data['longitude'][] = $xpath2->query("//span[@class='longitude']")
                                                    ->item(0)->textContent;
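If a page has no matching element, `item(0)` returns `null`, and reading `textContent` from it fails. A guarded lookup could look roughly like this; the helper name `firstText` and the inline HTML are assumptions made for illustration, not the site's actual markup:

```php
<?php
// Hypothetical helper: text of the first node matching $query, or null if there is none
function firstText(DOMXPath $xpath, $query)
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up markup that mimics the structure described in the question
$html = '<html><body><h1>Sample Place</h1>'
      . '<span class="latitude">48.8584</span>'
      . '<span class="longitude">2.2945</span></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$name = firstText($xpath, '//h1');
$latitude = firstText($xpath, "//span[@class='latitude']");
$missing = firstText($xpath, "//span[@class='altitude']"); // no such element

var_dump($name, $latitude, $missing);
```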

The complete code should look like this:

<?php

// Create the result array once, before the loop, so it is not reset on every iteration
$data = array();

for ($i=0; $i<=12; $i++)
{
    $dom2 = new DOMDocument();
    // @ hides the warnings that loadHTMLFile() raises for malformed HTML
    @$dom2->loadHTMLFile('http://en.advisor.travel/poi/' . $i);
    $xpath2 = new DOMXPath($dom2);
    // Append this page's values; [] adds a new element on each iteration
    $data['name'][] = $xpath2->query("//h1")->item(0)->textContent;
    $data['latitude'][] = $xpath2->query("//span[@class='latitude']")->item(0)->textContent;
    $data['longitude'][] = $xpath2->query("//span[@class='longitude']")->item(0)->textContent;
    echo "<hr/>";
}

echo '<pre>' . print_r($data, true) . '</pre>';

?>

Technically, this should work, but since there are 12 different URLs to be queried, I don't think this is a good idea and hence don't recommend it.

Amal Murali
  • what is the better way to do this? – dr Code Nov 03 '13 at 23:08
  • @drCode: There's no *better* way. Screen scraping is considered a very bad idea for this. Anyway, if you have the locations already, you can use [Google Maps API](http://stackoverflow.com/q/8633574/1438393) to get the latitude and longitude :) – Amal Murali Nov 03 '13 at 23:10
  • Yes, but how can I send the requests to this site one by one? Right now I send all 10 requests at the same time. How can I send them one at a time, e.g. one request, a 5-second pause, and then the next request? – dr Code Nov 03 '13 at 23:19
  • @drCode: You can use `sleep()` -- i.e. `sleep(5)` -- you might also want to set `set_time_limit(0)` in the very top of your script to make sure you don't reach the maximum execution time limit. – Amal Murali Nov 03 '13 at 23:26
  • @drCode: After `for ($i=0; $i<=12; $i++) {`. – Amal Murali Nov 03 '13 at 23:28
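The pause suggested in the comments above can be sketched as follows. The function name `scrapeSequentially`, its parameters, and the stand-in fetch callback are hypothetical and only illustrate where `sleep()` and `set_time_limit(0)` would go:

```php
<?php
// Remove the execution time limit, since the pauses add up
set_time_limit(0);

// Hypothetical helper: call $fetch for each ID in turn,
// pausing $delaySeconds between consecutive requests.
function scrapeSequentially(array $ids, $fetch, $delaySeconds)
{
    $results = array();
    $ids = array_values($ids);
    $last = count($ids) - 1;

    foreach ($ids as $index => $id) {
        $results[$id] = call_user_func($fetch, $id);
        // No need to wait after the final request
        if ($index < $last) {
            sleep($delaySeconds);
        }
    }
    return $results;
}

// Stand-in callback; a real one would load and parse the page for $id
$fakeFetch = function ($id) {
    return 'poi-' . $id;
};

// A delay of 0 keeps this demo fast; the comments above suggest 5 seconds
$results = scrapeSequentially(array(1, 2, 3), $fakeFetch, 0);
print_r($results);
```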