-1

I'm trying to create a script with PHP and cURL that extracts the emails that are showed in this page.

I'm having problems to get them, because the list of hotels is loaded through an ajax call..

Well, my question: what are the alternatives to a PHP/cURL script? Maybe some tool like Selenium, Sahi or Watir (these are indicated for tests) that allow to open a browser???

Linux user.

EDIT:

After the answer of Pastor, I want to paste here my code with which I'm getting an xml file with the data of the hotels of the first page, but I want to access to the pages where the rest of the hotels are..how to do that?

<?php

//extract data from the post
extract($_POST);

//set POST variables
$url = 'http://www.turismovenezia.it/index.php';

$fields1 = array(
            'ajax'=>'searchEngineTopdata',
            'next_pair'=>'Dove Allogiare|*',
            'lang'=>'it');


$fields2 = array(

'ajax'=>'xmlSearchEngineResponder',
'xml' => "%3C%3Fxml%20version%3D%221.0%22%3F%3E%3CSearchRequest%20xmlns%3D%22http%3A%2F%2Fwww.liberologico.com%2Fdbsite%2Fjolly-search%22%20xmlns%3Axsi%3D%22http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema-instance%22%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FScope%3E%3CFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Eaptve_territorio%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3ETHESAURUS%3C%2FMode%3E%3COperation%3ELIKE%3C%2FOperation%3E%3C%2FFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Efull_text_search%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3EFREE_TEXT%3C%2FMode%3E%3COperation%3ELIKE%3C%2FOperation%3E%3C%2FFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Elang%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5Bit%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3EFREE_TEXT%3C%2FMode%3E%3COperation%3EEQUAL%3C%2FOperation%3E%3C%2FFilters%3E%3C%2FFilters%3E%3CSubSearches%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BEventi%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BArte%20%26%20Cultura%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BMare%20%26%20Natura%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BPiatti%20%26%20Prodotti%20tipici%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BRelax%20%26%20Divertimento%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BDove%20Alloggiare%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BDove%20Mangiare%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BInformazioni%20Utili%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3C%2FSubSearches%3E%3C%2FSearch%3E%3CActiveResultSet%3E%3CTab%3E%3C%21%5BCDATA%5BDove%20Alloggiare%5D%5D%3E%3C%2FTab%3E%3CFirstItem%3E0%3C%2FFirstItem%3E%3CPagerSize%3E10%3C%2FPagerSize%3E%3C%2FActiveResultSet%3E%3C%2FSearchRequest%3E",
'force' => 'false');

//open connection
$ch = curl_init();

//set the url, number of POST vars, POST data
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_POST, true);
curl_setopt($ch,CURLOPT_POSTFIELDS, $fields1);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);

//set the url, number of POST vars, POST data
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_POST, true);
curl_setopt($ch,CURLOPT_POSTFIELDS, $fields2);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);

//execute post
$result = curl_exec($ch);

echo $result;

//close connection
curl_close($ch);
PeeHaa
  • 71,436
  • 58
  • 190
  • 262
tirenweb
  • 30,963
  • 73
  • 183
  • 303

3 Answers3

2

You are probably not going to accept this answer, but maybe the community will. The answer you need to hear is: do not extract emails from webpages.

Whatever it is you want to promote to those addresses, you do not want to send them unsolicited email. That will both annoy those people, AND will cause your emails to be submitted to SpamCop or other spam blacklist. It is also very rude behavior.

And no, it does not matter if the email is "one time only" or if you provide "opt out instructions".

Try contacting the site itself for assistance. They may have a forum for you to post your message, or they may forward it on your behalf (without actually giving you the email database).

Scott Prive
  • 801
  • 11
  • 16
2

The list is an XML list being pulled by a POST request from http://www.turismovenezia.it/index.php. cURL is, IMO, your best tool for this action. Use your developer's console. Follow how the website initiates and fulfills the request and then mimic it.

RAW POST using cURL in PHP

Community
  • 1
  • 1
Pastor Bones
  • 7,183
  • 3
  • 36
  • 56
  • Thanks, I'm trying that but it's not easy for me...I have pasted the code I'm using in my question. Someone want to give me a hand? – tirenweb Dec 01 '11 at 16:34
-1

Have a look the simple_html_dom library

http://simplehtmldom.sourceforge.net/

Simpler way to load the DOM and manipulate with PHP

Sean H Jenkins
  • 1,770
  • 3
  • 21
  • 29