
I am quite new to web scraping, so my question might be a little simple, but it really bothers me. I want to scrape some content from TripAdvisor, but when I run the following query in YQL, it returns nothing.

select * from html where url="http://www.tripadvisor.com/Search?q=sunny+relax&geo=191#&ssrc=A&o=0.html"

Can anyone tell me why? Is there anything wrong with my query?

Thank you in advance for your kind help.

  • You will probably need to fetch the web page contents first. Another method is to take advantage of the DOM to extract content from the page's elements. Various bits on this page should get you started: http://stackoverflow.com/questions/34834038/php-find-and-get-value-based-on-another-one-from-html-table-parsed-file/34835046#34835046 – Steve Feb 06 '16 at 01:35
  • You don't need the `.html`, but in any case this is just a search results page, not an XML or YQL data source. See https://developer.yahoo.com/yql/guide/yql_url.html for an example, and this tutorial might help: https://developer.yahoo.com/yql/guide/two-minute-tutorial.html – Steve Feb 06 '16 at 02:42

1 Answer


It is because the "/Search" page is disallowed in http://www.tripadvisor.com/robots.txt, and YQL checks robots.txt before fetching a page, so it returns nothing for disallowed URLs.
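To see why a robots.txt rule blocks the search URL, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a hypothetical two-line robots.txt containing only a `Disallow: /Search` rule; the real TripAdvisor file is longer and may have changed, and this only illustrates how such a rule is matched, not how YQL implements its check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt excerpt for illustration only; the real file at
# http://www.tripadvisor.com/robots.txt may differ and change over time.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /Search
"""


def is_allowed(url, robots_txt=SAMPLE_ROBOTS_TXT, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


search_url = "http://www.tripadvisor.com/Search?q=sunny+relax&geo=191"
hotels_url = "http://www.tripadvisor.com/Hotels-g45963-Las_Vegas_Nevada-Hotels.html"
print(is_allowed(search_url))  # /Search matches the Disallow rule
print(is_allowed(hotels_url))  # no rule matches this path
```

Rules are prefix matches on the URL path, so any URL under `/Search`, whatever its query string, is blocked by this one line.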

You can try a different page that is not disallowed and use XPath to select specific nodes, e.g.:

select * from html where xpath = '//div[@class="listing_title"]/a' and url = 'http://www.tripadvisor.com/Hotels-g45963-Las_Vegas_Nevada-Hotels.html'
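For comparison, the same XPath filter can be tried locally. The sketch below uses Python's `xml.etree.ElementTree`, which supports a limited XPath subset, on a hypothetical, well-formed snippet standing in for a listing page. The markup and hrefs are invented for illustration. Real pages are usually not valid XML, so in practice a lenient HTML parser such as `lxml.html` would be needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for part of a hotel listing page;
# the real TripAdvisor HTML is not reproduced here and its class names may change.
SAMPLE_HTML = """
<html><body>
  <div class="listing_title"><a href="/Hotel_Review-1">Hotel One</a></div>
  <div class="listing_title"><a href="/Hotel_Review-2">Hotel Two</a></div>
  <div class="other"><a href="/elsewhere">Not a listing</a></div>
</body></html>
"""


def listing_links(markup):
    """Return (text, href) for each <a> that is a direct child of div.listing_title."""
    root = ET.fromstring(markup.strip())
    # Mirrors //div[@class="listing_title"]/a from the YQL query,
    # expressed in ElementTree's XPath subset relative to the root element.
    return [(a.text, a.get("href"))
            for a in root.findall(".//div[@class='listing_title']/a")]


for text, href in listing_links(SAMPLE_HTML):
    print(text, href)
```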
David Najman