
I'm learning Scrapy by scraping some TripAdvisor attraction pages (for example this page: https://www.tripadvisor.com/Attractions-g187791-Activities-Rome_Lazio.html). One of the problems I'm facing is that if a review is longer than a certain length, the page shows a "more" button, which expands the text with JavaScript. I'm not able to send the POST request that expands the text of these reviews.

Following the advice given here: Click a Button in Scrapy

I've found the POST request that the button sends:

Request URL: https://www.tripadvisor.com/OverlayWidgetAjax?Mode=EXPANDED_HOTEL_REVIEWS_RESP&metaReferer=

Request method: POST
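A quick way to see what this endpoint carries in its URL is to split the query string with the standard library. This sketch only inspects the URL quoted above; it does not contact TripAdvisor, and it cannot reveal the request body or headers, which the browser's developer tools show separately.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the browser's network tab (quoted above)
AJAX_URL = ("https://www.tripadvisor.com/OverlayWidgetAjax"
            "?Mode=EXPANDED_HOTEL_REVIEWS_RESP&metaReferer=")

parts = urlsplit(AJAX_URL)
# keep_blank_values=True so the empty metaReferer parameter is not dropped
params = parse_qs(parts.query, keep_blank_values=True)

print(parts.path)  # /OverlayWidgetAjax
print(params)      # {'Mode': ['EXPANDED_HOTEL_REVIEWS_RESP'], 'metaReferer': ['']}
```

Only two query parameters appear, so whatever identifies which reviews to expand has to travel in the POST body or headers.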

In scrapy shell I've tried the following:

```python
from scrapy.http.request import Request
fetch("https://stackoverflow.com/questions/6682503/click-a-button-in-scrapy")
Request(url="https://www.tripadvisor.com/OverlayWidgetAjax?Mode=EXPANDED_HOTEL_REVIEWS_RESP&metaReferer=", method="POST")
```

and then compared the review texts before and after by calling:

```python
response.css("p.partial_entry").getall()
```

Unlike the Stack Overflow example, there is no form to submit here, only a simple request that expands the texts. One click on a "more" button expands all truncated reviews on the page, so I would expect this request to expand all the texts, but it doesn't expand or change anything on the page.

My main problem is that I'm having a hard time testing, since I'm working through the Scrapy shell or bash.

Anton
  • Sending a post request from Scrapy can't possibly change anything on a page you've already downloaded. – Daniel Roseman Feb 06 '19 at 11:55
  • That makes sense. Should my approach be to send a separate request to the URL that is in the POST request? – Anton Feb 06 '19 at 12:01
  • If you look at how they do that POST request you'll notice they send along additional request headers and a request body. If you can figure out where to get all that information then you could possibly try to go ahead with what you did. However, in order to handle this kind of Javascript behavior, I would suggest you rather have a look at [Splash](https://splash.readthedocs.io/en/stable/install.html) and its [Scrapy integration](https://github.com/scrapy-plugins/scrapy-splash). It is more complicated, but will allow you to actually click the link so that it loads the extra text. – malberts Feb 06 '19 at 13:53

1 Answer


Request(…) only creates a request object; it does not send the request or update response. Use fetch(Request(…)) instead.

Gallaecio