Scrapy not scraping entire text

Question

I developed a Scrapy project that scrapes the text from the reviews section of a particular hotel on tripadvisor.in.

However, the scraper captures only part of each review rather than the whole text, and I don't know why.

For instance, this is one particular review:

We stayed at the Acron for 6 nights during January. Everything about the hotel is perfect. The staff are excellent as is the service.

Surprisingly the prices of drinks and wine are very reasonable in the hotel so no need to wander out. We ate in the hotel on 5 of our 6 nights and did not have a bad meal. The one night which we ate out was at "go with the flow" which is a very nice restaurant with excellent food, about 200 yards from the hotel.

Don't expect too much from the local beaches. We found them to be crowded and dirty.

Breakfast is unusual because it runs from 7 AM to 1 PM. Again the food is excellent and well presented.

Don't expect too much from the local beaches. We found them to be crowded and dirty.

Other than that, a great stay. Thank you to all the staff.

Stayed January 2017, travelled as a couple

However, the scraped review comes out to be only:

We stayed at the Acron for 6 nights during January. Everything about the hotel is perfect.\nThe staff are excellent as is the service.\nSurprisingly the prices of drinks and wine are very reasonable in the hotel so no need to wander out.\nWe ate in the hotel on 5 of our 6 nights and did not have a bad...

I want to scrape the entire review, without the escape characters (\n). How do I do that?

Refer to this link for the reviews: Reviews

I also want to scrape other information such as the username and the date the review was published. Those fields all come out correctly. For each review, I want to scrape:

1. username
2. date
3. review
4. title

and store each review with all of the above details in a dictionary. How do I do that for all the reviews on the webpage?

For example:

Username1 Date1 Title1 Review1

Username2 Date2 Title2 Review2

Username3 Date3 Title3 Review3
   .        .      .     .
   .        .      .     .
Usernamen Daten Titlen Reviewn

and then export these dictionaries in CSV or JSON format?

Here is the code:

def parse(self, response):
    for reviews in response.css('#taplc_hr_reviews_list_0'):
        username = response.css('div.username.mo > span::text').extract_first()
        head = response.css('div > div > div > div > a > span::text').extract_first()
        date = response.css('.reviewItemInline').xpath('span/@title').extract_first()
        review = response.css('div > div.col2of2 > div > div.wrap > div > div > p::text').extract_first()
        holder = {'User': username, 'Title': head, 'Date': date, 'Review': review}

        yield holder

score 0 · Answer 1 · edited May 23 '17 at 12:17

The page uses JavaScript to expand the truncated texts and show the full reviews, and Scrapy cannot run JavaScript.

You can work around this by following the links to the full reviews and scraping the data from those pages.

Alternatively, you can use Selenium, which drives a real browser and therefore executes the JavaScript. This and this question might help.
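Whichever route supplies the full text, the other two parts of the question (dropping the newline escapes and storing one dictionary per review for export) need only plain Python. The following is a minimal sketch, not a definitive implementation: the helper names `clean_review`, `to_json` and `to_csv` are hypothetical, and the sample fragments and placeholder values stand in for whatever the spider actually extracts.

```python
import csv
import io
import json


def clean_review(parts):
    """Join text fragments and collapse newlines and runs of whitespace into single spaces."""
    return " ".join(" ".join(parts).split())


def to_json(rows):
    """Serialise a list of review dictionaries to a JSON string."""
    return json.dumps(rows, indent=2)


def to_csv(rows, fieldnames=("User", "Date", "Title", "Review")):
    """Serialise a list of review dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample: text nodes as a selector might return them.
fragments = [
    "We stayed at the Acron for 6 nights during January.\nThe staff are excellent",
    " as is the service.\n",
]
row = {
    "User": "Username1",
    "Date": "Date1",
    "Title": "Title1",
    "Review": clean_review(fragments),
}

print(to_json([row]))
print(to_csv([row]))
```

Inside a spider it is usually simpler to yield one such dictionary per review and let Scrapy's feed export write the file, for example `scrapy crawl <spider_name> -o reviews.json` or `-o reviews.csv`.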

edited May 23 '17 at 12:17

Community

answered Feb 05 '17 at 08:56

cotique

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

You cannot scrape the full review text from that page.

You will have to make a POST request to https://www.tripadvisor.in/OverlayWidgetAjax?Mode=EXPANDED_HOTEL_REVIEWS&metaReferer=Hotel_Review along with these values.

The value for reviews can be taken from the data-reviewid attributes on the https://www.tripadvisor.in/Hotel_Review-g635747-d7289335-Reviews-Acron_Waterfront_Resort-Baga_Goa.html page.
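As a rough sketch of that idea, the review IDs can be collected from the listing page's HTML and packed into the form data for the POST. Several things here are assumptions rather than facts from the site: the helper names are hypothetical, the sample HTML is invented, the form field is assumed to be called `reviews` with comma-separated IDs, and the remaining form values referred to above are not reproduced.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Endpoint quoted in the answer above.
AJAX_URL = (
    "https://www.tripadvisor.in/OverlayWidgetAjax"
    "?Mode=EXPANDED_HOTEL_REVIEWS&metaReferer=Hotel_Review"
)


class ReviewIdCollector(HTMLParser):
    """Collect data-reviewid attribute values in document order, skipping duplicates."""

    def __init__(self):
        super().__init__()
        self.review_ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-reviewid" and value and value not in self.review_ids:
                self.review_ids.append(value)


def extract_review_ids(html):
    """Return every distinct data-reviewid found in the given HTML text."""
    collector = ReviewIdCollector()
    collector.feed(html)
    return collector.review_ids


def build_form(review_ids):
    # ASSUMPTION: the endpoint expects a "reviews" field holding
    # comma-separated IDs; any other required fields are omitted here.
    return {"reviews": ",".join(review_ids)}


# Invented sample markup standing in for the hotel review page.
sample_html = """
<div class="review" data-reviewid="111"><p>partial text...</p></div>
<div class="review" data-reviewid="222"><p>partial text...</p></div>
"""

ids = extract_review_ids(sample_html)
form = build_form(ids)
print(AJAX_URL)
print(ids)
print(urlencode(form))
```

In a Scrapy callback the same dictionary could be sent with `scrapy.FormRequest(AJAX_URL, formdata=form, callback=...)`, which issues a POST when `formdata` is given; the response would then be parsed for the expanded review text.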

edited Jun 20 '20 at 09:12

Community

answered Feb 05 '17 at 11:43

Umair Ayub

