I developed a Scrapy project that scrapes the text from the reviews section of a particular hotel on tripadvisor.in.
However, the scraper returns only part of each review rather than the whole text, and I don't know why.
For instance, this is one particular review:
We stayed at the Acron for 6 nights during January. Everything about the hotel is perfect. The staff are excellent as is the service.
Surprisingly the prices of drinks and wine are very reasonable in the hotel so no need to wander out. We ate in the hotel on 5 of our 6 nights and did not have a bad meal. The one night which we ate out was at "go with the flow" which is a very nice restaurant with excellent food, about 200 yards from the hotel.
Breakfast is unusual because it runs from 7 AM to 1 PM. Again the food is excellent and well presented.
Don't expect too much from the local beaches. We found them to be crowded and dirty.
Other than that, a great stay. Thank you to all the staff.
Stayed January 2017, travelled as a couple
However, the scraped review comes out as only:
We stayed at the Acron for 6 nights during January. Everything about the hotel is perfect.\nThe staff are excellent as is the service.\nSurprisingly the prices of drinks and wine are very reasonable in the hotel so no need to wander out.\nWe ate in the hotel on 5 of our 6 nights and did not have a bad...
All I want is to scrape the entire review, without the escape characters (\n). How do I do that?
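A minimal sketch of the cleanup step, assuming the review text is available as a list of text fragments (for example, what .extract() returns for a ::text selector, whereas .extract_first() keeps only the first fragment). The helper name clean_review is hypothetical. If the page itself serves a shortened preview ending in "...", no selector can recover the missing part from that response, and this helper only tidies the text that was actually received.

```python
def clean_review(fragments):
    """Join text fragments and collapse newlines, tabs and repeated spaces."""
    joined = " ".join(fragments)
    # str.split() with no arguments splits on any run of whitespace,
    # so newline characters vanish when the words are re-joined.
    return " ".join(joined.split())


sample = [
    "We stayed at the Acron for 6 nights during January.\n",
    "The staff are excellent as is the service.\n",
]
print(clean_review(sample))
```

The same function works on a single string if it is wrapped in a list, e.g. clean_review([review]).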
Refer to this link for the reviews: Reviews
Also, I want to scrape the other details, such as the username and the date the review was published. These all come out correctly. For each review I want to scrape:
1. username
2. date
3. review
4. title
and store each review with all of the above details in a dictionary. How do I do that for all the reviews on the webpage? For example:
Username1 Date1 Title1 Review1
Username2 Date2 Title2 Review2
Username3 Date3 Title3 Review3
...
Usernamen Daten Titlen Reviewn
How can I then export this data in CSV or JSON format?
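A rough sketch of the export step, assuming the reviews have already been collected as a list of dictionaries with the keys User, Date, Title and Review. The function names to_csv and to_json and the placeholder rows are hypothetical; only the standard csv and json modules are used.

```python
import csv
import io
import json

FIELDS = ["User", "Date", "Title", "Review"]


def to_csv(rows, stream):
    """Write one CSV line per review, preceded by a header line."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def to_json(rows, stream):
    """Write the whole list of reviews as a single JSON array."""
    json.dump(rows, stream, indent=2, ensure_ascii=False)


# Placeholder data standing in for scraped reviews.
reviews = [
    {"User": "Username1", "Date": "Date1", "Title": "Title1", "Review": "Review1"},
    {"User": "Username2", "Date": "Date2", "Title": "Title2", "Review": "Review2"},
]

csv_buffer = io.StringIO()
to_csv(reviews, csv_buffer)

json_buffer = io.StringIO()
to_json(reviews, json_buffer)

print(csv_buffer.getvalue())
print(json_buffer.getvalue())
```

To write real files, pass open("reviews.csv", "w", newline="", encoding="utf-8") or open("reviews.json", "w", encoding="utf-8") instead of the in-memory buffers. Since the spider already yields dictionaries, Scrapy's built-in feed export may be enough on its own: running scrapy crawl <spider_name> -o reviews.json (or -o reviews.csv) writes every yielded item to that file.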
Here is the relevant piece of code:
def parse(self, response):
    for reviews in response.css('#taplc_hr_reviews_list_0'):
        username = response.css('div.username.mo > span::text').extract_first()
        head = response.css('div > div > div > div > a > span::text').extract_first()
        date = response.css('.reviewItemInline').xpath('span/@title').extract_first()
        review = response.css('div>div.col2of2>div>div.wrap>div>div>p::text').extract_first()
        holder = {'User': username, 'Title': head, 'Date': date, 'Review': review}
        yield holder
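
One possible reason only a single item comes back: the loop runs over the list container, but every field is looked up on the whole response and only the first match is kept. The sketch below illustrates the difference using only the standard library so that it runs without Scrapy. The markup, class names and function names are hypothetical and are NOT the real TripAdvisor page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup used only for illustration.
PAGE = """
<div id="reviews">
  <div class="review"><span class="user">Username1</span><p>Review1</p></div>
  <div class="review"><span class="user">Username2</span><p>Review2</p></div>
</div>
""".strip()

root = ET.fromstring(PAGE)


def scrape_from_root(root):
    """Mirror the code above: loop once, take the first match on the page."""
    rows = []
    for _container in [root]:  # a single list container, so one iteration
        rows.append({
            "User": root.find(".//span").text,
            "Review": root.find(".//p").text,
        })
    return rows


def scrape_per_review(root):
    """Loop over each review element and query relative to that element."""
    rows = []
    for review in root.findall("./div[@class='review']"):
        rows.append({
            "User": review.find("./span").text,
            "Review": review.find("./p").text,
        })
    return rows


print(scrape_from_root(root))
print(scrape_per_review(root))
```

In a Scrapy callback the same pattern would be to iterate over a selector that matches each individual review and then call .css() or .xpath() on the loop variable rather than on response. The per-review selector depends on the actual page markup and is not shown here.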