I am trying to scrape review details from tripadvisor.com, but for the data to be worth something I need to retrieve the ratings as well. However, the ratings are rendered as images instead of actual numbers. The images have distinct class names: for instance, a 5-star rating image has class="ui_bubble_rating.bubble_50" and a 1-star rating has class="ui_bubble_rating.bubble_10".

I have been able to retrieve the following:

In [19]: response.css('div.location-review-review-list-parts-RatingLine__bubbles--GcJvM > span').extract_first()
Out[19]: '<span class="ui_bubble_rating.bubble_40"></span>'

However, I would like to retrieve only the class value, ui_bubble_rating.bubble_40, as that would make extracting the rating easier.

Is there a way that I can scrape the class name so that I can retrieve the rating?

I have been trying it with the following hotel, but it could be any: https://www.tripadvisor.com/Hotel_Review-g188590-d6767297-Reviews-or15-XO_Hotels_Couture-Amsterdam_North_Holland_Province.html

Thanks a lot!

lodeboon
  • Can you share what you have tried? – Sri Apr 16 '20 at 19:43
  • @Sri I have added it to the message itself, thanks! – lodeboon Apr 16 '20 at 20:12
  • So you can get the tag with the data. There are several ways to proceed. You could directly parse the string after class=" and before the closing ". You could also do something like this https://stackoverflow.com/questions/48692446/extract-class-name-in-scrapy – Sri Apr 16 '20 at 20:16
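
The first suggestion in the comment above (parsing the string between class=" and the closing quote) might look roughly like the sketch below. It uses only the standard library `re` module on the string shown in Out[19]. The helper name `rating_from_span` is hypothetical, and dividing the bubble number by ten is an assumption based on bubble_50 being 5 stars and bubble_10 being 1 star. Inside Scrapy itself, appending `::attr(class)` to the CSS selector is likely the more direct route, as the linked question describes.

```python
import re


def rating_from_span(span_html):
    """Return (class_value, rating) parsed from a rating <span> string.

    Hypothetical helper: not part of Scrapy or of the original question.
    """
    # Grab whatever sits between class=" and the closing quote.
    class_match = re.search(r'class="([^"]*)"', span_html)
    if class_match is None:
        return None, None
    class_value = class_match.group(1)

    # Assumption: the number after "bubble_" is the rating times ten,
    # e.g. bubble_50 is 5 stars and bubble_10 is 1 star.
    bubble_match = re.search(r'bubble_(\d+)', class_value)
    rating = int(bubble_match.group(1)) / 10 if bubble_match else None
    return class_value, rating


span = '<span class="ui_bubble_rating.bubble_40"></span>'
print(rating_from_span(span))  # ('ui_bubble_rating.bubble_40', 4.0)
```

The function returns `(None, None)` when the string has no class attribute, so a missing rating can be skipped rather than raising an error mid-crawl.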
