Hi, I am using the Beautiful Soup library to parse content from an HTML page.

I use the following script to get to the part of the page I want:

review_list = soup.find(class_="review_list_score_breakdown_right")

<span class=" review_list_score_breakdown_right">
 <ul class="review_score_breakdown_list list_tighten clearfix" data-et-view="bLTQHcXJVNRCSPOMcAQJO:1 bLTQHcXJVNRCSPOMcAQJO:3 " id="review_list_score_breakdown">
  <li class="clearfix one_col" data-question="hotel_clean">
   <p class="review_score_name">
    Cleanliness
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_comfort">
   <p class="review_score_name">
    Comfort
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_services">
   <p class="review_score_name">
    Facilities
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_staff">
   <p class="review_score_name">
    Staff
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_value">
   <p class="review_score_name">
    Value for money
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_wifi">
   <p class="review_score_name">
    Free WiFi
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
  <li class="clearfix one_col" data-question="hotel_location">
   <p class="review_score_name">
    Location
   </p>
   <div class="score_bar">
    <div class="score_bar_value" data-score="100" style="width: 100%;">
    </div>
   </div>
   <p class="review_score_value">
    10
   </p>
  </li>
 </ul>
</span>

I need to extract the score for a given data-question attribute. For example, to get the hotel comfort score, I'd need to access data-question="hotel_confort". I've tried the find() function, but it doesn't work.

Sweet_Cherry
Ian Spitz
  • Looks like all of your scores have the same `100` value, so what's the point? If the real markup could differ, post a more representative sample. – RomanPerekhrest Dec 23 '17 at 20:29

2 Answers

I think what you need is a find() query using attrs. Your question is similar to "Extracting an attribute value with beautifulsoup".

I will make it a bit specific for your case.

review = soup.find(class_="review_list_score_breakdown_right")
input = review.find(attrs={"data-question" : "hotel-comfort"})
output = input['value']

It's been a while since I used bs4, so please debug the code.

Edit: Here's some working code, tested against your example string:

review = soup.find('span', {'class' : "review_list_score_breakdown_right"})
input = review.find_all(attrs={"data-question": "hotel_comfort"})
print(input)  # prints the matching HTML elements, which you can then navigate further
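
To go one step further down the tree and get the score itself, something like the following might work. This is only a minimal sketch: it assumes the markup looks like the sample in the question (trimmed here to one item), and the names `item` and `score` are just illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (one <li> only).
html = """
<span class=" review_list_score_breakdown_right">
 <ul id="review_list_score_breakdown">
  <li class="clearfix one_col" data-question="hotel_comfort">
   <p class="review_score_name">
    Comfort
   </p>
   <p class="review_score_value">
    10
   </p>
  </li>
 </ul>
</span>
"""

soup = BeautifulSoup(html, "html.parser")
review = soup.find("span", class_="review_list_score_breakdown_right")

# Locate the <li> by its data-question attribute (note the underscore).
item = review.find(attrs={"data-question": "hotel_comfort"})

# The score is the text of the nested <p class="review_score_value">.
score = item.find("p", class_="review_score_value").get_text(strip=True)
print(score)
```

Note that find() returns None when nothing matches, so a misspelled attribute value will surface as an AttributeError on the next call.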
Abhishek Dujari
  • Sorry, let me try that offline and figure it out – Abhishek Dujari Dec 24 '17 at 03:51
  • I checked my answer and it is all right. But as I mentioned, you can't just copy and paste. You see, I have a typo in `hotel-comfort`; correct it and it will work. Skip the `output=input['value']` and just print the input to proceed further down the tree. I updated my answer with a working solution I tested locally. – Abhishek Dujari Dec 24 '17 at 08:33

There is no hotel_confort value in your markup; the data-question attribute is spelled hotel_comfort.

    review = soup.find(class_="review_list_score_breakdown_right")
    hotel = review.find(attrs={"data-question" : "hotel_comfort"})

This code returns

<li class="clearfix one_col" data-question="hotel_comfort"> ..... </li>
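
If you want every score at once, a CSS selector may be more convenient. The sketch below is an assumption-laden example rather than a tested solution for the real page: it uses a trimmed copy of the question's markup, the `scores` dictionary is a hypothetical name, and it assumes each score parses as a number.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (two items only).
html = """
<ul id="review_list_score_breakdown">
 <li class="clearfix one_col" data-question="hotel_clean">
  <p class="review_score_name">Cleanliness</p>
  <p class="review_score_value">
   10
  </p>
 </li>
 <li class="clearfix one_col" data-question="hotel_comfort">
  <p class="review_score_name">Comfort</p>
  <p class="review_score_value">
   10
  </p>
 </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Map each data-question value to its numeric score.
scores = {}
for li in soup.select("li[data-question]"):
    value = li.select_one("p.review_score_value")
    if value is not None:  # skip items without a score paragraph
        scores[li["data-question"]] = float(value.get_text(strip=True))

print(scores["hotel_comfort"])
```

With a dictionary, a misspelled key such as "hotel_confort" raises a KeyError right away instead of silently returning None.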

Batuhan Gürses