
I'm a newbie in programming. For my PhD thesis on tourism management, I am trying to extract some data from the website whatclinic using BeautifulSoup.

Link: https://www.whatclinic.com/dentists/turkey/mugla-province/yalikavak/dt-ufuk-kayhan

The website has ratings and reviews of clinics that I would like to collect.

Thanks to HedgeHog, the issue seemed to be resolved. Then I realized that, although there are 133 reviews, my code only retrieved 12 of them. I assume the problem is the "see more reviews" button: the page URL does not change when I press it. I therefore think I need code that clicks this button, and probably a loop as well.

I want to create a data frame with 3 columns (rating star, treatment type and price) covering all 133 reviews.

I've tried many approaches, including reading threads on Stack Overflow.

I hope somebody can help. Thank you. I use a Jupyter notebook on Google Colab. I don't know anything about programming languages.

import requests
from bs4 import BeautifulSoup
import lxml

url = "https://www.whatclinic.com/dentists/turkey/mugla-province/yalikavak/dt-ufuk-kayhan"
r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")

# Rating values
for e in soup.find_all("span", attrs={"property": "ratingValue"}):
    print(e.contents)

# Prices
for f in soup.find_all("span", attrs={"class": "price"}):
    print(f.contents)

# Treatment names
for t in soup.find_all("span", attrs={"class": "name"}):
    print(t.contents)

My expected output is:

['5'] ['4'] ['5'] ['5'] ['5'] ['5'] ['5'] ['4.5'] ['4.5'] ['4.5'] ['4.5'] ['₺15735'] ['₺18512'] ['₺9256'] ['₺12033'] ['₺2036'] ['₺3702'] ['₺740'] ['₺926'] ['₺1666'] ['₺2962'] ['\xa0'] ['₺2036'] ['₺3702'] ['₺2036'] ['₺3702'] ['₺4628'] ['₺15735'] ['₺18512'] ['\xa0'] ['\xa0'] ['\xa0'] ['₺9256'] ['₺12033'] ['₺9256'] ['₺9256'] ['₺740'] ['₺1851'] ['₺926'] ['₺1481'] ['₺1666'] ['₺926'] ['\xa0'] ['₺2221'] ['₺2962'] ['\xa0'] ['\xa0'] ['₺5554'] ['Dentist Consultation'] ['Dental Implants'] ['Veneers'] ['Teeth Whitening'] ['Dentures'] ['Dental Crowns'] ['Extractions'] ['Teeth Cleaning'] ['Root canals'] ['Laser Teeth Whitening'] [ ]

    Does this answer your question? [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) – tripleee Nov 04 '22 at 07:57

1 Answer


Just to point you in the right direction: you have to call the AJAX request yourself. It returns 10 additional reviews per call, based on an index:

f'https://www.whatclinic.com/Consumer/Ajax/AjaxReviewsAndFeedbacks.aspx?clinicid=15797&pid=0&index={i}&showAll=False'
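
To make the role of the index easier to see, here is a minimal sketch that builds the URL for each batch. The helper name `build_review_url` and the constant `AJAX_ENDPOINT` are my own and purely illustrative; the parameter values are the ones from the URL above, and the step of 10 assumes the endpoint keeps returning 10 reviews per call.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL above; the constant name is illustrative.
AJAX_ENDPOINT = "https://www.whatclinic.com/Consumer/Ajax/AjaxReviewsAndFeedbacks.aspx"


def build_review_url(index, clinic_id=15797, pid=0):
    """Return the AJAX URL for the batch of reviews starting at `index`."""
    query = urlencode({
        "clinicid": clinic_id,
        "pid": pid,
        "index": index,
        "showAll": "False",
    })
    return f"{AJAX_ENDPOINT}?{query}"


# URLs for the first three batches (assumes 10 reviews per request).
for index in range(0, 30, 10):
    print(build_review_url(index))
```

Only the `index` value changes between requests, which is why a simple loop is enough and no button click has to be simulated.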

Example

import requests
import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = 'https://www.whatclinic.com/Consumer/Ajax/AjaxReviewsAndFeedbacks.aspx'

i = 0      # index of the first review in the next batch
data = []

while True:
    url = f'{BASE_URL}?clinicid=15797&pid=0&index={i}&showAll=False'
    r = requests.get(url)
    # Name the parser explicitly; lxml is the one used in the question
    soup = BeautifulSoup(r.content, 'lxml')

    reviews = soup.select('div[id^="review_"]')
    if not reviews:
        break  # an empty batch means there are no more reviews

    for e in reviews:
        # Look each element up once, then reuse it
        rating = e.select_one('[property="ratingValue"]')
        treatment = e.select_one('[property="itemReviewed"]')
        data.append({
            'ratingStar': rating.text if rating else None,
            'treatmentType': treatment.text if treatment else None,
            # the node right after the treatment name is taken as the price
            'price': treatment.next_sibling if treatment else None
        })

    i += 10  # move on to the next batch of 10 reviews

pd.DataFrame(data)
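
If you later want to calculate with the prices, the scraped strings need cleaning first. The sketch below is only an illustration: `parse_price` is a hypothetical helper, and it assumes prices are whole numbers with a currency symbol such as `₺15735`, with a non-breaking space (`'\xa0'`) where no price is shown, as in the output in the question.

```python
import re


def parse_price(raw):
    """Convert a scraped price such as '₺15735' to an int.

    Returns None when the value is missing or contains no digits
    (for example the '\xa0' placeholder). Assumes whole-number prices;
    a decimal price would need different handling.
    """
    if raw is None:
        return None
    digits = re.sub(r"[^\d]", "", str(raw))
    return int(digits) if digits else None


# Sample values copied from the output in the question
samples = ["₺15735", "₺926", "\xa0", None]
print([parse_price(s) for s in samples])
```

With the data frame from above, something like `df['price'].map(parse_price)` could apply it to the whole column, where `df` stands for the result of `pd.DataFrame(data)`.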