
I am trying to scrape some text from the following page source:

[screenshot of the page source showing the "Diese Termine stehen zu ..." element]

I am using Selenium with Python to scrape the text "Diese Termine stehen zu ..." ("These appointments will be available for online booking again at a later time").

What have I tried so far?

  1. Using XPath with the element's absolute location:

availability = driver.find_elements_by_xpath("//*[@id='booking-content']/div[2]/div[4]/div/div[2]/div/div/div/div[1]/div/div/span")

  2. Using the class name:

elements = driver.find_elements_by_class_name("dl-text dl-text-body dl-text-regular dl-text-s dl-text-color-inherit")

  3. Using the CSS selector .booking-message .dl-text:

availability = driver.find_element_by_css_selector('.booking-message .dl-text')

None of the above worked. I am sure option 3 should work because, as can be seen in the screenshot, I can find the element with that same selector in Chrome. But still no luck.

The error message is:

Traceback (most recent call last):
  File "/Users/GunardiLin/Desktop/Codes/Tracker.py", line 18, in <module>
    availability = driver.find_element_by_css_selector('.booking-message .dl-text')
  File "/Users/GunardiLin/opt/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 598, in find_element_by_css_selector
    return self.find_element(by=By.CSS_SELECTOR, value=css_selector)
  File "/Users/GunardiLin/opt/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 978, in find_element
    'value': value})['value']
  File "/Users/GunardiLin/opt/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/GunardiLin/opt/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".booking-message .dl-text"}
  (Session info: chrome=90.0.4430.212)
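
A quick sanity check (a minimal sketch; it reuses the page_source dump that is already commented out in my code below) is to test whether the target markup is in the DOM Selenium sees at all:

# Sanity check: is the target markup anywhere in the DOM Selenium renders?
# (the page Selenium sees, especially headless, can differ from the DevTools view)
html = driver.page_source
print('booking-message' in html)  # False would mean the element never loaded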

I am aware of another posting with the same problem: Python with selenium: unable to locate element which really exist

That is why I checked whether the site uses an iframe: I searched the page source for iframe tags, just as in the screenshot, and the search returned 0 results, meaning nothing was found.
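
For completeness, the same check can be done from Selenium itself. A minimal sketch (find_elements_by_tag_name is the same Selenium 3 API style used in the rest of my code):

# List every iframe Selenium can see; an empty list matches the DevTools search result
iframes = driver.find_elements_by_tag_name("iframe")
print(len(iframes))  # prints 0 here, so no frame switching should be needed

# If there were a frame, the lookup would have to happen inside it first:
# driver.switch_to.frame(iframes[0])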

Could someone give me a pointer on how to scrape this text? I would prefer the CSS selector (option 3) and would rather avoid option 1 (XPath with an absolute path), but at this point I would be thankful for any solution.

Thank you in advance :-)

Update:

My code so far:

import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import Select

url = r"https://www.doctolib.de/gemeinschaftspraxis/muenchen/fuchs-hierl?practitioner_id=any&speciality_id=5593&utm_campaign=website-button&utm_source=fuchs-hierl-website-button&utm_medium=referral&utm_content=custom&utm_term=fuchs-hierl"

chrome_options = Options()
chrome_options.add_argument('--headless')
driver = webdriver.Chrome(executable_path="/Applications/chromedriver", options=chrome_options)
driver.get(url)
print('*** Title:', driver.title)
# print(driver.page_source.encode("utf-8"))
dropdown_besuchgrund = driver.find_element_by_id("booking_motive")
select_besuchgrund = Select(dropdown_besuchgrund)
# print(dir(select_besuchgrund))
select_besuchgrund.select_by_visible_text("Erste Impfung Covid-19 (BioNTech-Pfizer)")
# availability = driver.find_elements_by_xpath("//*[@id='booking-content']/div[2]/div[4]/div/div[2]/div/div/div/div[1]/div/div/span")
#elements = driver.find_elements_by_class_name("dl-text dl-text-body dl-text-regular dl-text-s dl-text-color-inherit")
# availability = driver.find_element_by_css_selector('.booking-message .dl-text')
availability = driver.find_element_by_xpath(".//div[contains(@class,'booking-message')]/span")
print("***")
print(availability.text)
# for elem in elements:
#     print ("***", elem.text)
#     if elem.text == "Diese Termine stehen zu einem späteren Zeitpunkt wieder für eine Online-Buchung zur Verfügung. ":
#         print("*** Ausgebucht")
driver.quit()

@itronic1990 22.05.2021 07:45: I have checked your suggestion with:

driver.find_element_by_xpath(".//div[contains(@class,'booking-message')]/span").text

[screenshot: the Chrome developer console finds the element with this XPath]

As you can see above, Chrome can find the text with your locator, but when I run the code, it can't find it. My test code:

import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
url = r"https://www.doctolib.de/gemeinschaftspraxis/muenchen/fuchs-hierl"
chrome_options = Options()
chrome_options.add_argument('--headless')
driver = webdriver.Chrome(executable_path="/Applications/chromedriver", options=chrome_options)
driver.get(url)
element_text = driver.find_element_by_xpath(".//div[contains(@class,'booking-message')]/span").text
print(element_text)
driver.quit()

Error Message:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//div[contains(@class,'booking-message')]/span"}
  (Session info: headless chrome=90.0.4430.212)

I can't understand why. Thank you for any advice.
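
One difference between my test and the DevTools check is that the test runs headless. A guess I still need to verify: headless Chrome defaults to a small viewport and sends a "HeadlessChrome" user agent, and either could change what the site renders. A minimal sketch to rule that out (the switches are standard Chrome flags; whether they matter for this site is an assumption):

chrome_options = Options()
chrome_options.add_argument('--headless')
# Force a desktop-sized viewport; headless Chrome otherwise defaults to 800x600
chrome_options.add_argument('--window-size=1920,1080')
# Mask the HeadlessChrome token in case the site serves different markup for it
chrome_options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36')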

gunardilin
  • Maybe you're missing a wait/delay before calling `driver.find_element_by_css_selector('.booking-message .dl-text')`? – Prophet May 21 '21 at 13:26
    Can you share the link to that web page? – Prophet May 21 '21 at 13:27
  • url = r"https://www.doctolib.de/gemeinschaftspraxis/muenchen/fuchs-hierl?practitioner_id=any&speciality_id=5593&utm_campaign=website-button&utm_source=fuchs-hierl-website-button&utm_medium=referral&utm_content=custom&utm_term=fuchs-hierl" – gunardilin May 21 '21 at 14:47
  • Hey Prophet, I have attached the link above. Thanks in advance – gunardilin May 21 '21 at 14:48
  • Hello Prophet, what do you mean by wait/delay? What is the reason for that? – gunardilin May 21 '21 at 14:53
  • @gunardilin what exactly are you trying to get? What is your expected output? – chitown88 May 21 '21 at 15:38
  • @gunardilin I opened that link. I couldn't see any element there matching the `.booking-message .dl-text` locator. I do see an element located by `.booking-message`, but there is nothing inside it. – Prophet May 21 '21 at 15:40
  • By wait/delay I mean putting in an expected-condition wait, for example waiting for the element to be visible (see the sketch after this thread). But since I can't see this element at all, I'm not sure it's relevant. It's also possible that the site presents different data for different locations, so it shows me something other than what you see there. – Prophet May 21 '21 at 15:43
  • @chitown88 I am trying to detect whether a vaccination slot becomes available. If one is available, the script should print something or notify me. – gunardilin May 21 '21 at 15:54
  • @Prophet: After opening the website, you have to use the second dropdown on the right side to choose the vaccination you would like to get. I think the reason you don't find the mentioned elements is that you haven't selected the shot you would like to get. – gunardilin May 21 '21 at 15:54
  • @Prophet After reading your post a second time, I think you might be right that your location affects what you see... Hmmm... – gunardilin May 21 '21 at 15:55
  • @gunardilin, are you only interested in Pfizer? Or do you want to check the others too? Also, is it just for this location? – chitown88 May 21 '21 at 15:59
  • @chitown88 I am interested in Pfizer, because I have already got one Astra appointment for next month. For this location first. Just testing my luck here... – gunardilin May 21 '21 at 16:06
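
For reference, a minimal sketch of the expected-condition wait Prophet describes above (standard selenium.webdriver.support API; the 20-second timeout is an arbitrary choice):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Poll for up to 20 seconds until the element is visible, instead of failing immediately
wait = WebDriverWait(driver, 20)
availability = wait.until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, '.booking-message .dl-text'))
)
print(availability.text)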

2 Answers


You have used find_elements (plural) in your XPath and class-name attempts; that returns a list, not a single element. Is that right?

Try this:

driver.find_element_by_xpath(".//div[contains(@class,'booking-message')]/span").text
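
If the element can legitimately be absent (for example, when everything is booked out), a variant with find_elements avoids the exception. A sketch, with the "fully booked" interpretation borrowed from the commented-out code in the question:

# find_elements returns an empty list instead of raising NoSuchElementException
spans = driver.find_elements_by_xpath(".//div[contains(@class,'booking-message')]/span")
if spans:
    print(spans[0].text)
else:
    print("No booking message found (fully booked, or the page is still loading)")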
itronic1990
  • I have included all my code/attempts in my original posting. I tried your suggestion and it still doesn't work. It puzzles me why it doesn't work... – gunardilin May 21 '21 at 14:52
  • Can you try the XPath in the developer console and see if it returns any element? – itronic1990 May 21 '21 at 15:08
  • Hey itronic1990, I have tried your advice. It is still not working. I have updated my original post to answer your question. Basically, the developer console can find the element using the filter, but the script doesn't... Thank you for further assistance :-) – gunardilin May 22 '21 at 05:56

Why bother with Selenium? Fetch the data straight from the source:

import requests

url = 'https://www.doctolib.de/availabilities.json'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'}
payload = {
    'start_date': '2021-05-21',
    'visit_motive_ids': '2820334',
    'agenda_ids': '466608',
    'insurance_sector': 'public',
    'practice_ids': '25230',
    'limit': '4'}

jsonData = requests.get(url, headers=headers, params=payload).json()

Output:

print(jsonData['message'])
Diese Termine stehen zu einem späteren Zeitpunkt wieder für eine Online-Buchung zur Verfügung. 

I'm not familiar with German, otherwise I might be able to make this more efficient. But basically, use each practice_id to feed into the availabilities request and get the data from each practice.

import requests
from bs4 import BeautifulSoup
from datetime import datetime

# Get location practice_ids
url = 'https://www.doctolib.de/allgemeinmedizin/81667-muenchen'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'}

practice_ids_list = []
for page in range(1,100):
    payload = {'page':page}

    response = requests.get(url, headers=headers, params=payload)
    if response.status_code == 404:
        break
    
    else:
        print('Page: %s' %page)
        soup = BeautifulSoup(response.text, 'html.parser')
        divs = soup.find_all('div',{'class':'dl-search-result'})
        
        for div in divs:
            practice_id = div['id'].split('-')[-1]
            practice_ids_list.append(practice_id)

today = datetime.today().strftime('%Y-%m-%d')

url = 'https://www.doctolib.de/availabilities.json'
for practice_id in practice_ids_list:
    payload = {
        'start_date': today,
        'visit_motive_ids': '2820334',
        'agenda_ids': '466606',
        'insurance_sector': 'public',
        'practice_ids': '%s' % practice_id,
        'limit': '15'}
    
    jsonData = requests.get(url, headers=headers, params=payload).json()
    
   
    if jsonData['total'] == 0 and 'next_slot' not in jsonData.keys():
        #print('\t', jsonData['message'],'\n')
        print(practice_id)
    else:
        # Get Clinic Details
        clinic_url = 'https://www.doctolib.de/search_results/%s.json' %practice_id
        clinic_jsonData = requests.get(clinic_url, headers=headers).json()
        clinic_name = clinic_jsonData['search_result']['name_with_title']
        address = clinic_jsonData['search_result']['address']
        city = clinic_jsonData['search_result']['city']
        zipcode = clinic_jsonData['search_result']['zipcode']
        print('%s\n%s %s %s' %(clinic_name, address, city, zipcode))
        
        # 'next_slot' is only present when the current window has no open slots;
        # guard against a KeyError when availabilities are already in the response
        if 'next_slot' in jsonData:
            payload.update({'start_date': jsonData['next_slot']})
            jsonData = requests.get(url, headers=headers, params=payload).json()
        print('\n\t', '*'*50, '\nThe following dates are available:')
        for each_date in jsonData['availabilities']:
            if len(each_date['slots']) > 0:
                print('\t\t',each_date['date'])
chitown88
  • Wow, that looks promising. Pardon me, I am a beginner programmer. What is the reason for using a plain request? When are BeautifulSoup or Selenium better? I didn't think of using requests directly. Thank you for your answer :-) – gunardilin May 21 '21 at 16:08
  • Why were you asking whether I am interested in only this one location? Do you mean that I could also run the same script for another location? Thank you for the clarification – gunardilin May 21 '21 at 16:09
  • I will check your code later and upvote it. Thanks again – gunardilin May 21 '21 at 16:10
  • @gunardilin. Yes, you can alter the parameters here to look at different vaccines and different locations. It's just a matter of figuring out those ids/codes, and then you can have the script look through all those locations checking for a date. I'll show you what I mean (I'll adjust the code above). If the data can be acquired straight from an API or returned in JSON format, then use a simple requests call (no need to parse it out of the HTML). If you need to get the data out of the HTML source code, use requests to get the HTML, then BeautifulSoup to parse the data from it. If all else fails, use Selenium. – chitown88 May 21 '21 at 16:16
  • I have checked your first code and have some questions. I opened https://www.doctolib.de/availabilities.json?start_date=2021-05-22&visit_motive_ids=2820334&agenda_ids=466608&insurance_sector=public&practice_ids=25230&limit=4, which is the same url+params that your code used. 1. How did you even know that you can search with "...www.doctolib.de/availabilities.json..." in the first place? 2. How did you find out what params (listed in the payload dictionary) the url needs, e.g. agenda_ids, etc.? 3. How did you find out what "466608" for "agenda_ids" means? Thanks for your assistance in advance :-) – gunardilin May 22 '21 at 05:21
  • 4. What is the equivalent normal website with a GUI for the url+params in the first code? I am trying to find out whether the url+params checks availability ONLY for "start_date" or availability STARTING FROM "start_date". Pardon me for asking so many questions. Thank you in advance. – gunardilin May 22 '21 at 05:29
  • Go to the dev tools (Shift-Ctrl-I). Under the Network -> XHR tab you can see it (you might need to refresh the page). As for the parameters, I just did trial and error (i.e., with dev tools open, change something on the site, look at the XHR requests, and note what you clicked and what changed). – chitown88 May 22 '21 at 05:37
  • I didn't know that dev tools could do that. Wow... Could you recommend an online course or a good website for learning web scraping? Until now I used the dev tools only to find elements and didn't know they could do more. Much appreciated. – gunardilin May 22 '21 at 06:14
  • Honestly, I never did an online tutorial or course for web scraping. I just practiced and looked at SO for different ways to do it, and sort of learned by trial and error. So sorry, I don't have any recommendations. I'll look around, though, and see if there are any that, looking back, I wish I had done. – chitown88 May 22 '21 at 08:07