
I am working on a Python project in which the script visits a website (https://service.berlin.de/dienstleistung/120686/), clicks the link "Termin berlinweit suchen und buchen", and then keeps refreshing the page at a fixed interval until the page changes. A change is detected by comparing hashes of the page taken before and after each refresh. If the page has changed, I should receive an email. The problem is that the site has clearly changed, but I do not receive an email. The code is a working example.

I have tried:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time, hashlib, smtplib, ssl, requests

driver = webdriver.Firefox(executable_path=r'C:\Users\Me\AppData\Local\Programs\Python\Python37\geckodriver.exe')  # Loads Geckodriver.exe
driver.get("https://service.berlin.de/dienstleistung/120686/")  # Loads initial page

appointmentPageLink = WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "/html[1]/body[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[4]/div[3]/div[1]/div[2]/div[9]/div[1]/p[1]/a[1]")))
driver.execute_script("arguments[0].click();", appointmentPageLink)  # Clicks the link for appointments

while True:
        currentHash = hashlib.sha256(driver.page_source).hexdigest() # Get hash
        time.sleep(100) # Wait
        driver.refresh() # Refresh page
        newHash = hashlib.sha256(driver.page_source).hexdigest() # Get new hash to compare

        if newHash == currentHash:  # Time to compare hashes!
            continue  # If the hashes are the same, continue
        else: # If the hashes are different, send email
            port = 587  # For starttls
            smtp_server = "smtp.gmail.com"
            sender_email = "OMITTED"  # Enter your address
            receiver_email = "OMITTED"  # Enter receiver address
            password = "OMITTED"  # Enter sender email password
            message = """\
            Subject: New change detected for Anmeldung!

            Visit https://service.berlin.de/dienstleistung/120686/ now!"""  # Add a message

            context = ssl.create_default_context()  # Send the email!
            with smtplib.SMTP(smtp_server, port) as server:
                server.ehlo()  # Can be omitted
                server.starttls(context=context)
                server.ehlo()  # Can be omitted
                server.login(sender_email, password)
                server.sendmail(sender_email, receiver_email, message)
                server.quit()

Error Message:

Traceback (most recent call last):
  File "C:/Users/Me/PycharmProjects/ServiceBerlin/ServiceBEMonitor.py", line 14, in <module>
    currentHash = hashlib.sha256(driver.page_source).hexdigest() # Get hash
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2780: ordinal not in range(128)
Nick
  • Maybe you want to add some print statements and check whether it is entering the else branch or not... – illusion Dec 06 '20 at 20:05
  • Also, in the Gmail settings, did you change the "allow low security applications" setting? Or allow insecure applications or something like this... – illusion Dec 06 '20 at 20:06
  • Yes, the email settings have been changed. I will also try printing a few statements, but I suspect the error is due to the logic of the code – Nick Dec 06 '20 at 20:08
  • If you suspect so, then I'll also try to write some alternative code... – illusion Dec 06 '20 at 20:10
  • Ohhh... I think the problem lies in `requests.get` and `driver.refresh`. They have no relation. It refreshes your driver, sure, but the `appointmentPage` is not loaded in your driver; it is not opened in your Selenium driver... It is fetched using the `requests` module, which has no relation to Selenium. So after refreshing, `appointmentPage` still has the old value, as the refresh doesn't work on it... I hope you are getting it, otherwise I can explain in more detail... – illusion Dec 06 '20 at 20:16
  • I think I understand.. so in order to resolve this I need to change requests.get to something that is more compatible with Selenium? Problem is, this is used to get a hash value for currentHash, and I am not sure if getting a hash is something Selenium can do – Nick Dec 06 '20 at 20:35
  • first of all, do you desire to stop running the code after a change is detected and a mail is sent? Or you want to keep it running to check for subsequent changes as well? – illusion Dec 06 '20 at 20:42
  • I intend to make it so it detects subsequent changes as well, yes – Nick Dec 06 '20 at 20:52

1 Answer


In each iteration of your while loop, you send a GET request to the URL with requests (which is unrelated to Selenium) and store the response in appointmentPage. You then calculate its hash, refresh the driver, and calculate the hash again on the same appointmentPage. That object is not modified at all, because driver.refresh() reloads the page in the Selenium driver, not the response that requests already fetched. Hence currentHash is always equal to newHash within one iteration. Their values may change from one iteration to the next, but within a single iteration of your while loop they are always equal, so no mail is sent.
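To make the point concrete, here is a minimal sketch that needs neither Selenium nor requests. The names `digest`, `stored_response` and `browser_page` are made up for illustration: `stored_response` stands in for the text fetched once with requests, and `browser_page` for whatever the driver currently shows.

```python
import hashlib


def digest(text):
    """Return the SHA-256 hex digest of a piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical stand-ins: a copy fetched once, and the live page in the browser.
stored_response = "<html>no appointments</html>"
browser_page = "<html>no appointments</html>"

current_hash = digest(stored_response)

# A "refresh" changes only what the browser shows, not the stored copy.
browser_page = "<html>appointments available</html>"

new_hash = digest(stored_response)  # still hashing the old copy

print(current_hash == new_hash)              # True: the change goes unnoticed
print(digest(browser_page) == current_hash)  # False: the live page did change
```

Hashing the same stored text twice can only ever give the same digest, which is why the comparison never reports a change.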

Now to solve your problem, we first need to get the source code of the page inside your driver, then refresh the page and get the source code again and check their respective hashes. So maybe the following code can work:

while True:
    # Hash the page source that is currently loaded in the Selenium driver
    currentHash = hashlib.sha256(driver.page_source).hexdigest()
    time.sleep(100)
    driver.refresh()
    newHash = hashlib.sha256(driver.page_source).hexdigest()
    if newHash == currentHash:  # Time to compare hashes!
        continue  # If the hashes are the same, keep waiting
    else:
        pass  # If the hashes are different, send the email here
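The UnicodeEncodeError shown in the question most likely comes from passing text rather than bytes to hashlib.sha256(), so one possible variation encodes the page source as UTF-8 first. The sketch below is only that, a sketch: `page_hash` and `watch` are hypothetical helper names, and `watch` takes plain callables so it can be tried without a browser. It also remembers the last hash, so it keeps reporting later changes instead of stopping after the first one.

```python
import hashlib
import time


def page_hash(page_source):
    """Return the SHA-256 hex digest of a page's text."""
    # sha256() works on bytes, so the text is encoded explicitly.
    return hashlib.sha256(page_source.encode("utf-8")).hexdigest()


def watch(get_source, refresh, on_change, interval=100, max_checks=None):
    """Call on_change() each time the page hash differs after a refresh.

    max_checks limits the number of refreshes; None means run forever.
    """
    last_hash = page_hash(get_source())
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval)
        refresh()
        new_hash = page_hash(get_source())
        if new_hash != last_hash:
            on_change()
            last_hash = new_hash  # keep watching for subsequent changes
        checks += 1
```

With the driver from the question, this could be called as `watch(lambda: driver.page_source, driver.refresh, send_email)`, where `send_email` is a placeholder for your own mailing function.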
illusion
  • Hey that's a pretty good idea.. I am going to let the script run for a while and see if it works and get back to you! – Nick Dec 06 '20 at 21:17
  • cool! Actually, I haven't used selenium much... so I hope you get the idea and probably, you can try a few other approaches around the same idea... let me know if this was the actual problem... otherwise we'll have to debug more... Also make sure the mailing works by testing just the mailing code... – illusion Dec 06 '20 at 21:19
  • Sure will do, I got an error with the new code, I have also updated the code in the question accordingly. I am about to explore the error and look for possible solutions – Nick Dec 06 '20 at 21:40
  • This is an encoding error.. check this: https://stackoverflow.com/questions/16823086/selenium-webdriver-and-unicode – illusion Dec 06 '20 at 21:42
  • Going to have a look later or tomorrow because I am pretty tired right now :) – Nick Dec 06 '20 at 22:42
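For testing the mailing code on its own, as suggested in the comments, here is a rough sketch assuming Python 3.6 or later. The triple-quoted message in the question is indented, so the "Subject:" line starts with whitespace and may not be treated as a header. Building the message with the standard library's EmailMessage sidesteps that. `build_notification` and `send_notification` are hypothetical helper names, and the server and port are simply the ones from the question.

```python
from email.message import EmailMessage
import smtplib
import ssl


def build_notification(sender, receiver):
    """Build the notification email with proper headers."""
    msg = EmailMessage()
    msg["Subject"] = "New change detected for Anmeldung!"
    msg["From"] = sender
    msg["To"] = receiver
    msg.set_content("Visit https://service.berlin.de/dienstleistung/120686/ now!")
    return msg


def send_notification(msg, password, smtp_server="smtp.gmail.com", port=587):
    """Send the message over STARTTLS, logging in as the From address."""
    context = ssl.create_default_context()
    with smtplib.SMTP(smtp_server, port) as server:
        server.starttls(context=context)
        server.login(msg["From"], password)
        server.send_message(msg)
```

Calling `print(build_notification("me@example.com", "you@example.com"))` shows the message that would be sent, without contacting any server. The addresses here are placeholders.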