
I am trying to check date/time availability for an exam using Python mechanize, and to send someone an email if a particular date/time becomes available in the result (result page screenshot attached).

import mechanize
from BeautifulSoup import BeautifulSoup
URL = "http://secure.dre.ca.gov/PublicASP/CurrentExams.asp"


br = mechanize.Browser()
response = br.open(URL)


# there are some errors in doctype and hence filtering the page content a bit
response.set_data(response.get_data()[200:])

br.set_response(response)
br.select_form(name="entry_form")

# select Oakland for the 1st set of checkboxes

for i in range(len(br.find_control(type="checkbox", name="cb_examSites").items)):
    if i == 2:
        br.find_control(type="checkbox", name="cb_examSites").items[i].selected = True

# select salesperson for the 2nd set of checkboxes

for i in range(len(br.find_control(type="checkbox", name="cb_examTypes").items)):
    if i == 1:
        br.find_control(type="checkbox", name="cb_examTypes").items[i].selected = True

response = br.submit()
print response.read()

I am able to get the response but for some reason the data within my table is missing

here are the buttons from the initial html page

<input type="submit" value="Get Exam List" name="B1">
<input type="button" value="Clear" name="B2" onclick="clear_entries()">
<input type="hidden" name="action" value="GO">

one part of the output (the submit response) where the actual data should be

<table summary="California Exams Scheduling" class="General_list" width="100%" cellspacing="0"> <EVERYTHING IN BETWEEN IS MISSING HERE>
</table>

All the data within the table is missing. I have provided a screenshot of the table element from chrome browser.

  1. Can someone please tell me what could be wrong?
  2. Can someone please tell me how to get the date/time out of the response (assuming I have to use BeautifulSoup)? It has to be something along these lines: I am trying to find out whether a particular date I have in mind (say March 8th) shows up in the response with a Begin Time of 1:30 pm (screenshot attached).

    soup = BeautifulSoup(response.read())
    print soup.find(name="table")
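For question 2, once the rows are actually present, the matching itself is simple. A minimal sketch of that logic, assuming the cell texts have already been pulled out of each `<tr>` into a list (the `rows` data below is hypothetical, and the date/time formats are guessed from the screenshot):

```python
def has_slot(rows, date, begin_time):
    """Return True if any parsed row matches the wanted date and begin time."""
    return any(r[0] == date and r[1] == begin_time for r in rows)

# hypothetical parsed cells: [date, begin time, location]
rows = [
    ["3/8/2016", "1:30 pm", "Oakland"],
    ["3/9/2016", "8:30 am", "Oakland"],
]

print(has_slot(rows, "3/8/2016", "1:30 pm"))  # True
print(has_slot(rows, "3/8/2016", "8:30 am"))  # False
```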


Update - looks like my issue might be related to this question, and I am trying my options. I tried the below as per one of the answers, but cannot see any tr elements in the data (though I can see them in the page source when I check it manually):

soup.findAll('table')[0].findAll('tr') 



Update - Modified this to use selenium; will try and do more at some point soon

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup


myURL = "http://secure.dre.ca.gov/PublicASP/CurrentExams.asp"
browser = webdriver.Firefox() # Get local session of firefox
browser.get(myURL) # Load page

element = browser.find_element_by_id("Checkbox5")
element.click()


element = browser.find_element_by_id("Checkbox13")
element.click()

element = browser.find_element_by_name("B1")
element.click()
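Once Selenium has rendered the page, `browser.page_source` holds the filled-in HTML, and the rows can be pulled out with BeautifulSoup or even the stdlib alone. A stdlib-only sketch of collecting the `<td>` texts row by row (assuming the rendered table uses plain `<tr>`/`<td>` markup; the sample HTML below is hypothetical):

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self._row = None      # cells of the row being read
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and self._row is not None and data.strip():
            self._row.append(data.strip())

parser = TableRows()
parser.feed("<table><tr><td>3/8/2016</td><td>1:30 pm</td></tr></table>")
print(parser.rows)  # [['3/8/2016', '1:30 pm']]
```

With Selenium this would be fed `browser.page_source` after the `B1` click, ideally after a short wait so the table has time to appear.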
  • The website is probably using JavaScript to render the page which BeautifulSoup doesn't know how to run , you'll need to use something like Selenium to load the page in an actual browser – maxymoo Mar 07 '16 at 04:54
  • thx max, I will try that out ! – Naresh MG Mar 07 '16 at 21:46

1 Answer


5 years later, maybe this can help someone. I took your problem as a training exercise and completed it using the Requests package (on Python 3.9).

The code below is in two parts:

  • the request to retrieve the data injected into the table after the POST request.

    ## the request part

    import requests as rq
    from bs4 import BeautifulSoup as bs

    url = "https://secure.dre.ca.gov/PublicASP/CurrentExams.asp"
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0"}

    params = {
        "cb_examSites": [
            "'Fresno'",
            "'Los+Angeles'",
            "'SF/Oakland'",
            "'Sacramento'",
            "'San+Diego'"
        ],
        "cb_examTypes": [
            "'Broker'",
            "'Salesperson'"
        ],
        "B1": "Get+Exam+List",
        "action": "GO"
    }

    s = rq.Session()
    r = s.get(url, headers=headers)
    s.headers.update({"Cookie": "%s=%s" % (r.cookies.keys()[0], r.cookies.values()[0])})
    r2 = s.post(url=url, data=params)
    soup = bs(r2.content, "lxml")  # contains the data you want
    
  • Parsing the response (there are a lot of ways to do it; mine is maybe a bit clumsy)

    from bs4 import NavigableString, Tag

    table = soup.find_all("table", class_="General_list")[0]

    titles = [el.text for el in table.find_all("strong")]

    def beetweenBr(soupx):
        final_str = []
        for br in soupx.findAll('br'):
            next_s = br.nextSibling
            if not (next_s and isinstance(next_s, NavigableString)):
                continue
            next2_s = next_s.nextSibling
            if next2_s and isinstance(next2_s, Tag) and next2_s.name == 'br':
                text = str(next_s).strip()
                if text:
                    final_str.append(next_s.strip())
        return "\n".join(final_str)

    d = {}
    trs = table.find_all("tr")
    for tr in trs:
        tr_text = tr.text
        if tr_text in titles:
            curr_title = tr_text
            splitx = curr_title.split(" - ")
            area, job = splitx[0].split(" ")[0], splitx[1].split(" ")[0]
            if job not in d:
                d[job] = {}
            if area not in d[job]:
                d[job][area] = []
            continue
        if tr_text != "DateBegin TimeLocationScheduledCapacity":
            tds = tr.find_all("td")
            sub = []
            for itd, td in enumerate(tds):
                if itd == 2:
                    sub.append(beetweenBr(td))
                else:
                    sub.append(td.text)
            d[job][area].append(sub)
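One detail worth double-checking in the request part above: `requests` form-encodes the `data` dict itself, so values that were pre-encoded with `+` get escaped a second time. A quick stdlib check of that behaviour:

```python
from urllib.parse import urlencode

# requests applies the same application/x-www-form-urlencoded rules as
# urlencode: a literal "+" is escaped, while a space becomes "+".
print(urlencode({"B1": "Get+Exam+List"}))  # B1=Get%2BExam%2BList
print(urlencode({"B1": "Get Exam List"}))  # B1=Get+Exam+List
```

If the server expects `Get Exam List`, passing the plain-space form and letting `requests` do the encoding may be the safer choice.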
    

"d" contain data u want. I didn't go as far as sending an email yet.

ce.teuf