
My goal is to grab all of the court case numbers and put them into an Excel file. The case numbers are in the 2nd column.

My code:

courtCases = driver.find_elements_by_css_selector('body > table:nth-child(3) > tbody > tr:nth-child* > td:nth-child(2)')
for courtCase in courtCases:
    print(courtCase.text)

This throws an error:

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified.

I was able to get a single court case by giving the exact CSS path (or XPath), like:

courtCases = driver.find_elements_by_css_selector('body > table:nth-child(3) > tbody > tr:nth-child(7) > td:nth-child(2) > font')

I need to gather all of the court case numbers in the 2nd column, td:nth-child(2).

Anyhow, my question is: can anyone help me write a good CSS selector or XPath to get all the court case numbers?

Some of the HTML:

<html>
<head>
<title>Wejis - Dayton Municipal Court</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>
<table width="750" border="0">
  <tr>
    <td width="185"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Run 
      Date: 10/16/2017 </font></td>
    <td width="380"><div align="center">
        <p><strong><font color="#003399" size="4" face="Verdana, Arial, Helvetica, sans-serif">Housing 
          Docket Report</font></strong></p>
        <p><strong><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Dayton 
          Municipal Court</font></strong></p>
      </div></td>
    <td width="185"><div align="right"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Run 
       Time: 12:28 PM</font></div></td>
  </tr>
</table>
<table width="750">
    <tr><td colspan="4">&nbsp;</td></tr>
    <tr>
        <td width="250"><strong><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Court Date: September 20, 2017</font></strong></td>
        <td width="140"><strong><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">All Sessions</font></strong></td>
        <td width="130"><div align="center"><strong><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Courtroom 3A</font></strong></div></td>
        <td width="220"><div align="right"><strong><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif">Judge Deirdre E Logan</font></strong></div></td>
    </tr>
</table>
<table width="750" border="0">
  <tr> 
    <td colspan="5"><hr></td>
  </tr>

            <tr> 
                <td colspan="2"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif"><strong>Housing Trial</strong></font></td>
                <td colspan="3"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif"><strong> 8:30AM </strong></font></td>
            </tr>
            <tr> 
                <td width="140"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Defendant Name</font></strong></td>
                <td width="120"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Case Number</font></strong></td>
                <td width="240"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Charges</font></strong></td>
                <td width="115"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Attorney</font></strong></td>
                <td width="115"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Location</font></strong></td>
            </tr>

                <tr> 
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Rosal, Jorge</font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">2017-CRM-005695</font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">MAINTAINING EXTERIOR<br></font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif"></font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">1347  Kingsley </font></td>
                </tr>

            <tr> 
                <td colspan="2"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif"><strong>Criminal Court Trial In Jail</strong></font></td>
                <td colspan="3"><font color="#003399" size="2" face="Verdana, Arial, Helvetica, sans-serif"><strong> 9:30AM </strong></font></td>
            </tr>
            <tr> 
                <td width="140"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Defendant Name</font></strong></td>
                <td width="120"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Case Number</font></strong></td>
                <td width="240"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Charges</font></strong></td>
                <td width="115"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Attorney</font></strong></td>
                <td width="115"><strong><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Location</font></strong></td>
            </tr>

                <tr> 
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Joyner, Melissa</font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">2017-CRB-000784</font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">DRUG ABUSE INSTRUMENT<br>DRUG PARAPHERNALIA/USE OR POSS<br></font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Jenn A. Cunningham-Minnick</font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">1401  Harshman RD</font></td>
                </tr>

                <tr> 
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">Joyner, Melissa</font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">2017-CRM-000775</font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">LITTERING IN PARK<br></font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif"></font></td>
                    <td valign="top"><font color="#003399" size="1" face="Verdana, Arial, Helvetica, sans-serif">1401  Harshman RD</font></td>
                </tr>
John
  • Did you try using BeautifulSoup? A related question is here: https://stackoverflow.com/questions/23377533/python-beautifulsoup-parsing-table. You can write the data to a pandas dataframe and then write out a csv using pandas.to_csv – skrubber Oct 16 '17 at 17:19
  • Yes, but I think the OP is favoring XPath expressions which, I believe, BeautifulSoup doesn't support – Mangohero1 Oct 16 '17 at 17:22
  • I tried xlml and had a similar problem. The example I followed does this: import requests; url = raw_input("Enter a website to extract the URL's from: "); r = requests.get("http://" + url); data = r.text; soup = BeautifulSoup(data). I cannot get r.text as in the example, because requests.get needs the URL and I can only reach the page by clicking a link. Reloading the page gives a 404 error (file or directory not found). I need to get the data from my open webpage – John Oct 16 '17 at 17:24
  • To reproduce the page I'm getting the data from, go to http://www.wejis.com/pa/qhousingdocket.cfm, select 9/20/2017 from the drop-down menu, and then click generate report. – John Oct 16 '17 at 17:32

2 Answers


Found it through an XPath:

courtCases = driver.find_elements_by_xpath('//td[2]/font[@size="1"]')
for courtCase in courtCases:
    print(courtCase.text)

Notice that every case number sits in a font element with size="1". If you left out this attribute, you'd also match the time cells (e.g. 8:30AM), which use size="2".
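To see the effect of the size filter without opening a browser, here is a minimal sketch that runs the same XPath shape over a trimmed, well-formed copy of the table using Python's built-in xml.etree.ElementTree. The SAMPLE markup and the texts() helper are made up for illustration, and ElementTree only supports a subset of XPath, so treat this as a check of the expression rather than a replacement for Selenium. Also note that newer Selenium releases dropped the find_elements_by_* helpers; there the equivalent call should be driver.find_elements(By.XPATH, '//td[2]/font[@size="1"]').

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the docket table (an assumption: the real
# page is looser HTML that the browser normalizes before Selenium sees it).
SAMPLE = """
<table>
  <tr>
    <td colspan="2"><font size="2"><strong>Housing Trial</strong></font></td>
    <td colspan="3"><font size="2"><strong> 8:30AM </strong></font></td>
  </tr>
  <tr>
    <td><strong><font size="1">Defendant Name</font></strong></td>
    <td><strong><font size="1">Case Number</font></strong></td>
  </tr>
  <tr>
    <td><font size="1">Rosal, Jorge</font></td>
    <td><font size="1">2017-CRM-005695</font></td>
  </tr>
  <tr>
    <td><font size="1">Joyner, Melissa</font></td>
    <td><font size="1">2017-CRB-000784</font></td>
  </tr>
</table>
"""

root = ET.fromstring(SAMPLE)


def texts(path):
    """Return the stripped text of every element matched by `path`."""
    return ["".join(el.itertext()).strip() for el in root.findall(path)]


# Second <td> of each row, direct <font> child, restricted to size="1".
with_size = texts(".//td[2]/font[@size='1']")

# Same path without the attribute test: the size="2" time cell matches too.
without_size = texts(".//td[2]/font")

print(with_size)
print(without_size)
```

The "Case Number" header is skipped in both cases because its font element is wrapped in a strong element, so it is not a direct child of the td.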

Mangohero1
  • wow!! thank you. I need to work on my xpath skills. I tried a lot of different stuff. I could get all of the table which had the court cases in it using '/html/body/table[3]/*' and tried to narrow it down from there but was having no luck so tried to switch to css selector. – John Oct 16 '17 at 17:54
  • No worries. It's a *lot* easier finding it if you know how to use Chrome's Inspector tool. :-) – Mangohero1 Oct 16 '17 at 18:01

You've got a little more in your selector than you need. I've found that it can be reduced to the below.

td:nth-child(2) > font[size='1']

CSS selectors are faster and better supported than XPath, but there are some things, like locating an element by its contained text, that only XPath can do.
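As a rough illustration of the "locate by contained text" case, here is a small sketch that finds the row for one case number with an XPath text predicate. It uses Python's built-in xml.etree.ElementTree on a made-up, trimmed copy of the table; SAMPLE and row_for_case() are hypothetical names for this example only. In Selenium itself the CSS selector above would go through something like driver.find_elements(By.CSS_SELECTOR, "td:nth-child(2) > font[size='1']"), and the XPath below should carry over as //tr[td='...'].

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed excerpt of the docket rows (an assumption for the demo).
SAMPLE = """
<table>
  <tr>
    <td><font size="1">Rosal, Jorge</font></td>
    <td><font size="1">2017-CRM-005695</font></td>
    <td><font size="1">1347 Kingsley</font></td>
  </tr>
  <tr>
    <td><font size="1">Joyner, Melissa</font></td>
    <td><font size="1">2017-CRB-000784</font></td>
    <td><font size="1">1401 Harshman RD</font></td>
  </tr>
</table>
"""

root = ET.fromstring(SAMPLE)


def row_for_case(case_number):
    """Return the cell texts of every row with a <td> whose text equals case_number."""
    # [td='value'] keeps rows that have a td child whose full text equals value.
    rows = root.findall(f".//tr[td='{case_number}']")
    return [
        ["".join(td.itertext()).strip() for td in row.findall("td")]
        for row in rows
    ]


match = row_for_case("2017-CRB-000784")
print(match)
```

A CSS selector can pick cells by position and attribute, but it has no way to express "the row whose cell says 2017-CRB-000784", which is where the XPath predicate earns its keep.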

JeffC
  • Interesting approach, and thanks for that insight. I figured CSS would be faster, I just wasn't sure on how it was constructed. +1 – Mangohero1 Oct 18 '17 at 22:54