I am a journalism student and totally new to Python. Right now, I am trying to convert the table on this site into a CSV file so I can add it to my database. After a lot of troubleshooting and some YouTube tutorials, I have come up with this:

import csv
import urllib.request
from bs4 import BeautifulSoup
f = open('dataoutput.csv', 'w', newline = '')
writer = csv.writer(f)
soup = BeautifulSoup(urllib.request.urlopen("https://www.townofchapelhill.org/town-hall/departments-services/planning-and-sustainability/gis-analytics/development-activity-report").read(), 'lxml')
tbody = soup('table', {"class":"tableData tablesorter tablesorter-blue hasFilters hasStickyHeaders"}) [0].find_all('tr')
for row in tbody: 
    cols = row.findChildren(recursive=False)
    cols = [ele.text.strip() for ele in cols]
    writer.writerow(cols)
    print(cols)
f.close()

Right now, the code produces a CSV file, but it is empty. In the macOS terminal, I get the following error:

as9934-pc:pythonstuff as9934$ python3 ./make.py
Traceback (most recent call last):
  File "./make.py", line 8, in <module>
    tbody = soup('table', {"class":"tableData tablesorter tablesorter-blue hasFilters hasStickyHeaders"}) [0].find_all('tr')
IndexError: list index out of range

The only index I specify is [0], so I'm confused.

Any thoughts?

Joel
as9934
  • can you post the docs for the particular beautiful soup functions you are trying to use / rely on? – NotAnAmbiTurner Oct 30 '18 at 02:10
  • Not sure I totally understood your question. The tutorial I used was this: (https://www.youtube.com/watch?v=OF8X47olcpg). Beautiful Soup's documentation is here: (https://www.crummy.com/software/BeautifulSoup/bs4/doc/) – as9934 Oct 30 '18 at 02:19
  • If you examine the contents of `soup` I think you will find that it does not include the table. I suggest you use a library such as Selenium (in other words a 'headless' browser) to load and manipulate that page, so that the page is able to execute the code within it that loads the table you want. – Bill Bell Oct 30 '18 at 03:49

3 Answers

If index 0 of a list doesn't exist, the list has no elements (that is, it is an empty list). So soup('table', {"class":"tableData tablesorter tablesorter-blue hasFilters hasStickyHeaders"}) returns an empty list. You can confirm this by checking what len(soup('table', {"class":"tableData tablesorter tablesorter-blue hasFilters hasStickyHeaders"})) returns.
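
One way to make that failure explicit is to check the result before indexing it. The following is only a minimal sketch: the inline HTML string is a made-up stand-in for the downloaded page, the helper name `first_table_rows` is hypothetical, and matching on the single class `tableData` instead of the full class string is an assumption (some of the other classes may only be added by JavaScript once the page runs in a browser).

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page: it contains no <table> at all,
# which is what the IndexError in the question implies.
SAMPLE_HTML = "<html><body><div id='content'>Loading...</div></body></html>"


def first_table_rows(html, css_class):
    """Return the <tr> tags of the first matching table, or [] if none match."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any table whose class list includes css_class.
    tables = soup.find_all("table", class_=css_class)
    if not tables:
        # Nothing matched, so tables[0] would raise IndexError.
        return []
    return tables[0].find_all("tr")


rows = first_table_rows(SAMPLE_HTML, "tableData")
print(len(rows))
```

If the printed length is zero for the real page, the problem is in what was downloaded or in the selector, not in the loop that writes the rows.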

Joel
  • It's definitely not empty: (https://www.townofchapelhill.org/town-hall/departments-services/planning-and-sustainability/gis-analytics/development-activity-report). Should I be grabbing from the t-body? @Joel – as9934 Oct 30 '18 at 02:13
  • Perhaps your code is not grabbing the data from the website correctly, thus returning an empty list. What does `soup('table', {"class":"tableData tablesorter tablesorter-blue hasFilters hasStickyHeaders"})` return? – Joel Oct 30 '18 at 02:45

The page embeds the table in an iframe, so you need to request the iframe's source URL to reach the table. Use this URL (found in <iframe src ...>) instead:

https://gis.townofchapelhill.org/developments/report/report.aspx

Along with:

tbody = soup.find_all('table')

And you will get the rows.
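
Putting the two pieces together, a complete script might look roughly like the sketch below. It assumes the report page serves the table as plain HTML. The helper names `table_rows` and `write_report_csv` are made up for illustration, and iterating over `tr` tags and then their `th`/`td` cells is one possible approach, not something stated in this answer. The built-in `html.parser` is used so that `lxml` is not required.

```python
import csv
import urllib.request

from bs4 import BeautifulSoup

# URL taken from the <iframe src=...> mentioned above.
REPORT_URL = "https://gis.townofchapelhill.org/developments/report/report.aspx"


def table_rows(html):
    """Return every table row in html as a list of stripped cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Only the row's own cells, not cells of any nested table.
        cells = tr.find_all(["th", "td"], recursive=False)
        if cells:
            rows.append([cell.get_text(" ", strip=True) for cell in cells])
    return rows


def write_report_csv(url=REPORT_URL, path="dataoutput.csv"):
    """Download url and write one CSV line per table row to path."""
    html = urllib.request.urlopen(url).read()
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(table_rows(html))
```

Calling `write_report_csv()` downloads the page and writes `dataoutput.csv`. Each `tr` becomes one CSV line and each cell one column, which is the layout a database import usually expects.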

user1394
  • This is super helpful! Unfortunately, now it only returns 'id' and 'Active' as two cells in the csv even though the terminal prints everything. – as9934 Oct 30 '18 at 06:35
  • @as9934 best I can give you for now is to add this at the end `for col in cols: f.write(col)` -- writes to rows in the csv. Maybe an upvote or an answer checkmark? :-) – user1394 Oct 30 '18 at 06:50
  • Pretty darn close. It exports the whole table, but into only one column separated by 3 blank rows, with a few random bits of data in a second column, e.g. "blvd" and "205 Friendly Lane". Probably workable at this point; I just want to know for my own curiosity now. – as9934 Oct 30 '18 at 07:24

The page contains JavaScript, and the complete table data is loaded by a script. Try adding this to your code:

with open("test.html", "w", encoding="utf-8") as file:
    file.write(str(soup))
  1. Open test.html in a browser.
  2. Open the same file in a text editor and compare the two. The table contents are not visible in the text editor, but you can see the table in the browser.
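
The same comparison can be roughed out in Python. This is only a sketch: the helper name `summarize_page` is made up for illustration, and the sample string stands in for the real page source. No `<table>` tags together with an `<iframe>` or `<script>` tags suggests the table is loaded from somewhere else.

```python
from bs4 import BeautifulSoup


def summarize_page(html):
    """Count tables and scripts, and collect iframe sources, in html."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "tables": len(soup.find_all("table")),
        "scripts": len(soup.find_all("script")),
        "iframe_srcs": [tag.get("src") for tag in soup.find_all("iframe")],
    }


# Made-up page source standing in for the real download.
sample = """
<html><body>
<script src="app.js"></script>
<iframe src="https://example.com/report.aspx"></iframe>
</body></html>
"""
print(summarize_page(sample))
```

Passing `str(soup)` from the original script to `summarize_page` gives the same information for the real page.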

There are multiple solutions for this. Check this link for simple solutions

Sujith Royal