
I am using Python 3.4 and I'm trying to scrape the underlying data from the link below and dump it into a .csv file.

I am currently using BeautifulSoup and the first few lines in my script look as below:

from urllib.request import urlopen
from bs4 import BeautifulSoup

htmlfile = urlopen("https://secure.moneygram.com/estimate")
soup = BeautifulSoup(htmlfile, "html.parser")
print(soup.prettify()[0:1000])

Can anybody help me with this?

Thanks

Abacus
2 Answers


If you need to log in, you will need to use splinter's Browser. If you don't, and the data is present in the plain HTML, you can extract it with methods such as find, findNext, findAll, find_by_name, find_by_id, find_by_css, etc. For example:

    soup.find('table', {"id": "noticeResults"}).findNext('tbody')

This returns the body (tbody) of the table with the id "noticeResults".
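A minimal, self-contained sketch of that pattern (the sample HTML, the table id "noticeResults", and the cell values are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Sample HTML standing in for the page's markup
html = """
<table id="noticeResults">
  <tbody>
    <tr><td>USD</td><td>100.00</td></tr>
    <tr><td>EUR</td><td>91.50</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() locates the table by id; find_next() then moves to its tbody
tbody = soup.find("table", {"id": "noticeResults"}).find_next("tbody")

# Collect the text of every cell, row by row
rows = [[td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in tbody.find_all("tr")]
print(rows)  # [['USD', '100.00'], ['EUR', '91.50']]
```

The same row-extraction loop works on the real page once you know the actual table's id or class.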

Ch.Hedi

You should take a look at this question: python BeautifulSoup parsing table

and then, to save it as CSV (note that in Python 3 the file must be opened in text mode with newline='', not 'wb' as in Python 2):

import csv

data = [...] # your data coming from BS4
with open('csv_file.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow(row)
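Putting it all together, here is a runnable sketch with hypothetical sample rows standing in for the scraped data, including a read-back to verify the file:

```python
import csv

# Hypothetical rows, as they might come out of BeautifulSoup
data = [["Currency", "Amount"], ["USD", "100.00"], ["EUR", "91.50"]]

# In Python 3, open in text mode with newline="" so the csv module
# controls line endings itself (opening in "wb" raises a TypeError)
with open("csv_file.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow(row)

# Read the file back to check the round trip
with open("csv_file.csv", newline="") as csvfile:
    rows = list(csv.reader(csvfile))
print(rows)
```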
MajorTom
  • Modified my code as below, but getting the error "AttributeError: 'NoneType' object has no attribute 'find_all'". The BeautifulSoup object doesn't seem to contain any tabular data. Is there any other way to grab and parse it? – Abacus Mar 07 '16 at 18:10
  • `table = soup.find("table", {"class": "lineItemsTable"})`, then `for row in table.find_all("tr"): cells = row.find_all("td"); col1 = cells[1].find(text=True); col2 = cells[2].find(text=True); print(col1, col2)` – Abacus Mar 07 '16 at 18:11
  • As Ch.Hedi said, it seems that you have to log in to get the desired data. At this point I can't help you further, since I can't log in to this site and I don't have a sample of the page you want to scrape. P.S. In the example I linked, "lineItemsTable" is the CSS class of the table you want to extract; you have to replace it with your own value. – MajorTom Mar 08 '16 at 21:45