I'm trying to scrape this afghanistan page by extracting the cities and area codes from its table. When I try the same on this american-samoa page, findAll() cannot find any <td>, which is correct because that page has none. How do I catch this exception?

Here's my code:

from bs4 import BeautifulSoup
import urllib2
import re

url = "http://www.howtocallabroad.com/american-samoa"
html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page)

areatable = soup.find('table',{'id':'codes'})
d = {}

def chunks(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]

li = dict(chunks([i.text for i in areatable.findAll('td')], 2))
if li != []:
    print li

    for key in li:
            print key, ":", li[key]
else:
    print "list is empty"

This is the error I got:

Traceback (most recent call last):
  File "extract_table.py", line 15, in <module>
    li = dict(chunks([i.text for i in areatable.findAll('td')], 2))
AttributeError: 'NoneType' object has no attribute 'findAll'

I also tried this, but it doesn't work either:

def gettdtag(tag):
    return "empty" if areatable.findAll(tag) is None else tag

all_td = gettdtag('td')
print all_td
chip

1 Answer

The error says that areatable is None. soup.find() returns None when nothing matches, so this page has no table with id "codes":

areatable = soup.find('table',{'id':'codes'})
#areatable = soup.find('table', id='codes')  # Also works

if areatable is None:
    print 'Something happened'
    # Exit out (or skip this page) before calling areatable.findAll()

Also, I'd use find_all instead of findAll and get_text() instead of text.
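
Putting the two suggestions together, here is a minimal sketch of how the None check and find_all/get_text() might be wrapped up. The name extract_codes is hypothetical (it is not in the question), chunks is the asker's helper, and dropping an incomplete trailing pair is my assumption about what you'd want. It has no print statements, so it should run the same on Python 2 and 3.

```python
def chunks(items, n):
    """Split items into consecutive lists of length n (the last may be shorter)."""
    return [items[i:i + n] for i in range(0, len(items), n)]


def extract_codes(table):
    """Return {city: area_code} for a table element, or {} if there is no table.

    `table` is whatever soup.find('table', id='codes') returned:
    a bs4 Tag, or None when the page has no such table.
    (extract_codes is a hypothetical helper name, not part of bs4.)
    """
    if table is None:
        # Nothing matched, so there is nothing to call find_all() on.
        return {}
    cells = [td.get_text(strip=True) for td in table.find_all('td')]
    # Assumption: keep only complete (city, code) pairs, so dict() never
    # receives a one-item chunk from an odd number of cells.
    return dict(pair for pair in chunks(cells, 2) if len(pair) == 2)
```

With the question's soup, this would be called as extract_codes(soup.find('table', id='codes')). An empty dict then means either the table is missing or it has no cells, so the caller can test `if not result:` instead of catching AttributeError.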

Blender
  • Dammit, I almost wrote the same answer. – Achrome Jun 06 '13 at 06:07
  • In regards to using some functions instead of others, [that would](http://stackoverflow.com/a/16932131/1971805) be [my fault](http://stackoverflow.com/a/16916879/1971805) :P – TerryA Jun 06 '13 at 06:18
  • @Haidro that would be my next todo, knowing the diff between `find_all` instead `findAll` learning python one step at a time =) – chip Jun 06 '13 at 06:23
  • @zipc I think it's just a matter of `findAll` is part of version 3 of BeautifulSoup, and `find_all` is part of version 4. But `findAll` still works in 4. Take a look [here](http://www.crummy.com/software/BeautifulSoup/bs4/doc/#method-names) – TerryA Jun 06 '13 at 06:25