BeautifulSoup a['href'] exception handling

Question

I am using bs4 to scrape links. In some iterations of the loop, there is no <a href=""> tag.

In that case, I want to use the element's text attribute instead.

My sample code:

base_url = "http://example.com"

abc = base_url + str(tds.a['href']) if tds.a['href'] else tds.text

This throws an exception:

TypeError: 'NoneType' object is not subscriptable

This is what my td element looks like:

<td nowrap=""><font face="Arial" size="1"><a href="view_document?docurl=http://www.envirostor.dtsc.ca.gov/public/deliverable_documents/6382679581/Recorded%20LUC%2010%2D14%2D2010%2Epdf" target="6382679581">[VIEW COVENANT]</a> </font></td>

How can I solve this?

P.S. I am using Python 3 and bs4.

Presumably either `tds` or `tds.a` is `None` here. You'll want to check that instead: `tds.a['href'] if tds.a else 'some default'` – Bailey Parker, Mar 23 '18 at 04:34
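The check suggested in that comment can be sketched as a small runnable script. It parses the question's cell (with the long docurl shortened to x) plus a hypothetical second cell without a link. The trailing slash on base_url is an assumption added so the concatenated URL is well-formed.

```python
from bs4 import BeautifulSoup

# The question's cell (docurl shortened to "x") plus a hypothetical cell with no <a> tag.
html = """
<table><tr>
<td nowrap=""><font face="Arial" size="1"><a href="view_document?docurl=x" target="6382679581">[VIEW COVENANT]</a> </font></td>
<td nowrap=""><font face="Arial" size="1">No document</font></td>
</tr></table>
"""

# Assumption: trailing slash added so base_url + href forms a valid URL.
base_url = "http://example.com/"
soup = BeautifulSoup(html, "html.parser")

results = []
for tds in soup.find_all("td"):
    # tds.a is None when the cell has no <a>, so test it before subscripting.
    abc = base_url + tds.a["href"] if tds.a is not None else tds.text
    results.append(abc)

print(results)
# ['http://example.com/view_document?docurl=x', 'No document']
```

This only covers a missing <a>; an <a> without an href attribute would still raise a KeyError.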

score 1 · Answer 1 · answered Mar 23 '18 at 11:00

In Python, the idiomatic style is often EAFP (Easier to Ask for Forgiveness than Permission).

If the a tag doesn't have an href attribute, tds.a['href'] raises a KeyError.
If the td tag doesn't contain an a tag, tds.a is None, so tds.a['href'] raises the TypeError shown in the question.
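Both failure modes can be reproduced directly. This sketch uses two hypothetical cells, one whose <a> has no href and one with no <a> at all, and reports which exception tds.a['href'] raises for each.

```python
from bs4 import BeautifulSoup

# Hypothetical cells: an <a> without href, and a cell with no <a> at all.
cells = BeautifulSoup(
    '<td><a target="1">[VIEW]</a></td><td>No link</td>', "html.parser"
).find_all("td")


def error_type(tds):
    """Return the name of the exception raised by tds.a['href'], or None if it succeeds."""
    try:
        tds.a["href"]
    except (KeyError, TypeError) as exc:
        return type(exc).__name__
    return None


print([error_type(tds) for tds in cells])
# ['KeyError', 'TypeError']
```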

So, using the EAFP principle:

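base_url = "http://example.com"
try:
    abc = base_url + tds.a['href']
except (KeyError, TypeError):
    # KeyError: the <a> tag has no href; TypeError: there is no <a> tag (tds.a is None)
    # Fall back to the cell's text alone, as in the question's else branch
    abc = tds.text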
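Applied to every cell, the same try/except can live in a small helper. This sketch swaps the plain `+` concatenation for urllib.parse.urljoin (a different technique from the answer's), because "http://example.com" + the question's relative href would produce "http://example.comview_document?..." with no slash. The helper name cell_value and the sample cells are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def cell_value(tds, base_url="http://example.com"):
    """Return the absolute link in a <td>, or the cell's text if it has no usable link."""
    try:
        # urljoin resolves the relative href against base_url, adding the "/" separator.
        return urljoin(base_url, tds.a["href"])
    except (KeyError, TypeError):
        # KeyError: <a> without href; TypeError: no <a> at all (tds.a is None).
        return tds.text


rows = BeautifulSoup(
    '<tr><td><a href="view_document?docurl=x">[VIEW COVENANT]</a></td>'
    '<td><a target="1">no href</a></td>'
    "<td>plain text</td></tr>",
    "html.parser",
).find_all("td")

print([cell_value(tds) for tds in rows])
# ['http://example.com/view_document?docurl=x', 'no href', 'plain text']
```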

score 0 · Accepted Answer · answered Mar 23 '18 at 04:37

Use the has_attr() method in BS4:

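# has_attr() must be called on the <a> tag, not the <td>; also guard against a missing <a>
abc = base_url + tds.a['href'] if tds.a is not None and tds.a.has_attr('href') else tds.text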
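The has_attr() check needs care because it only looks at the attributes of the tag it is called on, and in the question's markup href sits on the nested <a>, not on the <td>. This sketch shows that difference on the question's cell (docurl shortened) and wraps a look-before-you-leap check in a hypothetical helper, link_or_text; the trailing slash on base_url is an assumption.

```python
from bs4 import BeautifulSoup

# The question's cell, trimmed; href lives on the nested <a>, not on the <td>.
tds = BeautifulSoup(
    '<td nowrap=""><font size="1"><a href="view_document?docurl=x">[VIEW COVENANT]</a></font></td>',
    "html.parser",
).td

print(tds.has_attr("href"))    # False: the <td> only has the nowrap attribute
print(tds.a.has_attr("href"))  # True: the <a> carries href


def link_or_text(tds, base_url="http://example.com/"):
    """LBYL variant: check for the <a> and its href before subscripting."""
    a = tds.a
    if a is not None and a.has_attr("href"):
        return base_url + a["href"]
    return tds.text


print(link_or_text(tds))  # http://example.com/view_document?docurl=x
```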

Andrew Mackie

Or just use `tds.text` as the default, i.e. `abc = base_url + tds.a.get('href', tds.text)` – t.m.adam Mar 23 '18 at 04:57
Yes, that's cleaner. – Andrew Mackie Mar 23 '18 at 05:09
True, but it still won't solve the problem: "in some cases in the loop there is no `<a href="">` tag". Perhaps check if there is an 'a' tag? – t.m.adam Mar 23 '18 at 05:20
@t.m.adam Your solution looks much cleaner. Thank you. – Sriram Arvind Lakshmanakumar Mar 23 '18 at 08:24
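Following the suggestion to check for the 'a' tag, bs4's find() also accepts an attribute filter such as href=True, which matches only <a> tags that actually have an href attribute. That folds both failure cases into a single None check. This is a minimal sketch; the helper name href_or_text, the sample cells, and the trailing slash on base_url are assumptions.

```python
from bs4 import BeautifulSoup


def href_or_text(tds, base_url="http://example.com/"):
    """Return base_url + href for the first <a> with an href, else the cell's text."""
    # href=True filters to tags that have an href attribute; find() returns None otherwise.
    a = tds.find("a", href=True)
    return base_url + a["href"] if a is not None else tds.text


cells = BeautifulSoup(
    '<tr><td><a href="view_document?docurl=x">[VIEW COVENANT]</a></td>'
    '<td><a target="1">no href</a></td>'
    "<td>plain text</td></tr>",
    "html.parser",
).find_all("td")

print([href_or_text(tds) for tds in cells])
# ['http://example.com/view_document?docurl=x', 'no href', 'plain text']
```

Unlike tds.a, this skips an earlier <a> that lacks href and returns the first one that has it, which may or may not be what a given table needs.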
