
I am using bs4 to scrape links. In some iterations of the loop, there is no <a href=""> tag.

In that case, I want to use the element's text instead.

My sample code:

base_url = "http://example.com"

abc = base_url + str(tds.a['href']) if tds.a['href'] else tds.text

This exception is thrown:

TypeError: 'NoneType' object is not subscriptable

This is what my td element looks like:

<td nowrap=""><font face="Arial" size="1"><a href="view_document?docurl=http://www.envirostor.dtsc.ca.gov/public/deliverable_documents/6382679581/Recorded%20LUC%2010%2D14%2D2010%2Epdf" target="6382679581">[VIEW COVENANT]</a> </font></td>

How can I solve this?

P.S. I'm using Python 3 and BS4.

the.salman.a
Presumably either `tds` or `tds.a` is `None` here. You'll want to check that instead: `tds.a['href'] if tds.a else 'some default'` – Bailey Parker, Mar 23 '18 at 04:34
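A minimal runnable sketch of the check the comment suggests, assuming bs4 is installed. The HTML is a trimmed copy of the question's cell (the href is shortened to a hypothetical `x.pdf`) plus a hypothetical cell with no link. `BASE_URL` is given a trailing slash, an assumption so the relative href joins cleanly, and `is not None` is used instead of plain truthiness to make the check explicit.

```python
from bs4 import BeautifulSoup

# Placeholder base URL; trailing slash assumed so the relative href joins cleanly.
BASE_URL = "http://example.com/"

# Trimmed version of the question's cell, plus a hypothetical cell with no <a>.
HTML = (
    '<td nowrap=""><font face="Arial" size="1">'
    '<a href="view_document?docurl=x.pdf" target="6382679581">[VIEW COVENANT]</a>'
    "</font></td>"
    '<td nowrap=""><font face="Arial" size="1">No document</font></td>'
)


def cell_value(td):
    # Look before you leap: td.a is None when the cell contains no <a> tag.
    # Note: this still raises KeyError if an <a> exists but has no href.
    return BASE_URL + td.a["href"] if td.a is not None else td.text


soup = BeautifulSoup(HTML, "html.parser")
values = [cell_value(td) for td in soup.find_all("td")]
print(values)  # ['http://example.com/view_document?docurl=x.pdf', 'No document']
```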

2 Answers


In Python, the idiomatic approach is often EAFP (Easier to Ask for Forgiveness than Permission): attempt the operation and handle the exception if it fails.

If the <a> tag doesn't have an href attribute, tds.a['href'] raises a KeyError.
If the td tag doesn't contain an <a> tag, tds.a is None, so tds.a['href'] raises the TypeError shown in the question.
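Both failure modes can be checked directly. This is a small sketch, assuming bs4 is installed, using two hypothetical cells: one whose <a> has no href and one with no <a> at all.

```python
from bs4 import BeautifulSoup

# Hypothetical cells: an <a> without href, and a cell with no <a> at all.
soup = BeautifulSoup(
    '<td id="no-href"><a>text</a></td><td id="no-link">plain</td>',
    "html.parser",
)
no_href = soup.find("td", id="no-href")
no_link = soup.find("td", id="no-link")

errors = []
for td in (no_href, no_link):
    try:
        td.a["href"]
    except (KeyError, TypeError) as exc:
        # Record which exception type each case produces.
        errors.append(type(exc).__name__)

print(errors)  # ['KeyError', 'TypeError']
```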

So, using the EAFP principle:

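base_url = "http://example.com"
try:
    abc = base_url + tds.a['href']
except (KeyError, TypeError):
    # KeyError: the <a> tag has no href; TypeError: tds.a is None (no <a> tag)
    # Fall back to the cell's text, as the question asks.
    abc = tds.text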
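A self-contained sketch of the EAFP approach applied over several cells, assuming bs4 is installed. The three cells are hypothetical (a linked cell, an <a> without href, and a plain-text cell), and `BASE_URL` has an assumed trailing slash so the relative href joins cleanly.

```python
from bs4 import BeautifulSoup

# Placeholder base URL; trailing slash assumed.
BASE_URL = "http://example.com/"

# Hypothetical cells covering all three cases.
HTML = (
    '<td><a href="view_document?docurl=x.pdf">[VIEW COVENANT]</a></td>'
    "<td><a>no href</a></td>"
    "<td>plain text</td>"
)


def cell_value(td):
    try:
        # Ask forgiveness: subscript first, handle the failure afterwards.
        return BASE_URL + td.a["href"]
    except (KeyError, TypeError):
        # KeyError: <a> has no href; TypeError: td.a is None (no <a> at all).
        return td.text


soup = BeautifulSoup(HTML, "html.parser")
values = [cell_value(td) for td in soup.find_all("td")]
print(values)
# ['http://example.com/view_document?docurl=x.pdf', 'no href', 'plain text']
```

Catching only KeyError and TypeError keeps unrelated errors visible, though a TypeError raised for some other reason inside the try block would also be swallowed.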
Keyur Potdar

Use the has_attr() method in BS4, calling it on the <a> tag (the td itself never has an href):

abc = base_url + str(tds.a['href']) if tds.a is not None and tds.a.has_attr('href') else tds.text
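A runnable sketch of the has_attr() check, assuming bs4 is installed, with hypothetical cells and an assumed trailing slash on `BASE_URL`. It also shows why calling has_attr('href') on the td is always False: the attribute lives on the nested <a>.

```python
from bs4 import BeautifulSoup

# Placeholder base URL; trailing slash assumed.
BASE_URL = "http://example.com/"

# Hypothetical cells: a link, an <a> without href, and plain text.
HTML = (
    '<td><a href="view_document?docurl=x.pdf">[VIEW COVENANT]</a></td>'
    "<td><a>no href</a></td>"
    "<td>plain text</td>"
)


def cell_value(td):
    link = td.a  # None when the cell has no <a> tag
    # Call has_attr() on the <a> tag, not on the <td>.
    if link is not None and link.has_attr("href"):
        return BASE_URL + link["href"]
    return td.text


soup = BeautifulSoup(HTML, "html.parser")
cells = soup.find_all("td")
values = [cell_value(td) for td in cells]
print(values)
# ['http://example.com/view_document?docurl=x.pdf', 'no href', 'plain text']

# Checking the <td> finds no href even though its <a> has one.
print(cells[0].has_attr("href"), cells[0].a.has_attr("href"))  # False True
```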
Andrew Mackie