I have looked at similar posts, which come close to my case, but my result nonetheless seems unexpected.

import BeautifulSoup
import re

soup = BeautifulSoup.BeautifulSoup(<html page of interest>)
if (soup.find_all("td", attrs= {"class": "FilterElement"}, text= re.compile("HERE IS THE TEXT I AM LOOKING FOR")) is None):
    print('There was no entry')
else:
    print(soup.find("td", attrs= {"class": "FilterElement"}, text= re.compile("HERE IS THE TEXT I AM LOOKING FOR")))

I obviously filtered out the actual HTML page, as well as the text in the regular expression. The rest is exactly as written. I get the following error:

Traceback (most recent call last):
  File "/Users/appa/src/workspace/web_forms/WebForms/src/root/queryForms.py", line 51, in <module>
    LoopThroughDays(form, id, trailer)
  File "/Users/appa/src/workspace/web_forms/WebForms/src/root/queryForms.py", line 33, in LoopThroughDays
    if (soup.find_all("td", attrs= {"class": "FilterElement"}, text= re.compile("HERE IS THE TEXT I AM LOOKING FOR")) is None):
TypeError: 'NoneType' object is not callable

I understand that the text will sometimes be missing. But I thought the way I set up the if statement was precisely meant to catch the case where it is missing and the result is therefore None.

Thanks in advance for any help!

Mike Williamson
  • Shouldn't your import be `from bs4 import BeautifulSoup`? `BeautifulSoup4` is in the `bs4` module – salmanwahed Aug 05 '14 at 09:35
  • @salmanwahed Yes, I saw other sites referring to bs4, but when I installed (via easy_install), the base library was simply BeautifulSoup. I use it without a problem in other areas, so I suspect this is another (newer?) version of BeautifulSoup? Not sure why the difference... – Mike Williamson Aug 05 '14 at 15:57

1 Answer

It looks like the method name is the problem. With the `BeautifulSoup` module you are importing (version 3), it should be soup.findAll, not soup.find_all. I tried running it, and it works with the correction. So the full program should be:

import BeautifulSoup
import re

soup = BeautifulSoup.BeautifulSoup(<html page of interest>)
# findAll returns a list (empty when nothing matches), never None,
# so test its truthiness rather than comparing against None.
if not soup.findAll("td", attrs={"class": "FilterElement"}, text=re.compile("HERE IS THE TEXT I AM LOOKING FOR")):
    print('There was no entry')
else:
    print(soup.find("td", attrs={"class": "FilterElement"}, text=re.compile("HERE IS THE TEXT I AM LOOKING FOR")))
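
To see why the misspelled method name produces `'NoneType' object is not callable` rather than an AttributeError, here is a small toy model. It assumes, as I understand BeautifulSoup 3 to behave, that an unknown attribute on a tag is treated as a lookup for a child tag of that name, so `soup.find_all` acts like `soup.find("find_all")` and yields None. `ToyTag` and its methods are hypothetical stand-ins written for this illustration, not the library's actual code.

```python
class ToyTag:
    """Toy stand-in for a BeautifulSoup 3 tag (hypothetical, not library code)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def findAll(self, name):
        # Always returns a list; it is empty when nothing matches.
        return [child for child in self.children if child.name == name]

    def find(self, name):
        # Returns the first match, or None when there is none.
        matches = self.findAll(name)
        return matches[0] if matches else None

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails: an unknown
        # attribute is treated as a child-tag lookup, e.g. tag.td.
        if attr.startswith("__"):
            raise AttributeError(attr)
        return self.find(attr)


row = ToyTag("tr", [ToyTag("td")])

print(row.td.name)    # dotted access finds the <td> child
print(row.find_all)   # no such method and no such child tag, so None

try:
    row.find_all("td")  # this is really None("td")
except TypeError as error:
    print(error)

matches = row.findAll("table")
print(matches is None)  # an empty list is not None
print(not matches)      # truthiness detects "nothing found"
```

The last two lines also show why an `is None` test cannot detect an empty result: `findAll` hands back an empty list, so checking truthiness (`if not matches:`) is the safer test.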
jstein123
  • Thanks so much! I will not be able to check until later in the evening, but I will give an "accepted" check mark at that point. – Mike Williamson Aug 05 '14 at 15:58
  • `find_all` is correct in version 4+. findAll will still work, but find_all is more pythonic: see http://stackoverflow.com/questions/12339323/beautifulsoup-findall-find-all – fantabolous Oct 04 '16 at 02:50