
I'm parsing some tables w/ BeautifulSoup, and came across an easy way to pick out the table's td and th tags. Try as I might, though, I don't know why this particular bit of code works (specifically: the very last line).

html = urlopen(url).read()
table = SoupStrainer('table', {'border': 0, 'cellpadding': 5})
soup = BeautifulSoup(html, parseOnlyThese=table)
soup.findAll(lambda tag: tag.name == "td")

What's the point of defining the anonymous function here? I've tried soup.findAll(name == "td"), which doesn't work, but soup.findAll(lambda tag: tag.name == "td") works. How is this lambda function interacting with BeautifulSoup, and why do I need it? Is there another way of writing the same code that makes things a bit clearer?

AmagicalFishy
  • `soup.findAll(name == "td")` is very different from `soup.findAll(name = "td")`. The latter might have worked for you. – Robᵩ Feb 06 '17 at 17:51

2 Answers


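The first argument to find*() functions in BeautifulSoup can be a function. BeautifulSoup calls that function on each tag it visits and keeps the tags for which it returns a true value.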
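To make the mechanism concrete, here is a toy sketch in plain Python of how a finder can use a caller-supplied function. The `Node` class and this `find_all` are hypothetical stand-ins written for illustration, not BeautifulSoup's actual implementation; they only show the general pattern of "call the function on every element, keep the ones it accepts".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a parsed tag: a name plus child nodes."""
    name: str
    children: List["Node"] = field(default_factory=list)


def find_all(root: Node, matches: Callable[[Node], bool]) -> List[Node]:
    """Return every descendant of root for which matches(node) is truthy."""
    found = []
    for child in root.children:
        if matches(child):  # the caller's function decides what counts as a match
            found.append(child)
        found.extend(find_all(child, matches))  # keep searching deeper
    return found


table = Node("table", [
    Node("tr", [Node("th"), Node("td"), Node("td")]),
])

cells = find_all(table, lambda tag: tag.name == "td")
print([node.name for node in cells])  # ['td', 'td']
```

Seen this way, the lambda is just a value handed to the search routine; the parameter name (`tag` here) is arbitrary.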

In this particular case:

soup.findAll(lambda tag: tag.name == "td")

is really overkill and is equivalent to:

soup.findAll("td")
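A quick way to check the equivalence is to run the variants side by side. This is a small sketch that assumes BeautifulSoup 4 (`bs4`) with the built-in `html.parser`, uses `find_all` (the BS4 spelling of `findAll`), and works on a made-up HTML snippet rather than the page from the question. The third call is the keyword form from the comment under the question.

```python
from bs4 import BeautifulSoup

# Made-up table for illustration only.
html = "<table><tr><th>Name</th><td>Ada</td><td>Alan</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

by_function = soup.find_all(lambda tag: tag.name == "td")  # function filter
by_string = soup.find_all("td")                            # string filter
by_keyword = soup.find_all(name="td")                      # same, as a keyword

print(by_function == by_string == by_keyword)    # True
print([cell.get_text() for cell in by_string])   # ['Ada', 'Alan']
```

Note that `name == "td"` (double equals) is a comparison evaluated before `findAll` is ever called, which is why that spelling does not behave like `name="td"`.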

A function can also be used to filter a specific attribute, for example:

soup.find_all("a", href=lambda href: href and href.startswith("http"))
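Here is a runnable version of that attribute filter, again assuming BeautifulSoup 4 with `html.parser` and a made-up snippet. When a function is attached to a specific attribute, it receives the attribute's value rather than the whole tag; the `href and ...` guard is there because that value may be `None` for a tag that has no `href`.

```python
from bs4 import BeautifulSoup

# Made-up links for illustration only.
html = """
<a href="http://example.com/a">absolute</a>
<a href="/relative">relative</a>
<a name="anchor">no href</a>
"""
soup = BeautifulSoup(html, "html.parser")

# The lambda sees the value of href, not the <a> tag itself.
links = soup.find_all("a", href=lambda href: href and href.startswith("http"))
print([a["href"] for a in links])  # ['http://example.com/a']
```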

Some of the real-world use cases:

alecxe

In BS4, find() accepts five kinds of filters, and a function is one of them:

You can define a function that takes an element as its only argument. The function should return True if the argument matches, and False otherwise.

It does not matter how your function is defined, as long as it takes an element as its only argument.
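As an illustration, a named `def` and a `lambda` are interchangeable here, and a named function is often clearer once the test has more than one condition. The sketch below assumes BeautifulSoup 4 with `html.parser`; the helper names `is_cell` and `is_numeric_cell` and the HTML are invented for this example.

```python
from bs4 import BeautifulSoup

# Made-up table for illustration only.
html = """
<table>
  <tr><th>Name</th><th class="num">Score</th></tr>
  <tr><td>Ada</td><td class="num">10</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")


def is_cell(tag):
    """Match both header cells and data cells."""
    return tag.name in ("td", "th")


def is_numeric_cell(tag):
    """Match cells that carry the (hypothetical) 'num' class."""
    return is_cell(tag) and "num" in tag.get("class", [])


print([t.get_text() for t in soup.find_all(is_cell)])          # ['Name', 'Score', 'Ada', '10']
print([t.get_text() for t in soup.find_all(is_numeric_cell)])  # ['Score', '10']

# The same filter written as a lambda gives the same result.
print(soup.find_all(is_cell) == soup.find_all(lambda tag: tag.name in ("td", "th")))  # True
```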

宏杰李