I want to find all tds whose custom HTML attribute data-stat is not equal to a given value.
My data looks something like this:

<td data-stat="foo">10</td>
<td data-stat="bar">20</td>
<td data-stat="test">30</td>
<td data-stat="DUMMY"> </td>

I know that I can just select for foo, bar, and test, but my actual dataset will have hundreds of different values for data-stat, so listing them all wouldn't be feasible.

Is there something like a != operator that I can use in Beautiful Soup? I tried:

[td.getText() for td in rows[i].findAll('td:not([data-stat="DUMMY"])')]

but I only get [] as the result.

theloosygoose
  • Does this answer your question? [Exclude unwanted tag on Beautifulsoup Python](https://stackoverflow.com/questions/40760441/exclude-unwanted-tag-on-beautifulsoup-python) – Michael Ruth Sep 02 '22 at 22:46
  • `.findAll()` doesn't accept CSS selector syntax. Use `.select()` – Barmar Sep 02 '22 at 22:48

1 Answer

You can use a list comprehension to filter out the unwanted tags, for example:

print([td.text for td in soup.find_all("td") if td.get("data-stat") != "DUMMY"])

Or use a CSS selector with .select (as @Barmar said in the comments, .find_all doesn't accept CSS selectors):

print([td.text for td in soup.select('td:not([data-stat="DUMMY"])')])
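
For completeness, here is a self-contained sketch of both approaches. It assumes the sample cells from the question are wrapped in a table row and parsed with the built-in html.parser; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample cells from the question, wrapped in a table row (an assumption
# made so the snippet is a complete document).
html = """
<table><tr>
<td data-stat="foo">10</td>
<td data-stat="bar">20</td>
<td data-stat="test">30</td>
<td data-stat="DUMMY"> </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Approach 1: fetch every td, then filter in Python.
filtered = [td.text for td in soup.find_all("td") if td.get("data-stat") != "DUMMY"]

# Approach 2: let the CSS selector do the filtering.
selected = [td.text for td in soup.select('td:not([data-stat="DUMMY"])')]

print(filtered)
print(selected)
```

Both lists should contain the text of the three non-DUMMY cells. Note that a td with no data-stat attribute at all would also be kept by either approach.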
Andrej Kesely
  • Thanks! This works, but what is the difference between the find_all and select solutions? – theloosygoose Sep 02 '22 at 22:53
  • @theloosygoose `.select` accepts CSS selectors, `.find_all` doesn't. The difference is only the API (for example, you can use `lambda` in `.find_all` etc.) – Andrej Kesely Sep 02 '22 at 22:54
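
To illustrate the API difference mentioned in the last comment, here is a minimal sketch, again assuming the question's sample cells wrapped in a table row. It shows that .find_all treats a string argument as a literal tag name (which is why the original attempt returned []), and that it can instead take a function as its filter. The helper name is_real_cell is hypothetical.

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
<td data-stat="foo">10</td>
<td data-stat="bar">20</td>
<td data-stat="test">30</td>
<td data-stat="DUMMY"> </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# A string passed to find_all is matched against tag names, not parsed as a
# CSS selector, so no tag matches this and the result is empty.
empty = soup.find_all('td:not([data-stat="DUMMY"])')

# Hypothetical helper: find_all calls it once per tag and keeps the tags
# for which it returns True.
def is_real_cell(tag):
    return tag.name == "td" and tag.get("data-stat") != "DUMMY"

values = [td.text for td in soup.find_all(is_real_cell)]

print(empty)
print(values)
```

The function form is handy when the condition is awkward to express as a selector, for example when it depends on the cell's text or on several attributes at once.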