31

In BeautifulSoup, if I want to find all divs whose class is span3, I'd just do:

result = soup.findAll("div",{"class":"span3"})

However, in my case, I want to find all divs whose class starts with span3, so BeautifulSoup should find:

<div id="span3 span49">
<div id="span3 span39">

And so on...

How do I achieve this? I am familiar with regular expressions; however, I do not know how to use them with BeautifulSoup, and I did not find any help in BeautifulSoup's documentation.

Thomas Tempelmann
George Chalhoub
  • I haven't used BeautifulSoup but it seems to me the documentation is actually [pretty clear on this point](http://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class): "As with any keyword argument, you can pass class_ a string, **a regular expression**, a function, or `True`." – Two-Bit Alchemist Feb 17 '16 at 18:57

2 Answers

47

Well, these are id attributes you are showing:

<div id="span3 span49">
<div id="span3 span39">

In this case, you can use:

soup.find_all("div", id=lambda value: value and value.startswith("span3"))

Or:

soup.find_all("div", id=re.compile("^span3"))
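
A minimal sketch of both id filters side by side, assuming beautifulsoup4 is installed and using a made-up HTML snippet (the third and fourth divs are hypothetical, added only to show what gets excluded):

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first two divs come from the question.
html = """
<div id="span3 span49">a</div>
<div id="span3 span39">b</div>
<div id="span4 span3">c</div>
<div>no id</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Function filter: the `value and` guard covers tags whose id may be missing (None).
by_function = soup.find_all("div", id=lambda value: value and value.startswith("span3"))

# Regex filter: the pattern is applied with search(), so ^ anchors it to the start.
by_regex = soup.find_all("div", id=re.compile("^span3"))

ids_function = [div["id"] for div in by_function]
ids_regex = [div["id"] for div in by_regex]
print(ids_function)
print(ids_regex)
```

Both calls should pick the same two divs here, because id is kept as a single string and both filters test the start of that string.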

If this was just a typo, and you actually have class attributes that start with span3, and you really need to check that the class attribute starts with span3, you can use the "starts-with" CSS selector:

soup.select("div[class^=span3]")

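This is needed because you cannot check the class attribute the same way you checked the id attribute: class is special, since it is a multi-valued attribute. BeautifulSoup stores it as a list of class names, so a function or regular expression is tested against each individual class name, not just against the beginning of the whole attribute value.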
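
To illustrate the difference, here is a small sketch on hypothetical markup (the class values and text are invented for the demonstration). It compares the CSS selector with a regular expression passed to class_:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration only.
html = """
<div class="span3 span49">first</div>
<div class="other span39">second</div>
<div class="unrelated">third</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class is parsed into a list of class names, not a single string.
first_classes = soup.find("div")["class"]

# CSS attribute selector: tests the start of the whole attribute text.
by_selector = [div.get_text() for div in soup.select('div[class^="span3"]')]

# Regex on class_: tried against each class name, so "span39" in the
# second div also counts as a match.
by_regex = [div.get_text() for div in soup.find_all("div", class_=re.compile("^span3"))]

print(first_classes)
print(by_selector)
print(by_regex)
```

So the regex form answers "does any class name start with span3?", while the selector answers "does the class attribute itself start with span3?".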

alecxe
  • @GeorgeStack and make sure you are using `beautifulsoup4`: your import should be `from bs4 import BeautifulSoup`. – alecxe Feb 17 '16 at 19:39
  • Do you mean doing this for the class: **for a in soup.select("div[class^=span3]"):** #do the shit – George Chalhoub Feb 17 '16 at 20:05
  • @GeorgeStack yeah, exactly. – alecxe Feb 17 '16 at 20:27
  • @aIecxe Did you learn this from the docs: `id=lambda value: value and value.startswith("span3")`? I just learned that you can use this type of callable with `text` (as well it seems), but didn't find anything in the docs regarding this usage (only checked `text` though) – Moondra Jun 17 '19 at 03:42
15

This works too:

soup.select("div[class*=span3]")  # *= means: the class attribute contains "span3" anywhere
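
Note that "contains" is a plain substring test, so it is broader than "starts with". A short sketch on invented markup (all class values below are hypothetical) shows what each selector picks up:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration only.
html = """
<div class="span3 span49">one</div>
<div class="row span3">two</div>
<div class="myspan30">three</div>
<div class="span4">four</div>
"""
soup = BeautifulSoup(html, "html.parser")

# *= : "span3" appears anywhere in the class attribute text.
contains = [div.get_text() for div in soup.select('div[class*="span3"]')]

# ^= : the class attribute text begins with "span3".
starts_with = [div.get_text() for div in soup.select('div[class^="span3"]')]

# .span3 : one of the class names is exactly "span3".
exact_class = [div.get_text() for div in soup.select("div.span3")]

print(contains)
print(starts_with)
print(exact_class)
```

If unrelated names such as "myspan30" can occur in your pages, the ^= or .span3 forms are probably the safer choice.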
oscarAguayo