6

I am using the following code to match all div elements that have the CSS class "ad_item".

soup.find_all('div',class_="ad_item")

The problem is that the web page also contains div elements whose CSS class is set to both "ad_item" and "ad_ex_item":

<div class="ad_item ad_ex_item">

The documentation states:

When you search for a tag that matches a certain CSS class, you’re matching against any of its CSS classes:
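The quoted behavior can be reproduced with a small sketch. The markup and the id values below are made up for illustration and are not taken from the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the ids exist only to make the output readable.
html = """
<div id="a" class="ad_item">plain</div>
<div id="b" class="ad_item ad_ex_item">extended</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches a tag if ANY of its classes equals the given string,
# so both divs are returned.
matched_ids = [div["id"] for div in soup.find_all("div", class_="ad_item")]
print(matched_ids)

# The class attribute is parsed into a list of individual class names.
classes_of_b = soup.find(id="b")["class"]
print(classes_of_b)
```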

So how can I match div elements that have "ad_item" but do not have "ad_ex_item"?

Or, to put it another way, how do I search for div elements whose only CSS class is "ad_item"?

WebOrCode
  • possible duplicate of http://stackoverflow.com/questions/1242755/beautiful-soup-cannot-find-a-css-class-if-the-object-has-other-classes-too?rq=1 – TerryA Jan 24 '13 at 08:33

7 Answers

10

You can use a strict attribute selector like this:

soup.select("div[class='ad_item']")

That matches only div elements whose class attribute is exactly 'ad_item', with no other classes joined to it by spaces.
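A minimal sketch of this selector in use, assuming a recent Beautiful Soup 4 release and made-up markup (the ids are hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = """
<div id="a" class="ad_item">plain</div>
<div id="b" class="ad_item ad_ex_item">extended</div>
<div id="c" class="ad_ex_item ad_item">extended, other order</div>
"""
soup = BeautifulSoup(html, "html.parser")

# [class='ad_item'] compares the whole attribute value, not single class names,
# so any div carrying an extra class is left out.
exact_ids = [div["id"] for div in soup.select("div[class='ad_item']")]
print(exact_ids)
```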

ctrl
9

I have found one solution, although it has nothing to do with BS4; it is pure Python code.

for item in soup.find_all('div', class_="ad_item"):
    if len(item["class"]) != 1:
        continue  # skip divs that carry more than one CSS class
    # ... process item here ...

It basically skips an item if it has more than one CSS class.

WebOrCode
2

You can pass a lambda function to the find and find_all methods.

soup.find_all(lambda x:
    x.name == 'div' and
    'ad_item' in x.get('class', []) and
    'ad_ex_item' not in x['class']
)

Using x.get('class', []) avoids a KeyError for div tags that have no class attribute.

If you need to exclude more than one class, you can replace the last condition with:

    not any(c in x['class'] for c in {'ad_ex_item', 'another_class'})

And if you want to exclude only the tags that have all of the listed classes at once, you can use:

   not all(c in x['class'] for c in {'ad_ex_item', 'another_class'})
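The difference between the any and all variants is easiest to see side by side. The following is only a sketch on made-up markup; the helper is_ad_item and the ids are hypothetical names introduced for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = """
<div id="a" class="ad_item">only ad_item</div>
<div id="b" class="ad_item ad_ex_item">one excluded class</div>
<div id="c" class="ad_item ad_ex_item another_class">both excluded classes</div>
<div id="d">no class attribute</div>
"""
soup = BeautifulSoup(html, "html.parser")
excluded = {"ad_ex_item", "another_class"}

def is_ad_item(tag):
    # get() with a default keeps divs without a class attribute from raising.
    return tag.name == "div" and "ad_item" in tag.get("class", [])

# Drop a tag as soon as it has ANY of the excluded classes.
without_any = [t["id"] for t in soup.find_all(
    lambda t: is_ad_item(t) and not any(c in t["class"] for c in excluded))]

# Drop a tag only when it has ALL of the excluded classes.
without_all = [t["id"] for t in soup.find_all(
    lambda t: is_ad_item(t) and not all(c in t["class"] for c in excluded))]

print(without_any)
print(without_all)
```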
Nuno André
0

Did you try using select? See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

soup.select(".ad_item")

Unfortunately, it seems that the CSS3 :not selector is not supported. If you really need it, you may have to look at lxml, which seems to support it. See http://packages.python.org/cssselect/#supported-selectors
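That limitation applied to the releases available when this answer was written. As far as I know, Beautiful Soup 4.7.0 and later hand select() over to the Soup Sieve package, which does understand :not(). A minimal sketch, assuming such a version is installed and using made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = """
<div id="a" class="ad_item">plain</div>
<div id="b" class="ad_item ad_ex_item">extended</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Assumes a Beautiful Soup version whose select() supports :not().
not_ids = [div["id"] for div in soup.select("div.ad_item:not(.ad_ex_item)")]
print(not_ids)
```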

luc
0

You can always write a Python function that matches the tag you want, and pass that function into find_all():

def match(tag):
    # get() with a default avoids a TypeError for divs without a class attribute
    classes = tag.get('class', [])
    return (
        tag.name == 'div'
        and 'ad_item' in classes
        and 'ad_ex_item' not in classes)

soup.find_all(match)
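If the same kind of filter is needed for several class names, the function can be built by a small factory. This is only a sketch; has_class_but_not and its parameter names are hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

def has_class_but_not(tag_name, wanted, unwanted):
    """Build a predicate for find_all(); the name and parameters are hypothetical."""
    def predicate(tag):
        classes = tag.get("class", [])  # [] when the tag has no class attribute
        return tag.name == tag_name and wanted in classes and unwanted not in classes
    return predicate

# Hypothetical markup for illustration.
html = """
<div id="a" class="ad_item">plain</div>
<div id="b" class="ad_item ad_ex_item">extended</div>
<span id="s" class="ad_item">not a div</span>
<div id="d">no class attribute</div>
"""
soup = BeautifulSoup(html, "html.parser")

found_ids = [t["id"] for t in soup.find_all(
    has_class_but_not("div", "ad_item", "ad_ex_item"))]
print(found_ids)
```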
Leonard Richardson
0

The top answer is correct, but if you want to keep the for loop clean or prefer a one-line solution, use the list comprehension below.

data = [item for item in soup.find_all("div", class_="ad_item") if len(item["class"]) == 1] 
Marcus Salinas
-3
soup.fetch('div',{'class':'add_item'})