I'm trying to figure out how CSS pseudo-classes like :not() and :has() work in the following cases.

The following selector is not supposed to print 27A-TAX DISTRICT 27A but it does print it:

from bs4 import BeautifulSoup

htmlelement = """
<tbody>
  <tr style="">
     <td><a>27A-TAX DISTRICT</a> 27A</td>
  </tr>

  <tr style="">
     <td><strong>Parcel Number</strong> 720</td>
  </tr>
</tbody>
"""
soup = BeautifulSoup(htmlelement,"lxml")
item = soup.select_one("tr:not(a)").text
print(item)

On the other hand, the following selector is supposed to print I should be printed, but it raises an AttributeError.

from bs4 import BeautifulSoup

htmlelement = """
<p class="vital">I should be printed</p>
<p>I should not be printed</p>
"""
soup = BeautifulSoup(htmlelement,"lxml")
item = soup.select_one("p:has(.vital)").text
print(item)

Where am I going wrong, and how can I make them work?

MITHU
  • Possible duplicate of [Test if an attribute is present in a tag in BeautifulSoup](https://stackoverflow.com/questions/5015483/test-if-an-attribute-is-present-in-a-tag-in-beautifulsoup) – Zaraki Kenpachi Jun 27 '19 at 13:08
  • I don't think that it's a duplicate – RomanPerekhrest Jun 27 '19 at 13:15
  • If you think this post to be a duplicate one then you should know the reason why my script is behaving otherwise @Zaraki Kenpachi. What is it then? – MITHU Jun 27 '19 at 13:16
  • This is not a duplicate. The confusion comes from a misunderstanding of how `:has()` and `:not()` actually operate, which I have addressed in an answer below. – facelessuser Jun 27 '19 at 13:23

1 Answer

Unfortunately, your understanding of what :not() and :has() do is most likely not correct.

In your first example, you use:

soup.select_one("tr:not(a)").text

The way you are using it, the selector matches every tr. This is because it is saying "I want a tr tag that is not an a tag". tr tags are never a tags, so select_one() simply returns the first tr, which here is the one that contains 27A-TAX DISTRICT.

If you want tr tags that don't have a tags, then you could use:

soup.select_one("tr:not(:has(a))").text

What this says is "I want a tr tag that does not have a descendant a tag".
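
As a rough illustration of the difference, here is a minimal sketch using a trimmed copy of the table from the question. It assumes `html.parser` rather than `lxml` purely to avoid the extra dependency, and the variable names are made up for the example:

```python
from bs4 import BeautifulSoup

html = """
<table>
<tbody>
  <tr><td><a>27A-TAX DISTRICT</a> 27A</td></tr>
  <tr><td><strong>Parcel Number</strong> 720</td></tr>
</tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# "tr:not(a)" only excludes elements that are themselves <a>,
# so both rows match and select_one() returns the first one.
rows_not_a = soup.select("tr:not(a)")
not_a_text = soup.select_one("tr:not(a)").get_text(" ", strip=True)

# "tr:not(:has(a))" excludes rows that contain an <a> descendant.
no_link_text = soup.select_one("tr:not(:has(a))").get_text(" ", strip=True)

print(len(rows_not_a))
print(not_a_text)
print(no_link_text)
```

With this markup, only the second row survives the `:not(:has(a))` filter, which is the behavior the question was after.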

This leads us to your second issue. :has() is a relational selector. In your second example, you used:

soup.select_one("p:has(.vital)").text

:has() looks ahead at either children, descendants, or siblings (depending on the syntax you use) to determine if the tag is the one you want.

So what you were saying was "I want a p tag that has a descendant tag with the class vital". None of your p tags have any descendant tags, so nothing matches, select_one() returns None, and calling .text on None raises the AttributeError. What you want is actually simpler:

soup.select_one("p.vital").text

What this says is "I want a p tag that also has a class vital."
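
To make the contrast concrete, here is a small sketch, again assuming `html.parser` for convenience. The first document is the one from the question; the second snippet with a nested span is hypothetical markup added only to show a case where `:has()` does match:

```python
from bs4 import BeautifulSoup

question_html = """
<p class="vital">I should be printed</p>
<p>I should not be printed</p>
"""
soup = BeautifulSoup(question_html, "html.parser")

# No <p> contains an element with class "vital", so nothing matches and
# select_one() returns None; calling .text on that would fail.
has_match = soup.select_one("p:has(.vital)")
print(has_match)

# Compound selector: a <p> that itself carries the class.
class_match = soup.select_one("p.vital")
print(class_match.text)

# Hypothetical markup (not from the question): here the class sits on a
# descendant of the <p>, which is the situation :has() is meant for.
nested = BeautifulSoup('<p><span class="vital">nested</span></p>', "html.parser")
nested_match = nested.select_one("p:has(.vital)")
print(nested_match.text)
```

Checking the result of `select_one()` for `None` before touching `.text` is a cheap way to avoid the AttributeError while experimenting with selectors.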


facelessuser
  • Yeah, it seems I got them wrong. So `:has()` works when there are any children or siblings. Btw, `:contains()` only looks for text or strings, not tags, ids, classes, etc., right? Thanks for clearing things up @facelessuser. – MITHU Jun 27 '19 at 13:30
  • `:has()` behavior changes based on the leading combinator: `:has(+ tag)` matches the very next sibling, `:has(> tag)` matches a direct child, `:has(tag)` matches any descendant, etc. `:contains()` searches the text under a given tag (including the text in any child tags). It does not look at attributes. For that you'd need to use attribute selectors: https://facelessuser.github.io/soupsieve/selectors/#attribute-selectors. – facelessuser Jun 27 '19 at 13:33
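
For anyone wanting to see how the leading combinator inside `:has()` changes what gets matched, the following is a minimal sketch. The markup, the ids, and the `ids()` helper are all invented for the example, and `html.parser` is assumed only to keep it dependency-free:

```python
from bs4 import BeautifulSoup

html = """
<div id="child"><span>direct child</span></div>
<div id="deep"><p><span>nested deeper</span></p></div>
<div id="sibling"></div><span>next sibling</span>
"""
soup = BeautifulSoup(html, "html.parser")


def ids(selector):
    """Hypothetical helper: ids of every element the selector matches."""
    return [tag["id"] for tag in soup.select(selector)]


# No combinator: a <span> anywhere below the <div>.
descendant_ids = ids("div:has(span)")

# ">": the <span> must be a direct child of the <div>.
child_ids = ids("div:has(> span)")

# "+": the <span> must be the element immediately after the <div>.
sibling_ids = ids("div:has(+ span)")

print(descendant_ids, child_ids, sibling_ids)
```

In this sketch the bare form matches both `child` and `deep`, the `>` form matches only `child`, and the `+` form matches only `sibling`.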