How to find element with specific parent?

Question

I have some HTML like:

<div class='cl1'>
    <div class='cl2'>text_1</div>
    <div class='cl3'>
        <div class='cl2'>text_2</div>
    </div>
</div>

I need to find every element with class cl2 whose direct parent has class cl1, so I need to get text_1 but not text_2. In plain CSS the selector would be:

'div.cl1>div.cl2'

but I use robobrowser and BeautifulSoup, and when I try

soup.select('div.cl1>div.cl2')

it says that the CSS selector is wrong.

Can you try it with spaces, something like this maybe? 'div.cl1 > .cl2' — Tristan, Sep 14 '16 at 21:42
@Jan, it is not a child, it is a descendant. http://stackoverflow.com/questions/1182189/css-child-vs-descendant-selectors — Padraic Cunningham, Sep 14 '16 at 23:44

score 2 · Accepted Answer · answered Sep 14 '16 at 23:42

Your selector is on the right track; you just need to put spaces around the > combinator, i.e. div.cl1>div.cl2 should be div.cl1 > div.cl2:

In [5]: from bs4 import BeautifulSoup

In [6]: html = """<div class='cl1'>
    <div class='cl2'>text_1</div>
    <div class='cl3'>
        <div class='cl2'>text_2</div>
    </div>
</div>"""

In [7]: soup = BeautifulSoup(html, "html.parser")

In [8]: soup.select_one("div.cl1 > div.cl2") # good 
Out[8]: <div class="cl2">text_1</div>
In [9]: print(soup.select_one("div.cl1>div.cl2")) # bad
None
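
For anyone who wants this outside an interactive session, here is a minimal sketch of the same fix as a standalone script. It assumes bs4 is installed; the helper name direct_children_text is hypothetical and only for illustration. It uses select() instead of select_one() so that every direct child is returned, not just the first.

```python
from bs4 import BeautifulSoup

HTML = """<div class='cl1'>
    <div class='cl2'>text_1</div>
    <div class='cl3'>
        <div class='cl2'>text_2</div>
    </div>
</div>"""


def direct_children_text(html, selector="div.cl1 > div.cl2"):
    """Return the text of every element matched by the child-combinator selector."""
    soup = BeautifulSoup(html, "html.parser")
    # Spaces around ">" keep the selector readable and are valid CSS.
    return [tag.get_text() for tag in soup.select(selector)]


print(direct_children_text(HTML))
```

The nested text_2 div is skipped because its direct parent is the cl3 div, not cl1.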

score 0 · Answer 2 · answered Sep 14 '16 at 21:53

One possible solution would be:

from bs4 import BeautifulSoup
data = """
<div class='cl1'>
    <div class='cl2'>text_1</div>
    <div class='cl3'>
        <div class='cl2'>text_2</div>
    </div>
</div>
"""
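soup = BeautifulSoup(data, "html.parser")  # name the parser explicitly
divs = [div
        for div in soup.find_all("div", {'class': 'cl2'})
        # .get() avoids a KeyError when the parent has no class attribute
        if 'cl1' in div.parent.get("class", [])]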

# [<div class="cl2">text_1</div>]
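
A variation on the same idea, offered as a sketch rather than the only way to do it: start from each cl1 element and ask only for its direct children by passing recursive=False to find_all. The function name cl2_children_of_cl1 is hypothetical.

```python
from bs4 import BeautifulSoup

DATA = """
<div class='cl1'>
    <div class='cl2'>text_1</div>
    <div class='cl3'>
        <div class='cl2'>text_2</div>
    </div>
</div>
"""


def cl2_children_of_cl1(html):
    """Collect cl2 divs that are direct children of a cl1 div."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for parent in soup.find_all("div", class_="cl1"):
        # recursive=False restricts the search to direct children only.
        found.extend(parent.find_all("div", class_="cl2", recursive=False))
    return found


print([div.get_text() for div in cl2_children_of_cl1(DATA)])
```

This avoids CSS selectors entirely, which may help if the selector syntax is what the library is complaining about.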
