How can I get texts with certain criteria in python with selenium? (texts with certain siblings)

Question

This is a tricky one for me, so I'll describe the question in as much detail as possible.

First, here is an example of the HTML.

....
....

<div class="lawcon">
    <p>
        <span class="b1">
            <label> No.1 </label>
        </span>
    </p>

    <p>
    "I Want to get 'No.1' label in span if the div[@class='lawcon'] has a certain <a> tags with "bb" title, and with a string of 'Law' in the text of it."
        <a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Law Power</a>
    </p>
</div>

<div class="lawcon">
    <p>
        <span class="b1">
            <label> No.2 </label>
        </span>
    </p>

    <p>
    "But I don't want to get No.2 label because, although it has <a> tag with "bb" title, but it doesn't have a text of law in it"
        <a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Just Power</a>

    </p>

</div>

<div class="lawcon">
    <p>
        <span class="b1">
            <label> No.3 </label>
        </span>
    </p>

    <p>
    "If there are multiple <a> tags with the right criteria in a single div, I want to get span(No.3) for each of those" <a>
        <a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Lawyer</a>
        <a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">By the Law</a>
        <a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">But not this one</a>

...
...
...

So, here is the thing. I want to extract the label text (e.g. No.1) from a div[@class='lawcon'] only if that div contains an <a> tag with the title "bb" whose text includes the string 'Law'.

If the div has no <a> tag with the title "bb", or none whose text contains "Law", the span should not be collected.

Here is what I tried:

div_list = [div.text for div in driver.find_elements_by_xpath('//span[following-sibling::a[@title="bb"]]')]

The problem is that when a single div has multiple tags matching the criteria, it returns only one result for that div.

What I want is a list (or tuple) that pairs each matching tag's text with its location (the span number).

So it should look like this:

[[No.1 - Law Power], [No.3 - Lawyer], [No.3 - By the Law]]

I'm not sure I have explained this well enough. Thank you for your interest, and I would really appreciate any help!

supputuri · Accepted Answer · 2019-08-26T04:26:29.343

Here is a simple Python script that produces your desired output.

from selenium.webdriver.common.by import By

# Find every matching link, then climb to its enclosing div to read the label.
links = driver.find_elements(By.XPATH, "//a[@title='bb' and contains(.,'Law')]")
linkData = []
for link in links:
    label = link.find_element(By.XPATH, "./ancestor::div[@class='lawcon']//label")
    linkData.append([label.text + '-' + link.text])
print(linkData)

Output:

[['No.1-Law Power'], ['No.3-Lawyer'], ['No.3-By the Law']]
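
To check the selection logic without a browser, here is a minimal sketch that uses only the standard library instead of Selenium. It assumes a cleaned-up, well-formed copy of the sample markup from the question (the `SAMPLE` string and the `label_link_pairs` helper are hypothetical). ElementTree's XPath subset has no `contains()` or `ancestor` axis, so the 'Law' check is done in Python and the loop starts from each div rather than from each link.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the page in the question (an assumption; the
# real page is rendered by a browser and queried through Selenium).
SAMPLE = """
<root>
  <div class="lawcon">
    <p><span class="b1"><label> No.1 </label></span></p>
    <p><a title="bb" class="link">Law Power</a></p>
  </div>
  <div class="lawcon">
    <p><span class="b1"><label> No.2 </label></span></p>
    <p><a title="bb" class="link">Just Power</a></p>
  </div>
  <div class="lawcon">
    <p><span class="b1"><label> No.3 </label></span></p>
    <p>
      <a title="bb" class="link">Lawyer</a>
      <a title="bb" class="link">By the Law</a>
      <a title="bb" class="link">But not this one</a>
    </p>
  </div>
</root>
"""


def label_link_pairs(xml_text, needle="Law"):
    """Return one ['label-link text'] entry per matching link."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for div in root.iterfind(".//div[@class='lawcon']"):
        label = div.find(".//label")
        if label is None:
            continue  # no label to report for this div
        label_text = "".join(label.itertext()).strip()
        for link in div.iterfind(".//a[@title='bb']"):
            link_text = "".join(link.itertext()).strip()
            # Plain substring test, standing in for XPath contains(.,'Law')
            if needle in link_text:
                pairs.append([label_text + "-" + link_text])
    return pairs


pairs = label_link_pairs(SAMPLE)
print(pairs)
```

The second div contributes nothing because its only link reads "Just Power", and the third div contributes two entries, one per matching link.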

I am not sure why you want the output in that format. I would prefer the approach below: it groups the results by div, so you can see how many divs have matching links and then access each div's links from the output. Just a thought.

from selenium.webdriver.common.by import By

# Collect each div that contains at least one matching link, then list its links.
link_xpath = ".//a[@title='bb' and contains(.,'Law')]"
divs = driver.find_elements(By.XPATH, "//a[@title='bb' and contains(.,'Law')]/ancestor::div[@class='lawcon']")
linkData = []
for div in divs:
    label_text = div.find_element(By.XPATH, ".//label").text
    currentList = [label_text + '-' + link.text for link in div.find_elements(By.XPATH, link_xpath)]
    linkData.append(currentList)
print(linkData)

Output:

[['No.1-Law Power'], ['No.3-Lawyer', 'No.3-By the Law']]
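
If the flat result from the first script is already at hand, the grouped form can also be derived in plain Python without querying the page again. This is a minimal sketch, assuming the label text identifies each div uniquely and never contains the separator; `group_by_label` is a hypothetical helper, not part of Selenium.

```python
def group_by_label(flat, sep="-"):
    """Group single-item ['label-link'] lists by label, in first-seen order."""
    grouped = {}
    for (entry,) in flat:
        # Everything before the first separator is the label (an assumption).
        label = entry.split(sep, 1)[0]
        grouped.setdefault(label, []).append(entry)
    return list(grouped.values())


flat = [['No.1-Law Power'], ['No.3-Lawyer'], ['No.3-By the Law']]
grouped = group_by_label(flat)
print(grouped)
```

If two divs could share the same label text, grouping by the div element itself, as the second script does, is the safer choice.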

contains ...! and ancestor. That's how I should have approached! Thank you supputurl! Really appreciates it! I'll try it as soon as I got back home and make it solved :D — Jeong In Kim, Aug 26 '19 at 04:27

score 0 · Answer 2 · answered Aug 26 '19 at 07:35

Since the texts you want (No.1 and so on) are inside <label> tags, you have to induce WebDriverWait for visibility_of_all_elements_located(). Note that you will get only 2 matches (against your expectation of 3), because both matching links in the third div resolve to the same label and an XPath result contains each element only once. You can use the following Locator Strategy:

Using XPATH:

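from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='lawcon']//a[@title='bb' and contains(.,'Law')]//preceding::label[1]")))])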
