How can I find a string inside a bs4.ResultSet (list) using Python?

Question

I'm trying to automate searching for ads in Facebook Ads Library. For that, I've used Selenium and BeautifulSoup to get the page's code.

Calling soup.find_all returns a bs4.ResultSet, which as I understand behaves like a list.

I loop through that result and, for each element found, I want to test whether it contains a specific string.

But my code isn't working as expected. The condition in the if statement inside the for loop is always False.

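import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Using chrome driver
# (options is assumed to be a plain ChromeOptions object; the original snippet did not show it)
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)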

# Web page url request
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete%20gr%C3%A1tis%20aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)

# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser') 

ads_list = []
for tag in soup.find_all('div', class_='_99s5'):
    if 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' in str(tag):
        ads_list.append(tag)
    else:
        None

Because class names are dynamic, are you sure one of these is in your `soup`? It would be easier if you could point out the element you're expecting to find. Maybe it could be located with another strategy. — HedgeHog, Mar 09 '22 at 21:10
First, I'm using *soup.find_all* to find all divs with class '_99s5'. Then, I'm checking each found div to see if there's a span with the class 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' inside the first div ('_99s5'). If the span exists, I'll use the div; if not, I'll ignore it. — Airã Carvalho da Silva, Mar 10 '22 at 11:37
I understood that part from your question. But these classes do not exist for some reason, so could you provide anything (tag, id, text) in or around this `span` that would enable us to identify it? That would be great. — HedgeHog, Mar 10 '22 at 12:09
The `span` that I'm looking for always has the text `n ads use this creative and text`, with `n` being a variable number. — Airã Carvalho da Silva, Mar 10 '22 at 13:00

score 1 · Answer 1 · answered Mar 09 '22 at 21:10

The following condition:

if 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' in str(tag):

evaluates to True if and only if the whole string 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' is a substring of str(tag), with the class names in exactly that order. I assume you instead want to check whether str(tag) contains any of the individual class names in that string. In that case, split the string and test each part:

if any(e in str(tag) for e in 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89'.split()):
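
The difference between the two checks can be seen on plain strings, without Selenium or BeautifulSoup. The sketch below is only an illustration: the two markup strings and the helper name `check` are made up for this example, and an `all(...)` variant is included for comparison, which is not part of the answer above.

```python
# Hypothetical helper: runs the three membership checks on one markup string.
def check(markup, wanted):
    names = wanted.split()
    return (
        wanted in markup,                        # whole string, same order
        any(name in markup for name in names),   # at least one class name
        all(name in markup for name in names),   # every class name, any order
    )

wanted = 'qku1pbnj j8otv06s r05nras9'

# Made-up markup: same three class names, but in a different order.
reordered = '<span class="j8otv06s qku1pbnj r05nras9">3 ads use this creative and text</span>'
# Made-up markup: only one of the three class names is present.
partial = '<span class="qku1pbnj zzz">something else</span>'

result_reordered = check(reordered, wanted)
result_partial = check(partial, wanted)

print(result_reordered)  # (False, True, True)
print(result_partial)    # (False, True, False)
```

The `any` check is true as soon as a single class name appears anywhere in the markup, so it can match far more elements than intended.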

answered Mar 09 '22 at 21:10

Opal

That line is always returning True, even though it should return False in some cases. The string 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt' doesn't always exist. – Airã Carvalho da Silva Mar 10 '22 at 11:43
@AirãCarvalhodaSilva, no it does not: `any(e in str('') for e in 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89'.split())` returns `False`. How is the `tag` constructed? Can you display it? – Opal Mar 10 '22 at 12:44
`for tag in soup.find_all('div', class_='_99s5'): if any(e in str(tag) for e in 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89'.split()): print('OK') else: print ('Not OK')` – Airã Carvalho da Silva Mar 10 '22 at 12:55
Sorry, pressed Enter and sent the code with no further clarification. I'm constructing the tag from elements that come from a *soup.find_all*. The soup.find_all is exactly the bs4.ResultSet or list that I'm checking if the string exists. – Airã Carvalho da Silva Mar 10 '22 at 12:57
@AirãCarvalhodaSilva I would start with checking what is returned by `str(tag)`. I suppose you're misusing it. – Opal Mar 10 '22 at 20:43

HedgeHog · Accepted Answer · 2022-03-10T14:00:18.327

As mentioned before, selecting by class is not the best strategy, because class names can be very dynamic. It is better to stick to id, tag or perhaps text, although sometimes there may be no alternative.

To select only the cards with a <span> containing the information that the creative has been used in ads, you can work with CSS selectors.

The following line searches for your outer <div> with class _99s5 that has a descendant containing your text, and creates a ResultSet of these outer <div> elements:

ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')

Example

Note: The language of your browser/driver should be English; otherwise you have to change the text you expect to find.

driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete%20gr%C3%A1tis%20aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)

# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser') 

ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')
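
To try the selector without opening a browser, it can be run against a small piece of static HTML. This is a minimal sketch: the markup and the id values `card-1` and `card-2` are invented stand-ins for two ad cards, not the real Facebook structure, and `:-soup-contains` needs a reasonably recent soupsieve (the selector library that ships with BeautifulSoup).

```python
from bs4 import BeautifulSoup

# Invented, simplified markup standing in for two ad cards.
html = """
<div class="_99s5" id="card-1">
  <div><span>3 ads use this creative and text</span></div>
</div>
<div class="_99s5" id="card-2">
  <div><span>Started running on Mar 1, 2022</span></div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Keep only the outer cards that have a descendant containing the text.
ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')

matched_ids = [card['id'] for card in ads_list]
print(matched_ids)  # ['card-1']
```

Only the first card is kept, because the second one has no descendant with the expected text.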

Alternatively, although I am not that happy with it, and just to give you an orientation: select the <span> containing your text that is a direct child of a <div>, and move up the structure with .parent:

ads_list = []

for tag in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    ads_list.append(tag.parent.parent.parent.parent.parent.parent)
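
If counting the exact number of .parent steps feels fragile, one possible variation is to let BeautifulSoup walk upwards with `find_parent` until it reaches the card. The sketch below is an assumption-laden illustration: the markup and the id `card-1` are invented, and it still relies on the `_99s5` class being present on the outer <div>.

```python
from bs4 import BeautifulSoup

# Invented markup: the span sits a few levels below the outer card.
html = """
<div class="_99s5" id="card-1">
  <div><div><span>3 ads use this creative and text</span></div></div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

ads_list = []
for tag in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    # Walk up to the nearest enclosing <div class="_99s5">, however deep the span is.
    card = tag.find_parent('div', class_='_99s5')
    if card is not None and card not in ads_list:
        ads_list.append(card)

card_ids = [card['id'] for card in ads_list]
print(card_ids)  # ['card-1']
```

The `card not in ads_list` check avoids adding the same card twice when several matching spans sit inside it.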

That is what I needed. Can you take a look at this other question I've asked? https://stackoverflow.com/questions/71429213/delete-dynamic-element-with-selenium-python-and-beautifulsoup — Airã Carvalho da Silva, Mar 10 '22 at 18:42
