I'm trying to automate searching for ads in Facebook Ads Library. For that, I've used Selenium and BeautifulSoup to get the page's code.

BeautifulSoup parses the page's HTML, and soup.find_all returns a bs4.ResultSet, which as I understand is a list.

I'm trying to loop through the list returned by soup.find_all, and for each element found, I want to test whether it contains a specific string.

However, my code isn't working as expected: the condition in the if statement inside the for loop always evaluates to False.

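import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Browser options (defaults here; the original options were not shown)
options = webdriver.ChromeOptions()

# Using chrome driver
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)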

# Web page url request
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete%20gr%C3%A1tis%20aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)

# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser') 

ads_list = []
for tag in soup.find_all('div', class_='_99s5'):
    if 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' in str(tag):
        ads_list.append(tag)
Opal
  • Because class names are dynamic, are you sure one of these is in your `soup`? It would be easier if you could point out the element you're expecting to find. Maybe it could be located with another strategy. – HedgeHog Mar 09 '22 at 21:10
  • First, I'm using *soup.find_all* to find all divs with class '_99s5'. Then, I'm checking each found div to see if there's a span with the class 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' inside the first div ('_99s5'). If the span exists, I'll use the div, if not, I'll ignore it. – Airã Carvalho da Silva Mar 10 '22 at 11:37
  • I had already understood that part from your question. But these classes do not exist for some reason, so could you provide anything (tag, id, text) that is in this `span` or around it and would enable us to identify it? That would be great. – HedgeHog Mar 10 '22 at 12:09
  • The `span` that I'm looking for always has the text `n ads use this creative and text`, where `n` is a variable number. – Airã Carvalho da Silva Mar 10 '22 at 13:00

2 Answers


The following statement:

if 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' in str(tag)

evaluates to True if and only if the entire string 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' is a substring of str(tag), that is, all seven class names appear together, in exactly that order and separated by single spaces. I assume you would rather check whether str(tag) contains any of the individual class names in that string. In that case it would be:

if any(e in str(tag) for e in 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89'.split()):
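
To make the difference between the two checks concrete, here is a minimal sketch on plain strings. The three markup snippets are hypothetical stand-ins for str(tag) (the real page's HTML is not shown in the question), and the helper names contains_whole and contains_any are made up for illustration.

```python
# The class string from the question.
classes = 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89'

# Hypothetical stand-ins for str(tag); not taken from the real page.
samples = {
    'same_order': '<div class="_99s5"><span class="' + classes + '">2 ads</span></div>',
    'reordered': '<div class="_99s5"><span class="a53abz89 qku1pbnj j8otv06s">2 ads</span></div>',
    'unrelated': '<div class="_99s5"><span class="xyz">no match</span></div>',
}


def contains_whole(markup):
    """True only if the full class string appears verbatim in the markup."""
    return classes in markup


def contains_any(markup):
    """True if at least one of the individual class names appears in the markup."""
    return any(name in markup for name in classes.split())


# Map each sample to (whole-string result, any-name result).
results = {key: (contains_whole(markup), contains_any(markup))
           for key, markup in samples.items()}

for key, outcome in results.items():
    print(key, outcome)
```

The whole-string check fails as soon as the class names are reordered or only partly present, while the any() check succeeds if a single name shows up anywhere in the markup, which may be broader than intended.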
Opal
  • That line is always returning True, even though it should return False in some cases. The string 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt' doesn't always exist. – Airã Carvalho da Silva Mar 10 '22 at 11:43
  • @AirãCarvalhodaSilva, no it does not: `any(e in str('') for e in 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89'.split())` returns `False`. How is the `tag` constructed? Can you display it? – Opal Mar 10 '22 at 12:44
  • `for tag in soup.find_all('div', class_='_99s5'): if any(e in str(tag) for e in 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89'.split()): print('OK') else: print ('Not OK')` – Airã Carvalho da Silva Mar 10 '22 at 12:55
  • Sorry, pressed Enter and sent the code with no further clarification. I'm constructing the tag from elements that come from a *soup.find_all*. The soup.find_all is exactly the bs4.ResultSet or list that I'm checking if the string exists. – Airã Carvalho da Silva Mar 10 '22 at 12:57
  • @AirãCarvalhodaSilva I would start with checking what is returned by `str(tag)`. I suppose you're misusing it. – Opal Mar 10 '22 at 20:43

As mentioned before, the strategy of using classes is not the best, as they can be very dynamic, so it would be better to stick to id, tag or perhaps text - but sometimes there may be no alternatives.

To select only the cards with a <span> stating that the creative has been used in ads, you can work with CSS selectors.

The following line searches for the outer <div> elements with class _99s5 that have a descendant containing your text, and returns a ResultSet of these outer <div> elements:

ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')
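
A minimal sketch of how this selector behaves, run against static markup instead of the live page. The HTML below and its id values are hypothetical and much flatter than the real Ads Library page. It also assumes a Soup Sieve version that supports `:-soup-contains` (2.1 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; only the class name _99s5 and the text come from the thread.
html = """
<div class="_99s5" id="card-1"><div><span>3 ads use this creative and text</span></div></div>
<div class="_99s5" id="card-2"><div><span>Some other label</span></div></div>
<div class="other" id="card-3"><span>5 ads use this creative and text</span></div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Keep only the div._99s5 elements that have a descendant containing the text.
ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')

matched_ids = [tag['id'] for tag in ads_list]
print(matched_ids)
```

Only the first card is kept: the second has the right class but not the text, and the third has the text but not the class.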

Example

Note: The language of your browser/driver should be English; otherwise you have to change the text you expect to find.

driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete%20gr%C3%A1tis%20aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)

# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser') 

ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')

Alternatively, though I'm not that happy with this approach, and only to give you an orientation, you could select the <span> that is a direct child of a <div> and contains your text, then move up the structure with .parent:

ads_list = []

for tag in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    ads_list.append(tag.parent.parent.parent.parent.parent.parent)
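
If the outer card still carries the class _99s5, a possibly less brittle variant of the same idea is to let BeautifulSoup walk upward with find_parent instead of counting .parent hops. This is only a sketch under that assumption; the markup and id values below are hypothetical, and the real nesting depth may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the nesting depth on the real page is different.
html = """
<div class="_99s5" id="card-1"><div><div><span>3 ads use this creative and text</span></div></div></div>
<div class="_99s5" id="card-2"><div><span>Some other label</span></div></div>
"""

soup = BeautifulSoup(html, 'html.parser')

ads_list = []
for span in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    # Climb to the nearest enclosing div with class _99s5, however deep the span sits.
    card = span.find_parent('div', class_='_99s5')
    if card is not None:
        ads_list.append(card)

card_ids = [card['id'] for card in ads_list]
print(card_ids)
```

This does not depend on the exact number of wrapper elements between the <span> and the card, but it still relies on the _99s5 class name staying stable.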
HedgeHog
  • That is what I needed. Can you take a look at this other question I've asked? https://stackoverflow.com/questions/71429213/delete-dynamic-element-with-selenium-python-and-beautifulsoup – Airã Carvalho da Silva Mar 10 '22 at 18:42