
I've used BeautifulSoup to find a div with a specific class in the page's HTML. I want to check whether this div contains a span with a specific class. If it does, I want to keep the div in the page; if it doesn't, I want to delete it, maybe using Selenium.

For that, I have two lists selecting the elements (div and span). I tried to check whether an element of one list is inside the other, and that more or less worked. But how can I delete the matched element from the page's source code?

Edit

I've edited the code after a few conversations in the comments section. With help, I was able to implement code that removes elements by executing JavaScript.

The code is running with no errors, but nothing is being deleted from the page.

# Import required modules
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time

# Option to launch browser in incognito
options = Options()
options.add_argument("--incognito")
#options.add_argument("--headless")

# Using chrome driver (Selenium 4 expects the driver path wrapped in a Service)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# Web page url request
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete%20gr%C3%A1tis%20aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)

driver.execute_script(r"""
  for(let div of document.querySelectorAll('div._99s5')){
    let match = div.innerText.match(/(\d+) ads? use this creative and text/)
    let numAds = match ? parseInt(match[1]) : 0
    if(numAds < 10){
      div.querySelector(".tp-logo")?.remove()
    }
  }
""")
  • What do you mean by delete the element, edit the source code of the html? – Captain Caveman Mar 10 '22 at 18:46
  • Are you wanting to edit the html locally and save it - knowing that you can't edit source code on a server from the client? – Captain Caveman Mar 10 '22 at 18:59
  • I could go with the solution that saves the HTML locally, in a file, for example. But can't I use Selenium and JavaScript to edit the HTML directly in the browser? Obviously, that change would only be visible to whoever is running the program, and only for visualization purposes. – Airã Carvalho da Silva Mar 10 '22 at 19:04
  • What is your end goal? – Captain Caveman Mar 10 '22 at 19:13
  • I want to filter the ads and keep on the page only those with the text "n ads use this creative and text" and "n" being greater than "x". I'll define "x". – Airã Carvalho da Silva Mar 10 '22 at 19:19
  • This will work: https://stackoverflow.com/questions/33199740/webdriver-remove-element-from-page – Captain Caveman Mar 10 '22 at 19:39
  • Okay, I think I can implement the Selenium part. But I'm not sure how to delete the right nodes. Could you present an alternative for how to find the right node? I can take care of the Selenium part. – Airã Carvalho da Silva Mar 10 '22 at 19:39
  • right click on the element on the page and choose inspect. – Captain Caveman Mar 10 '22 at 19:44
  • Okay, sorry, I wasn't very clear. Could you take a look at line 12 of the code? In this line I have a for loop going through all the elements in the list that I created to store all divs with class name "div._99s5". Then, comparing with the second list, which contains only the ads with the text "ads use this creative and text", which are child nodes of "div._99s5", I've created an if statement to check whether the "div._99s5" contains the text or not. The code returns True or False correctly, but how can one tell Selenium: "okay, the node is x"? – Airã Carvalho da Silva Mar 10 '22 at 19:50
  • You have the node name in your if statement where you are checking for True or False. Pass that value to whatever action you intend to take. – Captain Caveman Mar 10 '22 at 19:56
  • I thought the if statement had all the node elements, not the node name. Isn't that right? Maybe I should find a way to discover the node name. – Airã Carvalho da Silva Mar 10 '22 at 19:58

2 Answers


Since you're deleting them in javascript anyway:

driver.execute_script(r"""
  for(let div of document.querySelectorAll('div._99s5')){
    let match = div.innerText.match(/(\d+) ads? use this creative and text/)
    let numAds = match ? parseInt(match[1]) : 0
    if(numAds < 10){
      div.querySelector(".tp-logo")?.remove()
    }
  }
""")
pguardiario
  • I've edited my post, but that was before I saw your answer. I've posted two options I was trying to implement using javascript. I think yours is better. But, the elements weren't deleted from the page. The code ran without errors, but nothing happened on the browser. – Airã Carvalho da Silva Mar 11 '22 at 01:51
  • Also, I didn't mention this in my question, so pardon me, but the string I'm looking for, which is "ads use this creative and text", isn't the only thing I'll check before deleting. This string is preceded by a number, like: "15 ads use this creative and text". I have to check if that number is greater than 10, for example. I have to take the whole string and get only the number. I know the class of that element which is a span with a specific class number. – Airã Carvalho da Silva Mar 11 '22 at 01:56
  • I wrote this in pseudo-code, could you help me with the javascript part? `driver.execute_script(""" for(let div of document.querySelectorAll('div._99s5')){ if(!div.innerText.match("ads use this creative and text")){ div.querySelector(".tp-logo")?.remove() } if((element with span class).replace(/\D/g, "") < 10)){ div.querySelector(".tp-logo")?.remove() } } """)` – Airã Carvalho da Silva Mar 11 '22 at 02:11
  • Check my update – pguardiario Mar 11 '22 at 02:56
  • The code is running with no errors, but no element is being deleted from the page. I've added the full code I'm using on the original question. Can you check it to see if I'm missing something? – Airã Carvalho da Silva Mar 11 '22 at 10:34
  • I've tried to run the javascript code on the browser's console, but I get an "undefined" message. – Airã Carvalho da Silva Mar 11 '22 at 11:52
  • I'd have to see the website to test it. – pguardiario Mar 12 '22 at 00:14
  • This is the website: https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete%20gr%C3%A1tis%20aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all – Airã Carvalho da Silva Mar 12 '22 at 00:37
  • It doesn't load for me, sorry. – pguardiario Mar 12 '22 at 02:23
  • Make sure your AdBlock is not blocking the ads on the page. https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=free%20shipping&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all – Airã Carvalho da Silva Mar 12 '22 at 12:03

Note: The question and comments are a bit confusing, so it would help to improve them. Assuming you want to decompose() some elements, the reason for doing so, or what should happen afterwards, is not clear. So this answer only points out an approach.

To decompose() the elements that do not contain "ads use this creative and text", negate your selection and iterate over the ResultSet:

for e in soup.select('div._99s5:not(:-soup-contains("ads use this creative and text"))'):
    e.decompose()

Now these elements will no longer be included in your soup and you could process it for your needs.
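
The same idea can be extended to the numeric check mentioned in the comments. This is a minimal sketch under stated assumptions: `drop_low_count_ads`, `AD_COUNT_RE`, and `SAMPLE_HTML` are hypothetical names, the sample markup is invented for illustration and is not the real page, and a div with no count in its text is treated as having a count of 0.

```python
import re

from bs4 import BeautifulSoup

# Captures the number in front of the phrase, e.g. "15 ads use this creative and text".
AD_COUNT_RE = re.compile(r"(\d+) ads? use this creative and text")


def drop_low_count_ads(html, threshold=10):
    """Return a soup in which div._99s5 elements with fewer than `threshold` ads are removed."""
    soup = BeautifulSoup(html, "html.parser")
    for div in soup.select("div._99s5"):
        match = AD_COUNT_RE.search(div.get_text(" ", strip=True))
        num_ads = int(match.group(1)) if match else 0  # no count found: treat as 0
        if num_ads < threshold:
            div.decompose()
    return soup


# Invented sample markup, only to exercise the function.
SAMPLE_HTML = """
<div class="_99s5"><span>15 ads use this creative and text</span></div>
<div class="_99s5"><span>3 ads use this creative and text</span></div>
<div class="_99s5"><span>No count here</span></div>
"""

soup = drop_low_count_ads(SAMPLE_HTML, threshold=10)
remaining = [div.get_text(strip=True) for div in soup.select("div._99s5")]
print(remaining)  # ['15 ads use this creative and text']
```

Because this works on a parsed copy of the HTML, it changes only the `soup` object, not what the browser displays.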

HedgeHog
  • About the reason why or what to do after this action, my end goal is to keep on the page only the ads that contain "ads use this creative and text". Not all `div`s with class `_99s5` contain this string. This string is also preceded by a number, and I'll check if that number is greater than, let's say, 10, and in that case keep the ad on the page. – Airã Carvalho da Silva Mar 10 '22 at 21:24
  • About the scrolling, that's done already. – Airã Carvalho da Silva Mar 10 '22 at 21:25
  • Okay so `decompose()` should work for you in both cases to "delete" these elements in your `soup` !? – HedgeHog Mar 10 '22 at 21:32
  • With `decompose()` I could save the HTML locally. I'll probably try to delete the documents in the browser, even if it's temporary. I think @pguardiario response is more what I was looking for. – Airã Carvalho da Silva Mar 11 '22 at 01:49
  • If the goal is to do all the processing in the browser, I would agree. – HedgeHog Mar 11 '22 at 07:06