
The problem statement is: find the percentage of a webpage's physical area that is occupied by ads.

E.g., say I have a URL which, when opened, shows its content and 3 ads: one is an image ad and the other 2 are 'image and text' ads. (I have been given many such URLs, each with an unknown number of ads.) I count the ads by looking for elements whose class name contains 'ad' or 'sponsored', so I know there are 3 ads on this page. Now I need to find how much space these ads occupy as a percentage of the entire webpage, e.g., all three ads together occupy 20% of the page. How do I do it?
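Once each ad's rendered width and height are known, the percentage itself is plain arithmetic: add up the ad areas and divide by the area of the whole rendered page. A minimal sketch, assuming the ad boxes do not overlap each other and that all sizes are CSS pixels (such as the values a browser-automation tool reports); the function name `ad_occupancy_percent` is hypothetical.

```python
def ad_occupancy_percent(ad_sizes, page_width, page_height):
    """Return the share of the page covered by ads, as a percentage.

    ad_sizes: iterable of (width, height) pairs in CSS pixels.
    Assumes the ad boxes do not overlap one another.
    """
    page_area = page_width * page_height
    if page_area <= 0:
        raise ValueError("page area must be positive")
    ad_area = sum(w * h for w, h in ad_sizes)
    return 100.0 * ad_area / page_area


# Example: a 300x250 ad and a 728x90 banner on a 1000x2000 px page.
print(ad_occupancy_percent([(300, 250), (728, 90)], 1000, 2000))  # → 7.026
```

The hard part of the question is therefore getting reliable width/height values for the ads and for the page, not the calculation.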

I understand that elements don't render the same way in different browsers, and I don't actually care about that. I just need a rough percentage based on Chrome (or Firefox; anything is okay).

A similar question asked back in 2013, "How to programmatically measure the elements' sizes in HTML source code using python?", has only 2 answers and not much information. I found the API of the package suggested there, Ghost (the answer the asker found helpful), pretty difficult to understand.

I was advised to 'render the website' in a headless browser, first without ads and then with ads, and find the difference. The problem is, I don't know how. I am also hoping that in the last 8 years someone has come up with a simpler solution to this problem.
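The "with ads vs. without ads" idea can be sketched with a single page load: measure the full scrollable page, hide the ad elements with CSS, then measure again. This swaps the ad blocker for CSS hiding, and it assumes a Selenium-style driver object whose `execute_script(script, *args)` runs JavaScript in the page; the selector and all function names are hypothetical. One caveat: hiding an ad only shrinks the page when the ad sits in the normal flow, so a sidebar ad next to the main column usually leaves the page height unchanged and is under-counted.

```python
# JavaScript run in the page through driver.execute_script().
PAGE_SIZE_JS = """
const root = document.documentElement;
return [root.scrollWidth, root.scrollHeight];
"""

HIDE_ADS_JS = """
const ads = document.querySelectorAll(arguments[0]);
ads.forEach(el => el.style.setProperty('display', 'none', 'important'));
return ads.length;
"""


def page_area(driver):
    """Full scrollable page area in CSS pixels (not just the visible viewport)."""
    width, height = driver.execute_script(PAGE_SIZE_JS)
    return width * height


def compare_with_and_without_ads(driver, ad_selector):
    """Measure the page, hide every element matching ad_selector, measure again.

    Returns (area_with_ads, area_without_ads, number_of_hidden_elements).
    This mutates the loaded page; reload it to get the ads back.
    """
    with_ads = page_area(driver)
    hidden = driver.execute_script(HIDE_ADS_JS, ad_selector)
    without_ads = page_area(driver)
    return with_ads, without_ads, hidden


def percent_difference(with_ads, without_ads):
    """Share of the original page area that disappeared when the ads were hidden."""
    return 100.0 * (with_ads - without_ads) / with_ads
```

With a real browser this would be called on a driver right after `driver.get(url)`. Running headless Chrome with a fixed window size (for example the `--window-size=1366,768` switch) keeps the numbers comparable across URLs.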

Since I am new to using Python for "scraping" in this manner (if it can even be called scraping), I could use any resources, ideas, or documentation you might know of.

Nilima
  • There's a way to get an image's height and width, and we can combine the heights and widths of all 3. But I wonder: for the percentage calculation we need the width and height of all elements to compare against these 3. – cruisepandey Jul 20 '21 at 11:17
  • @cruisepandey That is the first thing I looked for too. The height and width of an ad are not present as values in the HTML, since they change dynamically with screen size, browser type, device type, etc. – Nilima Jul 20 '21 at 11:21
  • I am not worried about the dynamically changing sizes of the ads; we can capture those anyway. We need the total number of elements in the webpage and their heights and widths, so that we have something to compare against when we calculate the percentage. – cruisepandey Jul 20 '21 at 11:26
  • @cruisepandey There is no height and width. At least, none that I could find. Ads or not, the problem remains the same: since the dimensions change for every element, I don't really have a parameter to anchor my code to. But I have to ask, because your reply intrigued me: how can you capture the dynamically changing size of an ad block? – Nilima Jul 20 '21 at 11:39
  • Well, it's all in the DOM, so if you can find a locator for the ads in the DOM, we can solve this problem. Can you check whether they are in an iframe? I have a solution where we can calculate every element's width and height using the `.size` property in Selenium. I will probably post that as an answer. – cruisepandey Jul 20 '21 at 11:45
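Finding the ad locator is where the question's "class contains 'ad' or 'sponsored'" rule comes in. A plain substring test is risky, because "ad" also occurs inside unrelated class names such as `header`, `shadow` or `loading`. A minimal sketch of a stricter, hypothetical heuristic that accepts `ad`/`ads` only as a whole hyphen- or underscore-separated word, or `sponsored` anywhere in a class token; the function name `looks_like_ad` is an assumption, and camelCase names like `AdUnit` are deliberately not matched.

```python
import re

# Hypothetical heuristic: "ad" or "ads" as a whole word inside a class token
# (bounded by the token's start/end, "-" or "_"), or "sponsored" anywhere in it.
_AD_WORD = re.compile(r"(?:^|[-_])ads?(?:$|[-_])")


def looks_like_ad(class_attr):
    """Return True if any token of a `class` attribute value looks like an ad."""
    for token in (class_attr or "").lower().split():
        if "sponsored" in token or _AD_WORD.search(token):
            return True
    return False


print(looks_like_ad("ad-slot banner"))        # → True
print(looks_like_ad("header shadow loading"))  # → False
```

With Selenium this could filter elements such as `driver.find_elements(By.XPATH, "//*[@class]")` via `looks_like_ad(el.get_attribute("class"))`. If both an ad wrapper and an element inside it match, keep only the outermost one so the same ad is not counted twice.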

1 Answer


We can get the height and width of every element using Selenium's `.size` property.

XPath to locate all the elements: `//*`

Then we can get the ads' height and width the same way: they are web elements too, so the same `.size` property works.

Demonstration below:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.maximize_window()
driver.implicitly_wait(30)
driver.get("https://stackoverflow.com/questions/68453828/is-there-a-simple-way-to-calculate-the-percentage-physical-space-occupied-by?noredirect=1#comment120979267_68453828")
wait = WebDriverWait(driver, 10)

# Collect the rendered width and height of every element on the page.
width = []
height = []
for element in driver.find_elements(By.XPATH, "//*"):
    size = element.size  # .size is a property returning {'width': ..., 'height': ...}
    width.append(size['width'])
    height.append(size['height'])

# The largest box (normally the root <html> element) spans the whole page;
# summing all boxes would count nested elements many times over.
total_width = max(width)
total_height = max(height)

print(total_width, total_height)

# Now get the width and height of the ad.
# wait.until() returns the located WebElement, which is what has .size.
first_ad = wait.until(EC.visibility_of_element_located((By.XPATH, "//img")))
first_ad_size = first_ad.size
first_ad_w, first_ad_h = first_ad_size['width'], first_ad_size['height']

print(first_ad_w, first_ad_h)

total_page_area = total_width * total_height
print(total_page_area)

image_area = first_ad_w * first_ad_h
print(image_area)

percentage = (image_area * 100) / total_page_area
print(percentage)

driver.quit()

PS: I have taken the first image as the ad (I know that's not ideal, but it gives the OP a way to implement this feature).

If you can locate all the ads with a common locator (XPath or CSS), it becomes much easier.

cruisepandey
  • Hi @cruisepandey, the code does calculate the size of the whole webpage. However, it doesn't help with ad sizes, because ads are not always images. So instead of `By.XPATH`, I identified the ad blocks by class name and used `By.CLASS_NAME`. I am hitting the error `'visibility_of_element_located' object has no attribute 'size'` – Nilima Jul 27 '21 at 13:08
  • Can you share the whole code that is causing the issue? – cruisepandey Jul 27 '21 at 13:18
  • Can I just email you the code? @cruisepandey – Nilima Jul 27 '21 at 15:39
  • Or should I just put it here in comments? It is a fairly long code. – Nilima Jul 27 '21 at 15:48
  • You can email me too, no problem. Try to share the code that is causing the issue. – cruisepandey Jul 27 '21 at 19:14
  • Thank you! I sent you the code. If you can help me with the issue, I'll come back here and post a solution to this question. – Nilima Jul 28 '21 at 12:03
  • I could not understand your code well, but what does the ads' HTML look like? Can you share that? For the error `'visibility_of_element_located' object has no attribute 'size'`, you can switch to `visibility_of_all_elements_located`; it returns a list, and you can then read `.size` from each element in it. – cruisepandey Jul 30 '21 at 11:46
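The error in the comments most likely comes from reading `.size` on the expected-condition object itself: `EC.visibility_of_element_located(...)` only builds a condition, and it is `wait.until(...)` that evaluates it and returns the element. With `visibility_of_all_elements_located`, `wait.until` returns a list of elements, so `.size` has to be read from each element and the areas added up. A minimal sketch, assuming all ads match one CSS selector; the function names and any selector you pass in are hypothetical.

```python
def total_area(elements):
    """Sum width * height over elements exposing Selenium's `.size` dict."""
    return sum(el.size["width"] * el.size["height"] for el in elements)


def find_visible_ads(driver, css_selector, timeout=10):
    """Wait until every element matching css_selector is visible; return them."""
    # Imported here so total_area() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    condition = EC.visibility_of_all_elements_located((By.CSS_SELECTOR, css_selector))
    # until() returns the condition's result: here, a list of WebElements.
    return WebDriverWait(driver, timeout).until(condition)
```

The ad percentage is then `100 * total_area(find_visible_ads(driver, selector))` divided by the page area. Note that `visibility_of_all_elements_located` keeps waiting (and eventually times out) if any matching element stays hidden; `EC.visibility_of_any_elements_located` instead returns only the visible ones.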