
I use Python (3.4) and Selenium to load a webpage and then: first, get all the elements; second, build a list containing only the elements that are visible. This is my code:

from datetime import datetime

from selenium import webdriver

driver = webdriver.Chrome()
driver.maximize_window()
url = "https://www.gazzetta.it/"
driver.get(url)
# Grab every element in the DOM
all_elems = driver.find_elements_by_xpath("//*")
start = datetime.now()
print("Start:  {}".format(start))

# Keep only the elements Selenium reports as displayed
visible_elems = []
for elem in all_elems:
    if elem.is_displayed():
        visible_elems.append(elem)

end = datetime.now()
print("End:  {}".format(end))

diff = end - start
print("Diff =  {}".format(diff))

My problem is that the loop takes forever (on my end, about 1 minute and 20 seconds). I have read similar questions ("Detect user visible elements(only in viewport) by xpath in selenium, Python" and "How to create a list of all visible elements in a class python"), but none of them seems to address this specific problem. You may wonder why I need all the elements; long story short, I load all the elements into a dataframe for further analysis. Can anybody think of a way to speed this up? Thank you.

Angelo
  • `I upload all the elements inside a dataframe for further analysis.` Then just grab the HTML of the dataframe and store it off... it should be really quick. Touching every element on the page twice isn't going to be fast except on the simplest of pages. I don't know what you are trying to do but there likely is a whole different approach that is way better (and faster). What kind of "further analysis" are you doing where you need every visible element on the page? – JeffC Oct 16 '18 at 18:50
  • @JeffC thank you Jeff. Please when you say - "Then just grab the HTML of the dataframe and store it off" - could you share an example? – Angelo Oct 16 '18 at 19:34
  • You do something like `driver.find_element_by_id("the id of the dataframe").get_attribute("innerHTML")` and store that in a string, dump it to a file, etc. – JeffC Oct 16 '18 at 19:36
  • @JeffC apologies for my ignorance Jeff, but I don't understand what dataframe could have all the visible elements. I'm trying to use regex applied to driver.page_source but I hoped somebody would offer a simpler solution. Thanks anyway – Angelo Oct 16 '18 at 19:40
  • You were the one that referenced "dataframe" and getting the contents. I was just proposing an alternative. You still haven't explained what you plan to do with all this data... what analysis you plan to do. – JeffC Oct 16 '18 at 19:42
  • The dataframe is the output of an analysis that is not related to this question. I think my question is quite clear. I simply need a list with all the visible web elements and "driver.find_element_by_id("the id of the dataframe")" doesn't look like selenium to me. Anyway, I'll give up. Thanks. – Angelo Oct 16 '18 at 19:49
  • I misunderstood... I thought you meant there was a dataframe (wasn't sure what that meant in this context) on the page you were scraping that you wanted all the visible elements from. You still haven't explained why you are trying to grab every visible element on the page. – JeffC Oct 16 '18 at 19:53
  • @JeffC sure, I need to find and use a lot of elements that keep changing their attributes. Neither absolute nor relative XPath helps much, since I always waste a lot of time finding all the elements I need. I use explicit waits, but sometimes finding the elements I need can still take more than 1 minute. Hence, I was trying to get all the visible elements, then get their location/size/text features, store that info in a dataframe and, based on my needs, query the dataframe to find the elements required. – Angelo Oct 16 '18 at 19:57

1 Answer


Here is a dummy test I did on google.com: 10 loops over findElements(By.xpath("//*")), marking whether each element is displayed.

Found 88 elements
Duration: 00:00:07.001
Found 88 elements
Duration: 00:00:03.952
Found 88 elements
Duration: 00:00:02.740
Found 88 elements
Duration: 00:00:02.579
Found 88 elements
Duration: 00:00:02.566
Found 88 elements
Duration: 00:00:02.532
Found 88 elements
Duration: 00:00:02.694
Found 88 elements
Duration: 00:00:02.554
Found 88 elements
Duration: 00:00:02.419
Found 88 elements
Duration: 00:00:02.436

I don't see any problem with the results.

Note: the driver's implicit wait affects findElement(s) and other methods. As I remember, the default is 500 ms; try changing it manually.
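One likely reason the per-element loop is slow is that each `is_displayed()` call is a separate round trip to the browser driver. Below is a minimal sketch of doing the filtering in a single `execute_script` call instead. The visibility rule in the JavaScript (non-zero bounding box, not `display: none`, not `visibility: hidden`) is an assumption and only approximates what Selenium's `is_displayed()` checks, and the names `VISIBLE_JS` and `find_visible_elements` are hypothetical.

```python
# JavaScript run inside the page: returns the elements that pass a simple
# visibility heuristic (an approximation, not Selenium's exact algorithm).
VISIBLE_JS = """
return Array.from(document.querySelectorAll('*')).filter(function (el) {
    var rect = el.getBoundingClientRect();
    var style = window.getComputedStyle(el);
    return rect.width > 0 && rect.height > 0 &&
           style.display !== 'none' && style.visibility !== 'hidden';
});
"""


def find_visible_elements(driver):
    """Return the elements the heuristic considers visible, using one driver call."""
    # Selenium converts DOM nodes returned by the script into WebElement objects.
    return driver.execute_script(VISIBLE_JS)
```

After `driver.get(url)`, this could be called as `visible_elems = find_visible_elements(driver)`. The result may differ from the `is_displayed()` loop on some elements, so it is worth comparing the two on a small page first.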

Infern0
  • thanks for the answer, but what if you try on the page I mentioned above? www.gazzetta.it – Angelo Oct 16 '18 at 17:33
  • Total elements 1794 Stale elements 39 Found 1044 elements Duration: 00:00:33.393 – Infern0 Oct 16 '18 at 17:47
  • uhm, to be honest I'm not sure how your solution could help me so far. You got "1794 Stale elements", so it seems to me there's something wrong in your code. – Angelo Oct 16 '18 at 17:58
  • to be clear, total items: 1794 ..... Stale elements: 39.... displayed: 1044 elements. You can try wrapping the if (el.isDisplayed) check in a try/catch and ignoring the stale element exception – Infern0 Oct 16 '18 at 18:21
  • could you please share the exact Python code you're using? – Angelo Oct 16 '18 at 18:35
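The exact code used for the numbers above was not posted. As a rough Python sketch of the suggestion to wrap the visibility check and ignore stale elements, something like the following could work. The helper name `split_visible` is hypothetical, and the `ImportError` fallback is only there so the sketch can be tried without Selenium installed.

```python
try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Fallback so the sketch can be exercised without Selenium installed.
    class StaleElementReferenceException(Exception):
        pass


def split_visible(elements):
    """Return (visible, stale_count) for an iterable of WebElement-like objects."""
    visible = []
    stale_count = 0
    for elem in elements:
        try:
            if elem.is_displayed():
                visible.append(elem)
        except StaleElementReferenceException:
            # The element was removed or re-rendered after it was located; skip it.
            stale_count += 1
    return visible, stale_count
```

With the question's variables, this would be called as `visible_elems, stale = split_visible(all_elems)`. It does not make the loop faster; it only keeps a stale element from aborting it.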