
I am trying to retrieve the links to all the posts of an Instagram account. The structure is a bit nested: first I locate, by XPath, the div elements that contain those links, and then I iterate over the resulting web elements (the posts) to extract each link. However, this approach throws a StaleElementReferenceException.

My question is: how should I design a loop that uses WebDriverWait with By.CSS_SELECTOR to extract the links and store them in one list?

I have read about WebDriverWait and tried to implement it, but I am stuck: none of my attempts work.

I've searched for similar questions and found two links that were very helpful; however, neither of them deals with using By.CSS_SELECTOR to extract an href.

These are the links: StaleElementException when iterating with Python

My current code, which goes into an infinite loop:

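from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def getting_comment(instagram_page, xpath_to_links, xpath_to_comments):
    global allComments
    links = []
    scheight = .1
    posts = []
    browser = webdriver.Chrome('/Users/marialavrovskaa/desktop/chromedriver')
    browser.get(f"{instagram_page}")
    while scheight < 9.9:
        # the scroll target starts far past the bottom (scrollHeight/0.1)
        # and moves up toward scrollHeight/9.9 as scheight grows
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight/%s);" % scheight)
        scheight += .01
        posts = browser.find_elements_by_xpath(f"//div[@class='{xpath_to_links}']")

        for elem in posts:
            # this inner loop only exits on a TimeoutException
            while True:
                try:
                    WebDriverWait(elem, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".a")))
                    links.append(elem.find_element_by_css_selector('a').get_attribute('href'))
                except TimeoutException:
                    break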

instagram_page = "https://www.instagram.com/titovby/?hl=ru"

xpath_to_links = "v1Nh3 kIKUG _bz0w"
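
Since the goal is to use By.CSS_SELECTOR, the class string above could be turned into a CSS selector instead of being compared in XPath. The following is only a minimal sketch: the helper name classes_to_css is hypothetical, and it assumes the post links are `<a>` tags nested inside the div that carries these classes. Note that in CSS ".a" matches elements whose class is "a", whereas "a" matches anchor tags and "a[href]" matches anchor tags that have an href attribute; href is an attribute, not a child element.

```python
def classes_to_css(class_attr, tag="div", descendant="a[href]"):
    """Build a CSS selector from a space-separated class attribute value.

    Hypothetical helper: turns "x y z" into "div.x.y.z a[href]".
    Unlike XPath's @class='x y z', the result matches the classes in any
    order and also when the element carries additional classes.
    """
    class_part = "".join("." + name for name in class_attr.split())
    return f"{tag}{class_part} {descendant}"


if __name__ == "__main__":
    print(classes_to_css("v1Nh3 kIKUG _bz0w"))
```

The resulting string could then be passed as the second item of a (By.CSS_SELECTOR, ...) locator tuple.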

Karina
  • This happens if, for any reason, your page is refreshed after findElement has returned a WebElement. – Amit Jain Mar 30 '20 at 13:59
  • Well, or the document is not found. I am more unsure whether the code I implemented is correct, since I have run out of ideas for how to improve it. – Karina Mar 30 '20 at 14:03
  • The posts variable is filled with WebElements while the page is being scrolled, so I guess the page is refreshed after each scroll, making the earlier elements stale. – Amit Jain Mar 30 '20 at 14:15
  • The infinite loop is due to while True:, but simply removing it does not seem to fix the issue. However, this WebDriverWait(elem, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "a > href"))) causes the loop to break. Could you advise me on how to handle this? – Karina Mar 30 '20 at 14:26
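
If the diagnosis in the comments is right (elements located before a scroll go stale once the page re-renders), one possible approach is to re-locate the anchors after every scroll and read each href into a plain string immediately, so that no WebElement is kept across scrolls. The sketch below is an assumption-laden illustration, not a confirmed fix: the names collect_hrefs and post_links, the selector "div.v1Nh3 a[href]" and the number of scroll rounds are all hypothetical. The Selenium calls used (WebDriverWait, EC.presence_of_all_elements_located, execute_script, get_attribute) are the same ones that appear in the question.

```python
def collect_hrefs(find_anchors, scroll, rounds=10, stale_errors=(Exception,)):
    """Scroll `rounds` times and read every href right after locating it.

    find_anchors: callable returning the list of <a> elements currently found
    scroll:       callable that scrolls the page one step
    stale_errors: exception types that mean "this element went stale, skip it"
    """
    seen = {}  # a dict keeps insertion order and drops duplicate links
    for _ in range(rounds):
        # Re-locate on every pass; elements from earlier passes are never reused.
        for anchor in find_anchors():
            try:
                href = anchor.get_attribute("href")
            except stale_errors:
                continue  # element was replaced by a re-render; skip it
            if href:
                seen.setdefault(href, None)
        scroll()
    return list(seen)


def post_links(browser, css="div.v1Nh3 a[href]", rounds=10, timeout=20):
    """Wire collect_hrefs to a Selenium browser (selector is an assumption)."""
    from selenium.common.exceptions import StaleElementReferenceException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def find_anchors():
        # Wait on the browser, not on a single element; returns a list.
        return WebDriverWait(browser, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, css))
        )

    def scroll():
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    return collect_hrefs(find_anchors, scroll, rounds, (StaleElementReferenceException,))
```

With an already opened browser this would be called as links = post_links(browser). If no matching anchor appears within the timeout, the TimeoutException raised by the wait is not caught here and propagates to the caller.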
