
I am trying to get the current URL of each item while iterating over the items in a loop:

def get_financial_info(self):
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--window-size=1920x1080")
    driver = webdriver.Chrome(executable_path='/path/chromedriver', chrome_options=chrome_options)

    driver.get("https://www.financialjuice.com")

    try:
        WebDriverWait(driver, 60).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='trendWrap']")))
    except TimeoutException:
        driver.quit()

    category_url = [a.get_attribute("href") for a in
                    driver.find_elements_by_xpath("//ul[@class='nav navbar-nav']/li[@class='text-uppercase']/a[@href]")]

    for record in category_url:
        driver.get(record)
        item = {}
        url_element = driver.find_elements_by_xpath("//p[@class='headline-title']/a[@href]")

        for links in url_element:
            driver.get(links.get_attribute("href"))
            print driver.current_url

I get the first actual link, but then the code stops:

http://www.zerohedge.com/news/2017-08-26/brief-history-tail-risk-ltcm-abx-cds-vix?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+zerohedge%2Ffeed+%28zero+hedge+-+on+a+long+enough+timeline%2C+the+survival+rate+for+everyone+drops+to+zero%29

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: headless chrome=62.0.3202.94)
  (Driver info: chromedriver=2.33.506092 (733a02544d189eeb751fe0d7ddca79a0ee28cce4),platform=Linux 4.4.0-101-generic x86_64)

I tried to work out what happened. The webdriver opened the first category, chose the first item, and got the actual link. Then it stopped, instead of going back to the previous URL, taking the second item, and getting the next link until the loop ends.

molecules
1 Answer


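You should use the same approach for the inner for loop as you used for the outer one: collect the `href` strings first, then navigate. Calling `driver.get()` loads a new page, which invalidates the element references found on the previous page, so the next `links.get_attribute("href")` call raises `StaleElementReferenceException`. Replace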

url_element = driver.find_elements_by_xpath("//p[@class='headline-title']/a[@href]")

for links in url_element:
    driver.get(links.get_attribute("href"))
    print driver.current_url

with

url_element = [a.get_attribute('href') for a in driver.find_elements_by_xpath("//p[@class='headline-title']/a")]
for link in url_element:
    driver.get(link)
    print driver.current_url
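
Putting both loops together, the pattern might look like the sketch below. It is only an illustration under a few assumptions: `collect_hrefs` and `visit_headlines` are hypothetical helper names, not part of Selenium, and `driver` is assumed to be an already started WebDriver. It keeps the `find_elements_by_xpath` call used in this thread; newer Selenium releases replace it with `find_elements(By.XPATH, ...)`.

```python
def collect_hrefs(driver, xpath):
    """Return the href strings of all elements matching xpath on the current page.

    Hypothetical helper: plain strings stay valid after navigation,
    while element references from the old page become stale.
    """
    return [a.get_attribute("href") for a in driver.find_elements_by_xpath(xpath)]


def visit_headlines(driver, category_urls, headline_xpath):
    """Open each headline link of each category and return the URLs that were reached."""
    visited = []
    for category in category_urls:
        driver.get(category)
        # Snapshot all links before leaving the category page.
        for link in collect_hrefs(driver, headline_xpath):
            if not link:
                # Skip anchors without a usable href.
                continue
            driver.get(link)
            visited.append(driver.current_url)
    return visited
```

With the XPath from the question, this could be called as `visit_headlines(driver, category_url, "//p[@class='headline-title']/a")`.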
Andersson
  • This worked partly, but it keeps timing out. What can I do to avoid that? – molecules Nov 29 '17 at 14:58
  • @molecules can you clarify what you mean by *timing out*? Where does it time out, and what is the relevant part of the exception? – mrfreester Nov 29 '17 at 15:53
  • I still got an unsupported protocol error, and @mrfreester, after the code displays about 10 items, I get a timeout error – molecules Nov 30 '17 at 06:52
  • Share the **exact exception log** – Andersson Nov 30 '17 at 06:57
  • @Andersson selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document (Session info: headless chrome=62.0.3202.94) (Driver info: chromedriver=2.33.506092 (733a02544d189eeb751fe0d7ddca79a0ee28cce4),platform=Linux 4.4.0-101-generic x86_64) I added the code to gist https://gist.github.com/iammiracle/e51e5acf9f3659d84914f5e3a39b27d0 – molecules Nov 30 '17 at 07:21
  • Hm... This doesn't look like `TimeOutException` or `UnsupportedProtocol`... Can you share **current** exception? – Andersson Nov 30 '17 at 07:23
  • @Andersson, here is the current issue: selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document (Session info: headless chrome=62.0.3202.94) (Driver info: chromedriver=2.33.506092 (733a02544d189eeb751fe0d7ddca79a0ee28cce4),platform=Linux 4.4.0-101-generic x86_64) The timeout, i guess its my network issue, so you can ignore that – molecules Nov 30 '17 at 07:26
  • According to the provided link, your code contains more lines and thus more possible causes of errors. The code in the current SO question doesn't contain those lines, so my code can only solve the current problem... I think you should open a new question for the new issues – Andersson Nov 30 '17 at 08:27
  • @Andersson here is the extension of the question: https://stackoverflow.com/questions/47569370/error-shown-when-geting-current-url-of-an-item-in-a-loop-after-using-selenium – molecules Nov 30 '17 at 08:58