
I'm trying to scrape data from a site with multiple pages linked via a NEXT button.

The URL of each successive page has no predictable relationship to the URL of the previous page.

(If it did, modifying the path would have solved the problem.)

This is what I plan to do:

1. Start with an initial URL

2. Extract information

3. Click NEXT

Repeat 2 and 3 n times.

Specifically, I want to know how to get the URL of the new page after clicking.

This is what I've come up with so far:

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def startWebDriver():
    global driver
    options = Options()
    options.add_argument("--disable-extensions")
    driver = webdriver.Chrome(executable_path='/path/to/driver/chromedriver_linux64/chromedriver', options=options)

# Create the driver before using it
startWebDriver()

# URL of the initial page
driver.get('https://openi.nlm.nih.gov/detailedresult.php?img=CXR1_1_IM-0001-3001&query=&coll=cxr&req=4&npos=1')

# Crude fixed wait for the page to load
time.sleep(4)

# XPath of the "NEXT" button; click() returns None, so there is nothing to assign
driver.find_element_by_xpath('//*[@id="imageClassM"]/div/a[2]/img').click()

Any help would be appreciated

  • I'm a bit unclear about what you're trying to achieve here. Would this be the correct synopsis: you've opened the URL, located the "NEXT" button on it, and clicked it, and now you'd like to know which URL the page has redirected to? – Anuj Khandelwal Feb 21 '19 at 17:47
  • Per your XPath, the button should be a `>` button. However, I can't see any `>` button on the webpage you have provided. Is that the right URL you are navigating to? – KunduK Feb 21 '19 at 18:47
  • The URL I've provided is the right one, and the XPath is also right, but when you visit that page (even manually) that element is not visible for some reason @Anuj Khandelwal – Abhishek Rajbhoj Feb 22 '19 at 05:29
  • Yes, that's because its CSS style is set to "display: none". When we remove that style property from the console, the button appears, but clicking it does not lead to any new page. Are you sure that button is functional? – Anuj Khandelwal Feb 22 '19 at 05:32

3 Answers

0

If you would like to get the URL of the page you are on after clicking NEXT, try this:

print(browser.current_url)

or

print(driver.current_url)
Julian Silvestri
0

Perhaps you could try something like this:

from selenium import webdriver
from selenium.webdriver import ChromeOptions
import time

if __name__ == "__main__":
    options = ChromeOptions()
    options.add_argument("--disable-extensions")
    #start driver
    driver = webdriver.Chrome(options=options)
    #load first page
    driver.get('https://openi.nlm.nih.gov/detailedresult.php?img=CXR1_1_IM-0001-3001&query=&coll=cxr&req=4&npos=1')
    for i in range(3): #However many of these links to click
        time.sleep(4) # let each page load
        driver.find_element_by_xpath('//*[@id="imageClassM"]/div/a[2]/img').click()
        print(driver.current_url)

This loads the page for me (I removed your bit about chrome driver path because my driver is in the same folder). It does get an error though, and looks like it's mad at driver.find_element_by_xpath('//*[@id="imageClassM"]/div/a[2]/img').click() saying:

selenium.common.exceptions.ElementNotVisibleException: Message: element not visible

I'm not sure how to fix that because I see no "NEXT" button on the webpage... I'm sure you can figure it out though!
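The comments on the question report that the NEXT element's CSS is set to "display: none", which would explain the ElementNotVisibleException. One common workaround, sketched below, is to send the click through JavaScript instead of a native click. `js_click` is a hypothetical helper name, not part of Selenium; `driver` and `element` are assumed to be an ordinary WebDriver and an already-located WebElement. The same comment notes that clicking the un-hidden button did not lead to a new page, so this may not be enough for this particular site.

```python
def js_click(driver, element):
    """Click `element` via JavaScript, sidestepping Selenium's visibility check.

    `driver` is assumed to be a Selenium WebDriver and `element` a WebElement
    that was already located (for example with the XPath from the question).
    """
    # Inside the script, arguments[0] is the element passed as the extra argument.
    driver.execute_script("arguments[0].click();", element)
```

In the loop above, this would replace the `.click()` call: locate the element first, then pass it to `js_click(driver, element)`.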

Reedinationer
0
driver.current_url  # a property, so no parentheses

You may need to do a wait first for the page to load.
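As a rough sketch of such a wait, the helper below records the URL, performs the click, and polls `driver.current_url` until it differs from the old value. The names `click_and_get_new_url` and `click_next`, and the `timeout` and `poll` defaults, are illustrative assumptions. It only relies on the driver exposing a `current_url` attribute; Selenium also ships `WebDriverWait` together with `expected_conditions.url_changes` for the same purpose.

```python
import time


def click_and_get_new_url(driver, click_next, timeout=10.0, poll=0.25):
    """Run `click_next(driver)` and return the URL once it has changed.

    `click_next` is a caller-supplied callable that clicks the NEXT button.
    Raises TimeoutError if the URL is unchanged after `timeout` seconds.
    """
    old_url = driver.current_url  # remember where we started
    click_next(driver)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_url = driver.current_url
        if new_url != old_url:
            return new_url
        time.sleep(poll)  # give the browser time to navigate
    raise TimeoutError(f"URL is still {old_url!r} after {timeout} seconds")
```

Calling this n times in a loop, extracting data before each call, gives the list of page URLs. A TimeoutError suggests the click did not navigate, or that the page updates without changing the URL.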

undetected Selenium
user2016569