
I'm trying to scrape data from the ScienceDirect website. I'm trying to automate the scraping by building a list of XPaths to the journal issues and looping over them. When I run the loop, I can access the first journal issue, but none of the elements after it. This process worked for me on another website, but not on this one.

I would also like to know whether there is a better way to access these elements than this approach.

#Importing libraries
 import requests
 import os
 import json
 from selenium import webdriver
 import pandas as pd
 from bs4 import BeautifulSoup
 import time
 from time import sleep

 from selenium.webdriver.common.by import By
 from selenium.webdriver.support.ui import WebDriverWait
 from selenium.webdriver.support import expected_conditions as EC

 #initializing the Chrome webdriver
 driver=webdriver.Chrome(executable_path=r"C:/selenium/chromedriver.exe")

 #website to be accessed
 driver.get("https://www.sciencedirect.com/journal/journal-of-corporate-finance/issues")

 #generating the list of xpaths to be accessed one after the other
 issues=[]
 for i in range(0,20):
     docs=(str(i))
     for j in range(1,7):
         sets=(str(j))
         con=("//*[@id=")+('"')+("0-accordion-panel-")+(docs)+('"')+("]/section/div[")+(sets)+("]/a")
         issues.append(con)
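As an aside, the chained concatenation above can be collapsed into a single f-string comprehension that produces the identical list of XPaths:

```python
# Build the same 20 x 6 list of XPaths with an f-string comprehension
# instead of chained string concatenation.
issues = [
    f'//*[@id="0-accordion-panel-{doc}"]/section/div[{div}]/a'
    for doc in range(20)
    for div in range(1, 7)
]

print(len(issues))  # 120
print(issues[0])    # //*[@id="0-accordion-panel-0"]/section/div[1]/a
```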

 #looping to access one issue after the other
 for i in issues:
     try:
         hat=driver.find_element_by_xpath(i)
         hat.click()
         sleep(4)
         driver.back()
     except:
         print("no more issues",i)
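A likely cause of the failure is that click() followed by driver.back() reloads the page, so any elements (and dynamically generated accordion IDs) located earlier go stale. A common workaround is to collect the href attributes up front and navigate with driver.get() instead of clicking and going back. A minimal sketch, with `visit_issues` and `absolutize` being hypothetical helpers (the driver itself is assumed to be created elsewhere):

```python
from time import sleep

def absolutize(hrefs, base="https://www.sciencedirect.com"):
    """Turn relative issue links into absolute URLs (pure helper)."""
    return [h if h.startswith("http") else base + h for h in hrefs]

def visit_issues(driver, xpaths):
    """Collect all hrefs first, then navigate directly.

    Direct driver.get() navigation avoids the stale-element problem
    that click()/driver.back() causes when the page re-renders.
    """
    hrefs = []
    for xp in xpaths:
        try:
            hrefs.append(driver.find_element_by_xpath(xp).get_attribute("href"))
        except Exception:
            continue  # this XPath does not match anything on the page
    for url in absolutize(hrefs):
        driver.get(url)  # navigate directly instead of click()/back()
        sleep(2)         # crude pause; WebDriverWait is preferable
```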


VenuBhaskar
  • https://meta.stackoverflow.com/q/303812/11301900. Can you share the relevant HTML as well as some of the constructed XPath queries? My guess is that you might not even need a loop. – AMC Jan 12 '20 at 18:26
  • ['//*[@id="0-accordion-panel-0"]/section/div[1]/a', '//*[@id="0-accordion-panel-0"]/section/div[2]/a', '//*[@id="0-accordion-panel-0"]/section/div[3]/a', '//*[@id="0-accordion-panel-0"]/section/div[4]/a', These are the XPaths I have created; please look up the HTML from https://www.sciencedirect.com/journal/journal-of-corporate-finance/issues , as I'm unable to put the HTML in the comments. Thanks – VenuBhaskar Jan 12 '20 at 19:58
  • None of that should be in the comments anyway, you can just edit your post. Looking at those XPath queries, you should indeed be able to use a single one with `.find_elements_by_xpath()`. – AMC Jan 12 '20 at 20:15
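The comment's point, that one relative query can replace all 120 generated absolute XPaths, can be illustrated with the standard library's ElementTree on a trimmed-down stand-in for the page markup (the HTML below is a hypothetical simplification, not ScienceDirect's real structure):

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the issues-page markup.
html = """
<div>
  <li id="0-accordion-panel-0"><section>
    <div><a href="/issue-1">Issue 1</a></div>
    <div><a href="/issue-2">Issue 2</a></div>
  </section></li>
  <li id="0-accordion-panel-1"><section>
    <div><a href="/issue-3">Issue 3</a></div>
  </section></li>
</div>
"""
root = ET.fromstring(html)

# One relative query matches every issue link at once; with Selenium the
# equivalent would be driver.find_elements_by_xpath("//section/div/a").
links = [a.get("href") for a in root.findall(".//section/div/a")]
print(links)  # ['/issue-1', '/issue-2', '/issue-3']
```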

1 Answer


To scrape data from the ScienceDirect website https://www.sciencedirect.com/journal/journal-of-corporate-finance/issues you can perform the following steps:

  • First, open all the accordions.

  • Then open each issue in an adjacent tab using Ctrl + click().

  • Next, switch_to() the newly opened tab and scrape the required contents.

  • Code Block:

      from selenium import webdriver
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC
      from selenium.webdriver.common.action_chains import ActionChains
      from selenium.webdriver.common.keys import Keys
    
      options = webdriver.ChromeOptions() 
      options.add_argument("start-maximized")
      options.add_experimental_option("excludeSwitches", ["enable-automation"])
      options.add_experimental_option('useAutomationExtension', False)
      driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
      driver.get('https://www.sciencedirect.com/journal/journal-of-corporate-finance/issues')
      accordions = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "li.accordion-panel.js-accordion-panel>button.accordion-panel-title>span")))
      for accordion in accordions:
          ActionChains(driver).move_to_element(accordion).click(accordion).perform()
      issues = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.anchor.js-issue-item-link.text-m span.anchor-text")))
      windows_before  = driver.current_window_handle
      for issue in issues:
          ActionChains(driver).key_down(Keys.CONTROL).click(issue).key_up(Keys.CONTROL).perform()
          WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
          windows_after = driver.window_handles
          new_window = [x for x in windows_after if x != windows_before][0]
          driver.switch_to.window(new_window)
          WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a#journal-title>span")))
          print(WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, "//h2"))).get_attribute("innerHTML"))
          driver.close()
          driver.switch_to.window(windows_before)
      driver.quit()
    
  • Console Output:

      Institutions, Governance and Finance in a Globally Connected Environment
      Volume 58
      Corporate Governance in Multinational Enterprises
      .
      .
      .
    

References

You can find a couple of relevant detailed discussions in:

undetected Selenium