
I am trying to download the daily report from the website NSE-India using selenium & python.

Approach to download the daily report

  • Website loads with no data
  • After X time, the page is loaded with the report information
  • Once the page is loaded with the report data, "table[@id='etfTable']" appears
  • An explicit wait is added in the code to wait till the "table[@id='etfTable']" loads

Code for explicit wait

element=WebDriverWait(driver,50).until(EC.visibility_of_element_located(By.xpath,"//table[@id='etfTable']"))

  • Extract the element with the onclick event (the CSV download link) using XPath

    downloadcsv= driver.find_element_by_xpath("//div[@id='esw-etf']/div[2]/div/div[3]/div/ul/li/a")

  • Trigger the click to download the file

Full code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options =webdriver.ChromeOptions();
prefs={"download.default_directory":"/Volumes/Project/WebScraper/downloadData"};
options.binary_location=r'/Applications/Google Chrome 2.app/Contents/MacOS/Google Chrome'
chrome_driver_binary =r'/usr/local/Caskroom/chromedriver/94.0.4606.61/chromedriver'
options.add_experimental_option("prefs",prefs)

driver =webdriver.Chrome(chrome_driver_binary,options=options)

try:
  #driver.implicity_wait(10)
  driver.get('https://www.nseindia.com/market-data/exchange-traded-funds-etf')
  element =WebDriverWait(driver,50).until(EC.visibility_of_element_located(By.xpath,"//table[@id='etfTable']"))
  downloadcsv= driver.find_element_by_xpath("//div[@id='esw-etf']/div[2]/div/div[3]/div/ul/li/a")
  print(downloadcsv)
  downloadcsv.click()
  time.sleep(5)
  driver.close()
except:
  print("Invalid URL")

Issue I am facing

  • The page keeps on loading via Selenium, but when the site is launched without Selenium the daily report loads fine

(Screenshots: normal loading vs. loading via Selenium)

  • Not able to download the daily report
  • As mentioned by @Darkknight/@pmadhu, the site had some bot detection in place which was causing the "403" response. I was able to bypass the bot detection with the help of **undetected_chromedriver**. For more information: [https://stackoverflow.com/questions/65529808/undetected-chromedriver-not-loading-correctly] – Rove sprite Oct 16 '21 at 10:37
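
For reference, a minimal sketch of the undetected_chromedriver approach mentioned in the comment above (a sketch only, not the exact code used; it assumes the undetected-chromedriver package is installed and reuses the question's explicit wait):

import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# uc.Chrome() patches chromedriver so the site's bot detection is less likely to return 403
driver = uc.Chrome()
driver.get('https://www.nseindia.com/market-data/exchange-traded-funds-etf')

# same explicit wait as in the question, with the locator passed as a tuple
WebDriverWait(driver, 50).until(
    EC.visibility_of_element_located((By.XPATH, "//table[@id='etfTable']/tbody/tr[2]"))
)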

2 Answers


There are some syntax errors in the program, like the semicolons in a few lines, and while finding the element using WebDriverWait, the brackets (the locator tuple) are missing.

Try like below and confirm.

You can use JavaScript to click on that element.

driver.get("https://www.nseindia.com/market-data/exchange-traded-funds-etf")
element =WebDriverWait(driver,50).until(EC.visibility_of_element_located((By.XPATH,"//table[@id='etfTable']/tbody/tr[2]")))


downloadcsv= driver.find_element_by_xpath("//img[@title='csv']/parent::a")
print(downloadcsv)
driver.execute_script("arguments[0].click();",downloadcsv)
pmadhu
  • I updated the code (removed the semicolons in a few lines), but the page is not getting loaded with the data. Since I can't attach a screenshot in the comment, I am updating the question with the screenshot information – Rove sprite Oct 13 '21 at 06:28
  • @Rovesprite - Have updated the code. Apply the wait such that some rows are present in the table. And make sure you have good internet speed. I was able to see the table loaded with data. – pmadhu Oct 13 '21 at 06:50
  • It's a nice solution to wait for the rows in the table rather than the id, but I am still not able to download the file. The problem is intermittent: sometimes the data loads up, but most of the time the page just keeps loading. Are you using the headless browser option while executing the code? – Rove sprite Oct 13 '21 at 13:45
  • @Rovesprite - I am not applying any options. I was able to click on the element a few hours back, but now I am also not able to click on that element. Getting an access denied message when refreshed. Maybe the website is not allowed to be automated. – pmadhu Oct 13 '21 at 14:40

It's not an issue with your code, it's an issue with the website. I checked it; most of the time it did not allow me to click on the CSV file. Instead of downloading the CSV file, you can scrape the table.

from time import sleep
from bs4 import BeautifulSoup

# `browser` is a Selenium webdriver instance
# when going directly to the page, deleting the cookies first is very important,
# otherwise the site will deny access
browser.delete_all_cookies()
browser.get('https://www.nseindia.com/market-data/exchange-traded-funds-etf')
sleep(5)

soup = BeautifulSoup(browser.page_source, 'html.parser')
# scrape the table from the soup
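
A minimal sketch of the table-scraping step, continuing from the soup object above; it assumes the data sits in plain tr/td cells under the table with id etfTable (verify against the actual page source):

# assumption: rows of table#etfTable are ordinary <tr>/<td> elements
table = soup.find('table', {'id': 'etfTable'})
rows = []
for tr in table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    if cells:  # skip the header row, which has <th> cells only
        rows.append(cells)
print(rows[:5])  # first few ETF rows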
Darkknight
  • I tried the option suggested by you, but the table information is not getting loaded. I checked the network tab; the GET method that loads the data is returning "403". Need to figure out why it's throwing "403" – Rove sprite Oct 15 '21 at 14:14