
I am trying to scrape data from here.

After selecting Capital Market and the year 2019-2020, I want to click Get Data.

I have used the following code:

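from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path=chrome_path, options=chrome_options)
driver.get(nse_cash_keystats_page)

# Pick the report type and the year from the two drop-downs
driver.find_element_by_xpath("//select[@id='h_filetype']/option[text()='Capital Market ']").click()
driver.find_element_by_xpath("//select[@id='yearField']/option[text()='2019-2020']").click()

# Wait for the Get Data image button, then click it through JavaScript
downloadButton = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//input[@type="image"][@src="/common/images/btn-get-data.gif"]')))
driver.execute_script("arguments[0].click();", downloadButton)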

With the above code I am able to click Get Data, but no output is shown.

Please help me. Thanks in advance.

saeed foroughi
Prats
2 Answers


I took your code, added a few tweaks, and ran the test as follows:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait, Select
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get('https://www1.nseindia.com/products/content/equities/equities/eq_monthly_statistics.htm')
    Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"select#h_filetype")))).select_by_visible_text("Capital Market ")
    Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"select#yearField")))).select_by_visible_text("2019-2020")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input.getdata-button#get[type='image'][src^='/common/images/btn-get-data.gif']"))).click()
    

Observation

As in your case, I hit the same roadblock and got no results:

[Screenshot: nseindia_Monthly_Statement]


Deep Dive

It seems the click() on the Get Data element does happen. However, if you inspect the DOM tree of the webpage, you will find that some of the <script> tags refer to JavaScript files whose URLs contain the keyword akam. For example:

  • <script type="text/javascript" src="https://www1.nseindia.com/akam/11/52349752" defer=""></script>
  • <noscript><img src="https://www1.nseindia.com/akam/11/pixel_52349752?a=dD01ZDZiMTA5OGQ0MDljYTYxN2RjMjc3MzBlN2YwMDQ0NjlkZDNiNTMzJmpzPW9mZg==" style="visibility: hidden; position: absolute; left: -999px; top: -999px;" /></noscript>

This is a clear indication that the website is protected by Bot Manager, an advanced bot detection service provided by Akamai, and that the response gets blocked.
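To make the check above repeatable, here is a minimal sketch in Python that scans a page's HTML for <script> or <img> tags whose src contains /akam/. The names AkamaiResourceFinder and find_akamai_resources are hypothetical, and the second script URL in the sample is made up for contrast. Treat the /akam/ marker as a heuristic taken from the observation above, not as an official detection method.

```python
from html.parser import HTMLParser


class AkamaiResourceFinder(HTMLParser):
    """Collect the src of <script>/<img> tags whose URL contains '/akam/'."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # Only script and image tags are of interest here
        if tag not in ("script", "img"):
            return
        src = dict(attrs).get("src") or ""
        if "/akam/" in src:
            self.matches.append(src)


def find_akamai_resources(html):
    """Return every matching src URL found in the given HTML string."""
    finder = AkamaiResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.matches


# First tag is copied from the page; the second URL is a made-up example
sample = (
    '<script type="text/javascript" '
    'src="https://www1.nseindia.com/akam/11/52349752" defer=""></script>'
    '<script src="/common/js/example.js"></script>'
)
print(find_akamai_resources(sample))
```

With a live Selenium session you could presumably pass driver.page_source to find_akamai_resources; a non-empty result would only hint that the Akamai script is present, not prove that the request was blocked.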


Bot Manager

As per the article Bot Manager - Foundations:

[Image: akamai_detection]


Conclusion

So it can be concluded that the request for the data is detected as coming from a Selenium-driven WebDriver instance, and the response is blocked.



undetected Selenium

I was facing a similar problem with Selenium and Python, so I gave Puppeteer a try to see whether the problem was Selenium-specific. Even with raw Puppeteer the same problem showed up: I click the Get Data button and a blank box appears, with the console showing an unauthorized access error. I knew Akamai was causing this, but the question was how to get around it. Here is the Puppeteer script that I use to download options data and bypass Akamai. Change it to suit your needs.

const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())

async function nse(dates) {
    let launchOptions = {
        headless: false,
        executablePath: 'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe', // path to the local Chrome binary; must be set when only puppeteer-core is installed
        args: ['--start-maximized']
    };
    console.log(dates)
    const browser = await puppeteer.launch(launchOptions);
    const page = await browser.newPage();

    // set viewport and user agent (just in case for nice viewing)
    await page.setViewport({ width: 1366, height: 768 });
    await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');

    // go to the target web
    await page.goto('https://www1.nseindia.com/products/content/derivatives/equities/historical_fo.htm');
    await page.select('#instrumentType', 'OPTIDX')
    await page.select('#symbol', 'NIFTY')
    await page.select('#year', '2020')
    await page.select('#expiryDate', `${dates[0]}`)
    await page.select('#optionType', 'CE')
    await page.waitForSelector('#rdDateToDate');
    let button = 'input[id="rdDateToDate"]';
    await page.evaluate((button) => document.querySelector(button).click(), button);
    // await page.select('#dateRange', '3month')
    await page.waitForSelector('#fromDate');
    await page.$eval('#fromDate', (el,date1) => el.value = date1, dates[1]);
    await page.waitForSelector('#toDate');
    await page.$eval('#toDate', (el,date2) => el.value = date2,dates[2]);
    let selector = 'input[class="getdata-button"]';
    await page.evaluate((selector) => document.querySelector(selector).click(), selector);
    await page.waitForFunction("document.querySelector('.download-data-link') && document.querySelector('.download-data-link').clientHeight != 0");
    const btnNext = await page.$('.download-data-link');
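    // Fixed 3 s pause; page.waitFor() was removed from newer Puppeteer releases
    await new Promise(resolve => setTimeout(resolve, 3000));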
    await btnNext.click();
    await browser.close()
}


async function input_dates(input) {
    var dates = []
    for (let i = 0; i < input[0].length; i++) {
        dates.push([input[0][i], input[1][i], input[2][i]])

    }
    return dates

}
var data = [['30-01-2020', '27-02-2020', '26-03-2020', '30-04-2020', '28-05-2020'],//expiry date array
['01-Jan-2020', '01-Feb-2020', '01-Mar-2020', '01-Apr-2020', '01-May-2020'],//start date array
['30-Jan-2020', '27-Feb-2020', '26-Mar-2020', '30-Apr-2020', '28-May-2020']]//end date array

// Run nse() for every date triple and wait for all of them, so that errors reach catch()
input_dates(data)
    .then(dates => Promise.all(dates.map(el => nse(el))))
    .catch(err => console.log(err))

This downloads the data to your default download directory.