I am trying to download past data from the website areavolume.

I am using the rvest function html_form_set() to fill in the form's drop-down selects: interval = 15-minute-block, delivery = last 31 days, type = both, area = Mark All. A snapshot shows the filled-in form required to get the past data of the last 31 days. I have seen the solutions at stack_site_1 and site_with_httr. A second snapshot shows the date range selection.

library(rvest)
library(httr)
library(tidyverse)

### first way: choosing the period from the drop-down
# session() replaces html_session(), which is deprecated in rvest 1.0
pg <- session('https://www.iexindia.com/marketdata/rtm_areavolume.aspx')
form.unfilled <- pg %>% html_node("form") %>% html_form()
form.filled <- form.unfilled %>%
  html_form_set("ctl00$InnerContent$ddlInterval" = "1",
                "ctl00$InnerContent$ddlPeriod" = "-31",
                "ctl00$InnerContent$ddlType" = "1")
result <- session_submit(pg, form.filled)
# named `tables` so that base::table() is not masked
tables <- result %>% html_nodes("table")
vol_table <- html_table(tables, fill = TRUE)

### another way: selecting the date range
iex_html <- 'https://www.iexindia.com/marketdata/rtm_areavolume.aspx'
iex_ses <- session(iex_html)
iex_form <- iex_ses %>% html_node("form") %>% html_form()

iex_fill <- iex_form %>%
  html_form_set("ctl00$InnerContent$ddlInterval" = "1",
                "ctl00$InnerContent$ddlPeriod" = "SR",
                "ctl00$InnerContent$calFromDate$txt_Date" = "01/03/2021",
                "ctl00$InnerContent$calToDate$txt_Date" = "03/03/2021",
                "ctl00$InnerContent$ddlType" = "1")
# mark the button as a submit field on the filled copy, since that is the form being submitted
iex_fill$fields$`ctl00$InnerContent$btnUpdateReport`$type <- 'submit'
out <- session_submit(x = iex_ses, form = iex_fill)
out_table <- out %>% html_nodes("table")
out_table1 <- html_table(out_table, fill = TRUE)

### with httr
vol_htr <- POST("https://www.iexindia.com/marketdata/rtm_areavolume.aspx",
                body = list('ctl00$InnerContent$ddlInterval' = "1",
                            'ctl00$InnerContent$ddlPeriod' = "-31",
                            'ctl00$InnerContent$ddlType' = "1",
                            'ctl00$InnerContent$btnUpdateReport' = "Update Report"),
                encode = "form")
vol_httr_table <- read_html(vol_htr) %>% html_table(fill = TRUE)

All of these return only the data table for the current day. I am sure that I am doing something wrong when submitting 'Update Report'. My doubt may be with the selection of the checkboxes.
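
One thing that may be worth checking in the httr attempt: it posts only four fields, whereas ASP.NET WebForms pages (.aspx) normally carry hidden state inputs such as __VIEWSTATE and __EVENTVALIDATION that the server expects to receive back. The sketch below collects the hidden inputs from a fetched page and overlays the values to change. The helper name build_postback_body is hypothetical, and whether this alone is enough for this particular site has not been verified.

```r
library(rvest)

# Hypothetical helper: gather every hidden <input> on a parsed page
# (for example __VIEWSTATE) and overlay the fields we want to set.
build_postback_body <- function(page, overrides) {
  inputs <- html_elements(page, "input[type='hidden']")
  body <- as.list(html_attr(inputs, "value", default = ""))
  names(body) <- html_attr(inputs, "name")
  body <- body[!is.na(names(body))]  # drop inputs without a name attribute
  modifyList(body, overrides)        # overrides win over the page's values
}
```

The resulting list could then be passed as body to POST() with encode = "form", after first fetching the page with read_html() so that the hidden values are current.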

1 Answer

An RSelenium solution to download the Excel file:

#Start the server 
library(RSelenium)
driver = rsDriver(browser = c("chrome"))
remDr <- driver[["client"]]

#Navigate to website
remDr$navigate("https://www.iexindia.com/marketdata/rtm_areavolume.aspx")

#Download the Excel file
# open the report viewer's export menu
export_button <- remDr$findElement(using = "xpath", value = '//*[@id="ctl00_InnerContent_reportViewer_ctl05_ctl04_ctl00_ButtonImg"]')
export_button$clickElement()
# click the first entry in that menu to start the download
export_link <- remDr$findElement(using = "xpath", value = '//*[@id="ctl00_InnerContent_reportViewer_ctl05_ctl04_ctl00_Menu"]/div[1]/a')
export_link$clickElement()
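
The code above exports whatever report is currently displayed. To request the last 31 days first, the drop-downs could be set in the browser before exporting. This is a minimal sketch, not tested against the site: the field names and option values are taken from the question, the helper names are hypothetical, and the fixed wait is a crude assumption about how long the report takes to reload.

```r
# Build an XPath for one <option> of a <select>, addressed by the select's name.
option_xpath <- function(select_name, value) {
  sprintf('//select[@name="%s"]/option[@value="%s"]', select_name, value)
}

# Click that option; `remDr` is an RSelenium client such as driver[["client"]].
choose_option <- function(remDr, select_name, value) {
  option <- remDr$findElement(using = "xpath",
                              value = option_xpath(select_name, value))
  option$clickElement()
  invisible(option)
}

# Hypothetical wrapper: set interval, period and type, then press Update Report.
request_last_31_days <- function(remDr, wait_seconds = 5) {
  choose_option(remDr, "ctl00$InnerContent$ddlInterval", "1")
  choose_option(remDr, "ctl00$InnerContent$ddlPeriod", "-31")
  choose_option(remDr, "ctl00$InnerContent$ddlType", "1")
  update <- remDr$findElement(using = "xpath",
                              value = '//*[@name="ctl00$InnerContent$btnUpdateReport"]')
  update$clickElement()
  Sys.sleep(wait_seconds)  # fixed pause while the report reloads
  invisible(remDr)
}
```

Calling request_last_31_days(remDr) after remDr$navigate(...) and before the export clicks would be the intended order. Each element is looked up again right before it is clicked, in case a selection reloads the page.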
Nad Pat
  • But RSelenium is not that efficient and it is very slow. I could get this much faster with rvest or httr, but I am not able to download the historical data for the last 31 days. Can you try once with the rvest or httr package? – Avijit Mallick Sep 14 '21 at 04:03
  • `rvest` and `httr` are not suitable for JavaScript websites. RSelenium is efficient. Refer to https://stackoverflow.com/questions/63568086/rvest-and-sites-with-javascript and https://stackoverflow.com/questions/26631511/scraping-javascript-website-in-r – Nad Pat Sep 16 '21 at 16:55