
I am trying to download a .csv file from https://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx using R, but it is not working.

I am posting a screenshot showing the location of the file I am trying to download.

[Screenshot: location of the .csv download icon on the page]

When I click on the image of the .csv file, a file named "MarketWatch_14_00_2018.csv" is downloaded.

My goal is to read the file into R, so I used the command below:

MARKET_WATCH <- read.csv("MarketWatch_15_00_2018.csv", stringsAsFactors = FALSE)

This worked fine, but I want to automate the process, that is, read the file "MarketWatch_15_00_2018.csv" directly from the web without clicking and downloading it manually. I used the following command to do this:

MARKET_WATCH_TEST <- read.csv("https://www.bseindia.com/markets/Equity/EQReports/MarketWatch.aspx?expandable=2/MarketWatch_17_00_2018.csv")

This command gave no errors, but the data loaded into the data frame was not correct: it contained HTML code from the page.

So I tried downloading the file first so that I could load it afterwards. I used the commands below:

library(RCurl)  # provides getURL()

downld <- getURL("https://www.bseindia.com/markets/Equity/EQReports/MarketWatch.aspx?expandable=2/MarketWatch_17_00_2018.csv?accessType=DOWNLOAD")

DATA <- read.csv(text = downld)

I checked the data, and the same HTML code ended up in both data frames; that is, the file did not load at all, only the HTML text of the page.

I tried a couple of other approaches, such as fread and getURL, but none of them worked. The code I used is below.

library(data.table)  # provides fread()

dwnld <- fread("https://www.bseindia.com/markets/Equity/EQReports/MarketWatch.aspx?expandable=2/MarketWatch_17_00_2018.csv")

URL <- "https://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx/MarketWatch_17_00_2018.csv"

X <- getURL(URL)

Can someone please help me understand why the file does not load properly into R when I read it directly from the web, whereas it loads fine after I download it to my local desktop?

Dave2e
Vignan
  • Welcome to SO. Pictures are neither code nor data unless the topic is image processing. The R tag page has helpful info on how to post a good question. You didn't read it. You've likely seen other, good questions. This is definitely not a good question. Your goal is to supply code & data, explain what isn't working, document _all the effort you put in_ to try to solve it & then ask for help. This is not a code-writing service & you've shown no effort whatsoever + you tag-spammed, somewhat proving you just want someone to do this for you. Which of the plethora of SO web scraping Qs didn't work? – hrbrmstr Nov 17 '18 at 11:36
  • I am not going to give you the code because this is a lot of work, but if you open the browser inspector and press the button to download the csv, you will see that it does so with another HTTP request to the same URL, using POST with parameters that you obtain by parsing the page returned by the first request. Good luck. – José Nov 17 '18 at 14:26
  • Hi Jose, I tried this approach too, but the problem is that there is no URL in the page source linked to this particular .csv file. Please have a look at the inspect element output for that file image. – Vignan Nov 17 '18 at 15:30
  • Check this : https://stackoverflow.com/a/74493075/2444948 – Dorian Grv Nov 18 '22 at 17:51

1 Answer


How about this?

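library(dplyr)   # provides the %>% pipe
library(rvest)   # read_html(), html_table()
url <- "https://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx"
# Parse the page and extract every <table> as a data frame
data <- url %>%
  read_html() %>%
  html_table(fill = TRUE)

# Keep the ninth table found on the page
df <- data[[9]]

# Drop columns 11 through 21
df <- df[, -c(11:21)]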

EDIT: I do see that this webpage has some JavaScript links:

__doPostBack('ctl00$ContentPlaceHolder1$grd1','Page$2')
__doPostBack('ctl00$ContentPlaceHolder1$grd1','Page$3')
__doPostBack('ctl00$ContentPlaceHolder1$grd1','Page$4')
...

at the bottom of the table, so the code above imports only the first page of results.
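
The __doPostBack links are ASP.NET postbacks: the browser re-submits the page's form to the same URL with POST, sending back the hidden fields from the page plus the two arguments of the call. Below is a minimal, untested sketch of how that might be reproduced with httr and rvest. The helper names (hidden_fields, postback_body, fetch_grid_page) are made up for illustration, and it is an assumption that the site accepts such a request from a script; it may require extra headers or reject non-browser clients.

```r
library(httr)   # GET(), POST(), content(), stop_for_status()
library(rvest)  # read_html(), html_elements(), html_attr(), html_table()

page_url <- "https://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx"

# Hypothetical helper: collect every named hidden <input> on the page
# (for example __VIEWSTATE) as a named list, so it can be sent back.
hidden_fields <- function(page) {
  inputs <- html_elements(page, "input[type='hidden']")
  inputs <- inputs[!is.na(html_attr(inputs, "name"))]
  values <- html_attr(inputs, "value", default = "")
  as.list(setNames(values, html_attr(inputs, "name")))
}

# Hypothetical helper: build the form body that a call to
# __doPostBack(target, argument) would submit.
postback_body <- function(fields, target, argument) {
  fields[["__EVENTTARGET"]] <- target
  fields[["__EVENTARGUMENT"]] <- argument
  fields
}

# Hypothetical helper: request one page of the grid and return the parsed HTML.
# Assumption: the server accepts this POST without further headers or checks.
fetch_grid_page <- function(page_number) {
  first <- GET(page_url)
  stop_for_status(first)
  fields <- hidden_fields(read_html(content(first, as = "text", encoding = "UTF-8")))
  body <- postback_body(fields,
                        "ctl00$ContentPlaceHolder1$grd1",
                        paste0("Page$", page_number))
  resp <- POST(page_url, body = body, encode = "form")
  stop_for_status(resp)
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}
```

If the request succeeds, something like html_table(fetch_grid_page(2)) should give the tables for the second page, from which the grid can be picked as before. The .csv button presumably triggers a similar postback, but the name of that control does not appear anywhere in this thread, so it would have to be read from the browser's network inspector.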

user113156
  • I want to download the .csv file rather than read the table from the HTML. I ran URL <- "https://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx"; download.file(URL, "MarketWatch_17_00_2018.csv"); data.url <- read.table("MarketWatch_17_00_2018.csv", sep=",", header=TRUE) and I can see that data.url contains HTML code from that page, so I think it is downloading the page's HTML instead of the target file. How can I download the file itself rather than the HTML? – Vignan Nov 17 '18 at 13:15