1

I would like to read data from nseindia.com into R using download.file(), as shown below.

url = 'http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip'
temp = tempfile()
download.file(url, destfile = temp, method = 'wget')

It throws the following error:

SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\PROGRA~2\GnuWin32/etc/wgetrc
--2014-09-28 21:19:26--  http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip
Resolving www.nseindia.com... 202.83.22.200, 202.83.22.203
Connecting to www.nseindia.com|202.83.22.200|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2014-09-28 21:19:26 ERROR 403: Forbidden.

Warning messages:
1: running command 'wget  "http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip" -O "C:\Users\ITITHI~1\AppData\Local\Temp\Rtmp2fjADx\file1fb02375882"' had status 1 
2: In download.file(url, destfile = temp, method = "wget") :
  download had nonzero exit status

Please let me know of any way to fix this.

EDIT: Any other method of downloading the file from within R would also be great.

kay dee

2 Answers

1

You need to set a browser-like user agent string so that the site treats the request as coming from a browser rather than from an automated scraper or download robot:

library(httr) # >=v0.5

GET("http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip",
    user_agent("Mozilla/5.0"), write_disk("cm24SEP2014bhav.csv.zip"))

## Response [http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip]
##   Date: 2014-09-28 23:53
##   Status: 200
##   Content-type: application/zip
##   Size: 58.2 kB
## <ON DISK>  cm24SEP2014bhav.csv.zip
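
Once the zip file is on disk it still has to be read into R. The following is a minimal sketch using only base R; `read_zipped_csv` is a hypothetical helper name, and it assumes the archive contains a single CSV file, which has not been verified for this site.

```r
# Hypothetical helper: read the first .csv member of a zip archive
# into a data frame without extracting it to disk.
read_zipped_csv <- function(zipfile) {
  # List the archive members and pick the first one ending in .csv
  members <- unzip(zipfile, list = TRUE)$Name
  csv_name <- members[grepl("\\.csv$", members, ignore.case = TRUE)][1]
  if (is.na(csv_name)) stop("no .csv file found in ", zipfile)

  # unz() gives a connection to one member; read.csv opens and closes it
  con <- unz(zipfile, csv_name)
  read.csv(con, stringsAsFactors = FALSE)
}

# Usage, assuming the file written by write_disk() above exists:
# bhav <- read_zipped_csv("cm24SEP2014bhav.csv.zip")
# head(bhav)
```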
hrbrmstr
  • Would it be possible without the `httr` library? – kay dee Sep 29 '14 at 15:53
  • Aye. Look at the source for `GET`, `user_agent` and `write_disk`. Most `httr` calls are just wrappers for `RCurl` (it uses `RCurl` under the covers for pretty much everything). Just type `GET` in the console or poke around at the [github repo](https://github.com/hadley/httr) – hrbrmstr Sep 29 '14 at 18:44
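
For a route that avoids `httr` altogether, here is a sketch in base R. It relies on the `HTTPUserAgent` option, which `download.file()` documents as the user agent for its HTTP requests. `download_with_ua` is a hypothetical helper name, and the choice of `method = "libcurl"` is an assumption (it requires an R build with libcurl support). Whether this server accepts the request has not been checked here.

```r
# Hypothetical helper: download a file while presenting a browser-like
# user agent, using only base R.
download_with_ua <- function(url, destfile, ua = "Mozilla/5.0") {
  # Temporarily override the user agent that download.file() sends,
  # and restore the previous value when the function exits.
  old <- options(HTTPUserAgent = ua)
  on.exit(options(old), add = TRUE)

  # mode = "wb" keeps the zip file intact on Windows
  download.file(url, destfile = destfile, method = "libcurl", mode = "wb")
}

# Usage with the URL from the question:
# url <- "http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip"
# temp <- tempfile(fileext = ".zip")
# download_with_ua(url, temp)
```

If you prefer to stay with `method = "wget"` as in the question, `download.file()` also has an `extra` argument for passing additional command-line flags, so something like `extra = '--user-agent="Mozilla/5.0"'` may work as well.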
0

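You need permission to access that site. Here is the response body (stored in doc) that comes back from a plain GET with the httr package:

library(httr)

url <- 'http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip'
doc <- content(GET(url))  # the server answers with an HTML error page, not the zip file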


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><title>Access Denied</title></head>
<body>
<h1>Access Denied</h1>

You don't have permission to access "http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip" on this server.<p>
Reference #18.df24317.1411924047.3b4f02a1
</p>
</body>
</html>
lawyeR
  • How can that be? I can download the file in a browser, and also with DownThemAll! – kay dee Sep 28 '14 at 17:19
  • Do you have to log in to the website on your browser to download the data? Anyway, if you can download the file, why not just save it and load it into R from your local copy? – rmccloskey Sep 28 '14 at 17:40
  • The site needs no login. I need to do this inside a function, and the link can be constructed programmatically when downloading from R. – kay dee Sep 29 '14 at 02:21
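
As an illustration of constructing the link from R, here is a minimal sketch. `bhavcopy_url` is a hypothetical helper, and the URL pattern is inferred solely from the single example URL in the question, so it may not hold for other dates or after site changes.

```r
# Hypothetical helper: build the bhavcopy URL for a given trading date.
# The pattern is inferred from one example URL only.
bhavcopy_url <- function(date) {
  date <- as.Date(date)

  # Use month.abb rather than format(date, "%b") so the month
  # abbreviation is always English, regardless of locale.
  mon  <- toupper(month.abb[as.POSIXlt(date)$mon + 1])
  year <- format(date, "%Y")
  day  <- format(date, "%d")

  sprintf(
    "http://www.nseindia.com/content/historical/EQUITIES/%s/%s/cm%s%s%sbhav.csv.zip",
    year, mon, day, mon, year
  )
}

# Usage:
# bhavcopy_url("2014-09-24")
```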