Access a URL and read Data with R

Question

Is there a way to specify a website URL and read its data, such as a CSV file, into R for analysis?

Dirk Eddelbuettel · Answer 1 · 2020-09-26T15:34:27.920

In the simplest case, just do

X <- read.csv(url("http://some.where.net/data/foo.csv"))

plus whichever options read.csv() may need.
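
As a rough sketch of what "whichever options" can look like: the example below reads a semicolon-separated file with decimal commas through url(). The file contents and column names are made up for illustration, and a local file:// URL stands in for the remote http:// one so that the code runs without network access.

```r
# Write a small semicolon-separated file to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;name;price",
             "1;apple;1,50",
             "2;pear;NA",
             "3;plum;0,75"), tmp)

# Build a file:// URL for it (assumption: stands in for a real http:// URL).
# Windows paths do not start with "/", so they need one more slash.
path <- normalizePath(tmp, winslash = "/")
csv_url <- paste0(if (startsWith(path, "/")) "file://" else "file:///", path)

X <- read.csv(url(csv_url),
              sep = ";",                 # field separator
              dec = ",",                 # decimal mark used in the file
              stringsAsFactors = FALSE)  # keep text columns as character
str(X)
```

The same arguments should apply unchanged when csv_url points at a real web server.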

Edit in Sep 2020, 9 years later:

For a few years now R also supports directly passing the URL to read.csv:

X <- read.csv("http://some.where.net/data/foo.csv")
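
A small check that the two spellings give the same result, assuming a reasonably recent R. The data frame is invented, and a file:// URL to a temporary file is used in place of a web address so the sketch runs offline.

```r
# Create a small CSV file to read back.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), tmp, row.names = FALSE)

# Build a file:// URL (assumption: stands in for a real http:// URL).
path <- normalizePath(tmp, winslash = "/")
csv_url <- paste0(if (startsWith(path, "/")) "file://" else "file:///", path)

via_connection <- read.csv(url(csv_url))  # explicit url() connection
via_string     <- read.csv(csv_url)       # URL passed directly as a string

identical(via_connection, via_string)
```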

End of 2020 edit. Original post continues.

Long answer: Yes, this can be done, and many packages have used that feature for years. For example, the tseries package has used exactly this feature to download stock prices from Yahoo! for almost a decade:

R> library(tseries)
Loading required package: quadprog
Loading required package: zoo

    ‘tseries’ version: 0.10-24

    ‘tseries’ is a package for time series analysis and computational finance.

    See ‘library(help="tseries")’ for details.

R> get.hist.quote("IBM")
trying URL 'http://chart.yahoo.com/table.csv?    ## manual linebreak here
  s=IBM&a=0&b=02&c=1991&d=5&e=08&f=2011&g=d&q=q&y=0&z=IBM&x=.csv'
Content type 'text/csv' length unknown
opened URL
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
........
downloaded 258 Kb

             Open   High    Low  Close
1991-01-02 112.87 113.75 112.12 112.12
1991-01-03 112.37 113.87 112.25 112.50
1991-01-04 112.75 113.00 111.87 112.12
1991-01-07 111.37 111.87 110.00 110.25
1991-01-08 110.37 110.37 108.75 109.00
1991-01-09 109.75 110.75 106.75 106.87
[...]

This is all exceedingly well documented in the manual pages help(connection) and help(url). Also see the manual on 'Data Import/Export' that came with R.
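
To illustrate the point about connections, here is a minimal sketch in which the same read.csv() call reads from three different kinds of connection. The data is invented, and only local connections are used so it runs without network access; a url() connection would be used in the same way.

```r
csv_text <- c("city,temp", "Oslo,4.5", "Cairo,28.1")

# 1. An in-memory text connection.
tc <- textConnection(csv_text)
from_text <- read.csv(tc)
close(tc)

# 2. A plain file connection (read.csv opens and closes it).
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)
from_file <- read.csv(file(tmp))

# 3. A gzip-compressed file connection.
tmp_gz <- tempfile(fileext = ".csv.gz")
gz <- gzfile(tmp_gz, "w")
writeLines(csv_text, gz)
close(gz)
from_gz <- read.csv(gzfile(tmp_gz))

# All three sources yield the same data frame.
identical(from_text, from_file) && identical(from_file, from_gz)
```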

In the simplest case, just do `X <- read.csv(url("http://some.where.net/data/foo.csv"))` plus whichever options read.csv() may need. — psychonomics, Aug 15 '15 at 08:46
I tried this but it did not work. Can anyone help? ```cv_today <- read.csv(url("https://github.com/eparker12/nCoV_tracker/blob/master/input_data/coronavirus_today.csv"))``` — Ben10, May 14 '20 at 18:52
Probably wrong URL. Navigate to the file, select `raw` in the web ui, use that URL. — Dirk Eddelbuettel, May 14 '20 at 19:07
Can someone help with this data on Kaggle: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data I tried with this code: data<-read.csv("https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=test.csv", skip = 1) — kosk, Jan 04 '22 at 08:29
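
The `raw` advice in the comments above can be sketched as a small URL rewrite. The helper name github_raw_url is hypothetical; it relies on GitHub serving file contents from raw.githubusercontent.com, and the actual download is left commented out because it needs network access and the file may have moved.

```r
# Hypothetical helper: turn a GitHub "blob" page URL (an HTML page)
# into the raw.githubusercontent.com URL that serves the file itself.
github_raw_url <- function(blob_url) {
  sub("^https://github\\.com/([^/]+)/([^/]+)/blob/",
      "https://raw.githubusercontent.com/\\1/\\2/",
      blob_url)
}

page_url <- "https://github.com/eparker12/nCoV_tracker/blob/master/input_data/coronavirus_today.csv"
raw_url <- github_raw_url(page_url)
print(raw_url)

# cv_today <- read.csv(raw_url)  # requires network access
```

Pages that require a login, such as the Kaggle link above, generally cannot be read this way, since the URL returns an HTML page rather than the CSV file.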

mpalanco · Answer 2 · 2017-12-29T20:03:56.677

base

read.csv works fine without the url function. I am probably missing something, given that Dirk Eddelbuettel included it in his answer:

ad <- read.csv("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv")
head(ad)

  X    TV radio newspaper sales
1 1 230.1  37.8      69.2  22.1
2 2  44.5  39.3      45.1  10.4
3 3  17.2  45.9      69.3   9.3
4 4 151.5  41.3      58.5  18.5
5 5 180.8  10.8      58.4  12.9
6 6   8.7  48.9      75.0   7.2

Other options, using two popular packages:

data.table

library(data.table)
ad <- fread("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv")
head(ad)

   V1    TV radio newspaper sales
1:  1 230.1  37.8      69.2  22.1
2:  2  44.5  39.3      45.1  10.4
3:  3  17.2  45.9      69.3   9.3
4:  4 151.5  41.3      58.5  18.5
5:  5 180.8  10.8      58.4  12.9
6:  6   8.7  48.9      75.0   7.2

readr

library(readr)
ad <- read_csv("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv")
head(ad)

# A tibble: 6 x 5
     X1    TV radio newspaper sales
  <int> <dbl> <dbl>     <dbl> <dbl>
1     1 230.1  37.8      69.2  22.1
2     2  44.5  39.3      45.1  10.4
3     3  17.2  45.9      69.3   9.3
4     4 151.5  41.3      58.5  18.5
5     5 180.8  10.8      58.4  12.9
6     6   8.7  48.9      75.0   7.2

Your answer came _many_ years after mine and, indeed, the code was changed to support the more direct method. But it was not available when I wrote my answer. — Dirk Eddelbuettel, Sep 26 '20 at 15:28

score 9 · Answer 3 · answered Jun 10 '11 at 02:04

Often data on webpages is in the form of an HTML table. You can read an HTML table into R using the package XML.

In this package, the function

readHTMLTable(<url>)

will look through a page for HTML tables and return a list of data frames (one for each table found).
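
A minimal sketch of that behaviour, assuming the XML package is installed (the code is skipped otherwise). The HTML snippet is invented and parsed from a string instead of a URL so that no network access is needed.

```r
# A tiny invented page containing one table.
html <- "<html><body><table>
<tr><th>city</th><th>temp</th></tr>
<tr><td>Oslo</td><td>4.5</td></tr>
<tr><td>Cairo</td><td>28.1</td></tr>
</table></body></html>"

tables <- NULL
if (requireNamespace("XML", quietly = TRUE)) {
  # Parse the string as HTML; for a real page, pass the URL or file instead.
  doc <- XML::htmlParse(html, asText = TRUE)
  # Returns a list with one data frame per table found.
  tables <- XML::readHTMLTable(doc, stringsAsFactors = FALSE)
  print(tables[[1]])
}
```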

score 8 · Answer 4 · answered Aug 24 '14 at 13:36

Besides read.csv(url("...")), you can also use read.table("http://...").

Example:

> sample <- read.table("http://www.ats.ucla.edu/stat/examples/ara/angell.txt")
> sample
                V1   V2   V3   V4 V5
1        Rochester 19.0 20.6 15.0  E
2         Syracuse 17.0 15.6 20.2  E
...
43         Atlanta  4.2 70.6 32.6  S
>
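
A sketch of the same call with named columns. The three rows are copied from the output above; the column names are made up, and a file:// URL to a temporary copy replaces the web address so the code runs offline.

```r
# Whitespace-separated data without a header line, as in the example above.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Rochester 19.0 20.6 15.0 E",
             "Syracuse 17.0 15.6 20.2 E",
             "Atlanta 4.2 70.6 32.6 S"), tmp)

# Build a file:// URL (assumption: stands in for a real http:// URL).
path <- normalizePath(tmp, winslash = "/")
txt_url <- paste0(if (startsWith(path, "/")) "file://" else "file:///", path)

# read.table defaults to header = FALSE and whitespace as the separator;
# col.names (hypothetical names here) replaces the default V1..V5.
sample_df <- read.table(txt_url,
                        col.names = c("city", "v2", "v3", "v4", "region"),
                        stringsAsFactors = FALSE)
print(sample_df)
```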

larkee

This is a great answer, used extensively in [r-graph-gallery](https://www.r-graph-gallery.com/connected_scatterplot_ggplot2.html) to read in `csv` data from github – RK1 Apr 04 '20 at 18:14

score 1 · Answer 5 · answered Jun 09 '11 at 21:18

scan can read from a web page automatically; you don't necessarily have to mess with connections.
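
A short sketch of scan() given a URL string directly, without creating a connection by hand. The file layout and field names are invented, and a file:// URL to a temporary file is used so the example runs offline.

```r
# A small file in a non-CSV layout with a comment line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("# station readings",
             "A 1.5 2.5",
             "B 3.0 4.0"), tmp)

# Build a file:// URL (assumption: stands in for a real http:// URL).
path <- normalizePath(tmp, winslash = "/")
file_url <- paste0(if (startsWith(path, "/")) "file://" else "file:///", path)

# `what` describes the fields of each record: one string, two numbers.
readings <- scan(file_url,
                 what = list(station = "", lo = 0, hi = 0),
                 comment.char = "#",
                 quiet = TRUE)
str(readings)
```

The result is a list with one vector per field; as.data.frame(readings) converts it to a data frame if needed.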

Aaron left Stack Overflow

Methinks you need the connection to access the (remote, after all) web page. Connections are a **wonderful** abstraction that allows you to use a file, a URL, a pipe to stdout from a command, etc. in a consistent way. – Dirk Eddelbuettel Jun 09 '11 at 21:44
Not that I doubt the usefulness of connections, but the help file for scan says that "file can also be a complete URL." I've done it that way without a formal connection and it does work. – Aaron left Stack Overflow Jun 10 '11 at 13:04
But you usually do not want `scan` but rather `read.table()` or `read.csv()` which give you higher level access. – Dirk Eddelbuettel Jun 10 '11 at 13:06
True. I realize now I read the title and not the question, where it says a CSV file. If the format of the file is nonstandard, that's when you (might) want `scan`. (That's the situation I used it for.) – Aaron left Stack Overflow Jun 10 '11 at 13:09
