
I am trying to set up an automatic pull in R, using the GET function from the httr package, for a CSV file hosted on GitHub.

Here is the table I am trying to download.

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv

I can make the connection to the file using the following GET request:

library(httr)

x <- httr::GET("https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")

However, I am unsure how to then convert the response into a data frame that matches the table shown on GitHub.

Any assistance would be much appreciated.

Cerbrus
SteveM

2 Answers


I am new to R, but here is my solution.

You need to use the raw version of the CSV file from GitHub (raw.githubusercontent.com). The blob URL in the question returns the HTML page that displays the file, not the CSV itself.
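The mapping from the page URL to the raw URL can be sketched as a small string rewrite. This is only an illustration: `blob_to_raw` is a hypothetical helper name, and it assumes the usual `https://github.com/<owner>/<repo>/blob/<branch>/<path>` layout.

```r
# Hypothetical helper (not part of httr): turn a GitHub "blob" page URL
# into the matching raw.githubusercontent.com URL.
blob_to_raw <- function(url) {
  # Swap the host ...
  url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", url)
  # ... and drop the first "/blob" path segment.
  sub("/blob/", "/", url, fixed = TRUE)
}

blob_url <- "https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
raw_url <- blob_to_raw(blob_url)
```
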

library(httr)

x <- httr::GET("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")

# Save to file
bin <- content(x, "raw")
writeBin(bin, "data.csv")

# Read as csv
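# The file uses "." as the decimal mark (the default), so dec is not set
dat = read.csv("data.csv", header = TRUE)

# read.csv prefixes the date column names with "X"; strip only that leading X
colnames(dat) = gsub("^X", "", colnames(dat))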

# Group by country name (to sum regions)
# Skip the four first columns containing metadata 
countries = aggregate(dat[, 5:ncol(dat)], by=list(Country.Region=dat$Country.Region), FUN=sum)

# Here is the table of the most recent total confirmed cases
countries_total = countries[, c(1, ncol(countries))]
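If the intermediate data.csv file is not needed, the response body can probably be parsed in memory instead. The following is a minimal sketch, not the answer's method: `parse_csv_text` and `fetch_csv` are hypothetical helper names, and the sketch assumes the server returns UTF-8 text.

```r
# Hypothetical helper: parse CSV text that is already in memory.
parse_csv_text <- function(txt) {
  # check.names = FALSE keeps the headers exactly as they appear in the file,
  # so no "X" prefix is added to column names that start with a digit.
  read.csv(text = txt, check.names = FALSE, stringsAsFactors = FALSE)
}

# Hypothetical helper: download a CSV with httr and parse it without a temp file.
fetch_csv <- function(url) {
  res <- httr::GET(url)
  httr::stop_for_status(res)  # raise an R error on 4xx/5xx responses
  txt <- httr::content(res, as = "text", encoding = "UTF-8")
  parse_csv_text(txt)
}

# Usage (needs network access and the httr package):
# dat <- fetch_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
```

With check.names = FALSE the header names are not rewritten, so a column that read.csv would otherwise call `Country.Region` keeps whatever spelling the file uses and may need backticks when referenced.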

[Image: the output graph]

[Image: how I got this to work]

TheMultiplexer

This is as simple as:

res <- httr::GET("https://.../file.csv")
data <- httr::content(res, "parsed")

Parsing a CSV response this way requires the readr package to be installed.

See https://httr.r-lib.org/reference/content.html
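One caveat worth sketching: automatic parsing depends on the Content-Type the server reports, and a host that serves CSV files as plain text (raw.githubusercontent.com appears to do so) may not trigger the CSV parser. Assuming that is the case, the `type` argument of `content()` can state the media type explicitly. `fetch_parsed_csv` is a hypothetical wrapper name used only for illustration.

```r
# Hypothetical wrapper: fetch a URL and ask httr to parse the body as CSV,
# regardless of the Content-Type header the server sends.
fetch_parsed_csv <- function(url) {
  res <- httr::GET(url)
  httr::stop_for_status(res)  # raise an R error on 4xx/5xx responses
  # type = "text/csv" overrides the reported media type; httr then hands the
  # body to readr, so the readr package must be installed.
  httr::content(res, as = "parsed", type = "text/csv", encoding = "UTF-8")
}

# Usage (needs network access, httr and readr):
# data <- fetch_parsed_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
```
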

charlax