I would like to import the CoAID dataset from GitHub into RStudio. The dataset is a zip file containing 4 folders, each of which contains multiple CSV files. The link to the CoAID dataset on GitHub is: https://github.com/cuilimeng/CoAID.git

I tried reading each file from one of the folders individually, e.g.:

ClaimFakeCOVID_19 <- read_csv("C:/Clive/Documents/R/CoAID-master/05-01-2020/ClaimFakeCOVID-19.csv")
ClaimFakeCOVID_19_tweets <- read_csv("C:/Clive/Documents/R/CoAID-master/05-01-2020/ClaimFakeCOVID-19_tweets.csv")

These are just two of the 12 files in the first folder, and there are four folders. I could read the files one at a time, and this approach appears to work, but it is cumbersome. I feel there must be some simple R code I could use to download the zip file and read it into RStudio for further analysis. Can anyone tell me how I might do this? A very simple step-by-step process would help me.

Thanks.

1 Answer

Welcome to SO. There are several existing questions that almost do what you are looking for. I combined two of them: scraping a list of CSV URLs from a GitHub folder, and loading multiple CSVs from a vector into separate data frames. You can modify this to suit your needs.

# Step 1: Scrape list of csv from github folder
# https://stackoverflow.com/questions/64401417/is-there-an-r-function-to-read-multiple-csvs-at-once-from-a-github-repo

library(dplyr)
library(rvest)

url <- "https://github.com/cuilimeng/CoAID/tree/master/05-01-2020"

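# Note: the selectors below depend on GitHub's page markup
csv_list <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@role="rowheader"]') %>%  # rows of the folder listing
  html_nodes('span a') %>%
  html_attr('href') %>%                             # relative link to each file
  #head %>% # <- uncomment this line to test on the first few files only
  sub('blob/', '', .) %>%                           # drop "blob/" to get the raw-file path
  paste0('https://raw.githubusercontent.com', .) #%>%
  #purrr::map_df(read.csv) ->  combined_data  # optional: combine into one data frame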

# Step 2: Read each CSV in the list into its own data frame
# https://stackoverflow.com/questions/11433432/how-to-import-multiple-csv-files-at-once

# Each data frame is stored under its full URL as the object name
for (i in seq_along(csv_list)) assign(csv_list[i], read.csv(csv_list[i]))

Thus, e.g., https://github.com/cuilimeng/CoAID/blob/master/05-01-2020/ClaimRealCOVID-19.csv turns into a data frame in your R session.
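
Since the question asks about the zip file specifically, here is a minimal sketch of that route in base R, as an alternative to scraping. It assumes the `master` branch can be downloaded as a zip from GitHub's archive URL and that the archive unpacks into a CoAID-master folder (as in the paths in the question). `read_csv_tree` and `read_github_zip` are hypothetical helper names, not functions from any package.

```r
# Hypothetical helper: read every CSV below `dir` into a named list of data frames.
read_csv_tree <- function(dir) {
  # Paths relative to `dir`, e.g. "CoAID-master/05-01-2020/ClaimFakeCOVID-19.csv"
  rel <- list.files(dir, pattern = "\\.csv$", recursive = TRUE)
  dfs <- lapply(file.path(dir, rel), read.csv, stringsAsFactors = FALSE)
  # Name each element by its relative path without the ".csv" extension
  names(dfs) <- tools::file_path_sans_ext(rel)
  dfs
}

# Hypothetical helper: download a zip archive, unpack it, and read its CSVs.
read_github_zip <- function(zip_url) {
  zip_file <- tempfile(fileext = ".zip")
  out_dir  <- tempfile("unzipped_")
  download.file(zip_url, zip_file, mode = "wb")  # "wb" keeps the zip intact on Windows
  unzip(zip_file, exdir = out_dir)
  read_csv_tree(out_dir)
}

# Assumed archive URL for the master branch of the CoAID repository
zip_url <- "https://github.com/cuilimeng/CoAID/archive/refs/heads/master.zip"

# Needs network access, so only run when working interactively (e.g. in RStudio)
if (interactive()) {
  coaid <- read_github_zip(zip_url)
  print(names(coaid))  # one entry per CSV file
  print(head(coaid[["CoAID-master/05-01-2020/ClaimFakeCOVID-19"]]))
}
```

Keeping the data frames in a named list rather than as separate objects makes it easy to loop over them, e.g. with lapply(coaid, nrow).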

Marco