
I used the package 'GDELTtools' to download data from GDELT. The data was downloaded; however, no variable was stored in the global environment. I want to store the data in a dataframe variable so I can analyze it.

The folder contains over 30 zipped files. Every zipped file contains one csv. I need to store all these csvs in one variable in the Global Environment of R. I hope this can be done.

Thank you in advance!

Stan

2 Answers


Haven't written R for a while, so I will try my best.

Read the comments carefully, because they explain the procedure.

I will attach links with more information on: unzip, read.csv, merging data frames, creating an empty data frame, and concatenating strings.

According to the docs of GDELTtools, you can easily specify the download folder by providing local.folder="~/gdeltdata" as a parameter to the GetGDELT() function.

After that, you can use the list.files("path/to/files/directory") function to obtain a vector of file names, which the explanation code below assumes. Check the docs for more examples and explanation.
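A minimal, self-contained sketch of the list.files() step (it builds a demo folder in a temp directory; with the real data you would point it at your download folder instead):

```r
# Demo in a temporary directory so the snippet is runnable anywhere;
# with real data, replace demoDir with your GDELT download folder.
demoDir <- file.path(tempdir(), "gdeltdemo")
dir.create(demoDir, showWarnings = FALSE)
file.create(file.path(demoDir, c("a.zip", "b.zip", "notes.txt")))

# pattern keeps only .zip files; full.names = TRUE returns complete paths,
# so no manual concatenation of directory and file name is needed
zipPaths <- list.files(demoDir, pattern = "\\.zip$", full.names = TRUE)
zipNames <- list.files(demoDir, pattern = "\\.zip$")  # just the file names
```

Using full.names = TRUE saves you from building the paths with paste0() yourself.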

# set path for the unzip output
outDir <- "C:\\Users\\Name\\Documents\\unzipfolder"
# path where the zip files are stored
relativePath <- "C:\\path\\to\\my\\directory\\"
# build a vector of full paths to the zip files
# (assumes fileNamesZip holds the file names, e.g. from list.files())
zipPaths <- vector()
for (name in fileNamesZip) {
  # paste0() concatenates strings without a separator
  zipfilepath <- paste0(relativePath, name, ".zip")
  # append() returns a new vector, so reassign the result
  zipPaths <- append(zipPaths, zipfilepath)
}
# now we have a vector which contains all the paths to the zip files.
# unzip() extracts one archive at a time, so loop over the paths
# (read the official docs for the other arguments)
for (zipPath in zipPaths) {
  unzip(zipPath, exdir = outDir)
}
# initialize an empty data frame to collect all the data
total <- data.frame()
# now it's time to read the csv files and stack them into the data frame.
# again, I assume you have a vector of file names in the variable fileNamesCSV
for (name in fileNamesCSV) {
  # create the csv file path
  csvfilepath <- paste0(outDir, "\\", name, ".csv")
  # read data from the csv file and store it in a data frame
  dataFrame <- read.csv(file = csvfilepath, header = TRUE, sep = ",")
  # rbind() stacks data frames row-wise; it only works if they are equal
  # in structure (merge() would instead join them on key columns)
  total <- rbind(total, dataFrame)
}
Vitiok
  • Thanks a lot for the effort. I do not have a vector with the file names. I do not know how I can efficiently gather the names of over 30 files. – Stan Feb 02 '18 at 13:15
  • Well, but it seems to be pretty easy to make it work. I'm gonna modify the answer to do that. – Vitiok Feb 03 '18 at 13:59
  • Now I hope it will solve your problem. Good luck with R. It is an amazingly beautiful language, well suited to data science. – Vitiok Feb 03 '18 at 14:08

Something potentially much simpler:

  1. list.files() lists the files in a directory
  2. readr::read_csv() will automatically unzip files as necessary
  3. dplyr::bind_rows() will combine data frames

So try:

lf <- list.files(pattern="\\.zip")
dfs <- lapply(lf,readr::read_csv)
result <- dplyr::bind_rows(dfs)
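If readr and dplyr are not installed, the same three-step pattern works in base R (a runnable sketch that writes two tiny csv files into a temp directory so it is self-contained; with real data you would list the files in your download folder and unzip them first, since base read.csv does not decompress archives):

```r
# create a demo directory with two small csv files
d <- file.path(tempdir(), "csvdemo")
dir.create(d, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, v = c("a", "b")),
          file.path(d, "f1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, v = c("c", "d")),
          file.path(d, "f2.csv"), row.names = FALSE)

lf <- list.files(d, pattern = "\\.csv$", full.names = TRUE)
dfs <- lapply(lf, read.csv)       # one data frame per file
result <- do.call(rbind, dfs)     # row-bind them into one data frame
```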
Ben Bolker