I have some CSV files coming from an s3 bucket that I would like to merge into one dataframe. For the purposes of this exercise, let's use these five files stored in a character vector called script_results
:
[1] "05162021-test-america-distributed-a-script_results-03-16-2021-404806813.csv"
[2] "07262021-test-america-distributed-a-script_results-04-26-2021-534691276.csv"
[3] "09202021-test-arizona-script_results-08-19-2020--08-31-2020-354731384.csv"
[4] "09222021-test-wisconsin-script_results-08-19-2020--08-20-2020-068472211.csv"
[5] "01062021-av-america-script_results-01-5-2021-038680844.csv"
The first step is extracting the data from each CSV, using the following (with the first .csv listed above as the example):
extract_data <- aws.s3::s3read_using(read.csv, bucket = "test-america", object = "03162021-test-america-distributed-a-script_results-03-16-2021-404806813.csv")
It would then continuously bind together each of these unpacked CSV's into one dataframe using the rbind.fill
function from the plyr
package.
So the end result of this would be all five of the data sets coming from the five CSVs listed under script_results
merged together using the rbind.fill
function.
I should also note that I will be receiving a new file on a daily basis, so I want to continuously adjust for those new files reaching my S3 bucket.
Is this possible to do?
EDIT: Using @r2evans comments
L <- lapply(script_results, function(fn) aws.s3::s3read_using(read.csv, bucket="bucket-name", object=fn))
This works. But the issue is when I try to convert it to a dataframe using:
t<- do.call(rbind.data.frame, L)
I'm given the following error:
Error in (function (..., deparse.level = 1, make.row.names = TRUE, stringsAsFactors = FALSE, :
numbers of columns of arguments do not match