I have searched a lot, and none of the similar-sounding posts describe the same problem as mine. I have been struggling for the last few days to get this apparently simple job done.
I have created a simple example to demonstrate the problem:
Let us assume there are 4 csv files, r.trials-jan1.csv, ..., r.trials-jan4.csv.
This is the directory listing:
dir(pattern = "r.trial*")
[1] "r.trials-jan1.csv" "r.trials-jan2.csv" "r.trials-jan3.csv" "r.trials-jan4.csv"
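For anyone who wants to reproduce this, something like the following sketch should generate comparable files. The column names and the 24 rows per file follow the str() output further down; the values are random, and out_dir and sample_files are illustrative names, not part of my actual script. It writes to a temporary directory so nothing in the working directory is touched.

```r
# Hypothetical generator for four example files.
# Assumption: only the column names and row count mirror the real data;
# the numbers themselves are random.
set.seed(1)
out_dir <- tempdir()  # assumption: a temp directory instead of the working directory

for (day in 1:4) {
  hours <- 0:23
  df <- data.frame(
    date     = sprintf("%d-Jan-2017 %d:00:00", day, hours),
    pressure = runif(24, 0, 10),
    temp     = runif(24, 0, 100),
    speed    = runif(24, 0, 300),
    dia      = runif(24, 50, 1000)
  )
  write.csv(df,
            file.path(out_dir, sprintf("r.trials-jan%d.csv", day)),
            row.names = FALSE)
}

# List the generated files; the pattern is an anchored regular expression
sample_files <- list.files(out_dir, pattern = "^r\\.trials-jan[0-9]+\\.csv$")
```

To read them back, the file names would need the directory prepended, for example with file.path(out_dir, sample_files).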
Now I store all 4 file names in a vector, filenames, as follows.
filenames <- list.files(pattern = "r.trial*")
So far so good. Now comes the challenge.
wind_data <- lapply(filenames, read.csv)
combined_data <- rbind(wind_data)
The combined_data object is not a single large data frame. Instead, it is a list holding as many separate data frames as there are csv files. This is not what I want.
I can get the files into one large data frame successfully if I call read.csv on them one by one, like this:
x1 <- read.csv(filenames[1])
x2 <- read.csv(filenames[2])
and then doing an rbind
x12 <- rbind(x1, x2)
See the difference in the data structure between x1, x12 and combined_data here:
str(x1)
'data.frame': 24 obs. of 5 variables:
$ date : Factor w/ 24 levels "1-Jan-2017 0:00:00",..: 1 12 17 18 19 20 21 22 23 24 ...
$ pressure: num 2.541 4.729 7.569 0.784 1.526 ...
$ temp : num 30.8 12.3 53 45.7 18.2 ...
$ speed : num 296.9 104.68 8.18 260.2 40.23 ...
$ dia : num 920 664 806 427 824 ...
The above is one single csv file imported into a df.
str(x12)
'data.frame': 48 obs. of 5 variables:
$ date : Factor w/ 48 levels "1-Jan-2017 0:00:00",..: 1 12 17 18 19 20 21 22 23 24 ...
$ pressure: num 2.541 4.729 7.569 0.784 1.526 ...
$ temp : num 30.8 12.3 53 45.7 18.2 ...
$ speed : num 296.9 104.68 8.18 260.2 40.23 ...
$ dia : num 920 664 806 427 824 ...
The above is two csvs combined one by one.
But with a large number of files, the above approach becomes very tedious.
Hence I used the lapply() function, hoping to get all the csv files into one data frame.
And here is the structure output of combined_data.
str(combined_data)
List of 4
$ :'data.frame': 24 obs. of 5 variables:
..$ date : Factor w/ 24 levels "1-Jan-2017 0:00:00",..: 1 12 17 18 19 20 21 22 23 24 ...
..$ pressure: num [1:24] 2.869 7.881 0.908 4.616 2.719 ...
..$ temp : num [1:24] 14 61.4 52.7 97.5 99 ...
..$ speed : num [1:24] 267.9 36.4 231.7 299.5 203 ...
..$ dia : num [1:24] 880 932 514 661 580 ...
$ :'data.frame': 24 obs. of 5 variables:
..$ date : Factor w/ 24 levels "2-Jan-2017 0:00:00",..: 1 12 17 18 19 20 21 22 23 24 ...
..$ pressure: num [1:24] 4.96 9.57 0.34 5.18 7.34 ...
..$ temp : num [1:24] 26.5 74.5 76.8 52.8 68.2 ...
..$ speed : num [1:24] 238.3 37 16.4 30.8 12.2 ...
..$ dia : num [1:24] 163 548 161 631 437 ...
$ :'data.frame': 24 obs. of 5 variables:
..$ date : Factor w/ 24 levels "3-Jan-2017 0:00:00",..: 1 12 17 18 19 20 21 22 23 24 ...
..$ pressure: num [1:24] 9.79 7.01 5.7 2.46 2.46 ...
..$ temp : num [1:24] 76.8 11.9 30.6 16.2 90.9 ...
..$ speed : num [1:24] 208.6 240 270.1 46.4 224.5 ...
..$ dia : num [1:24] 50.6 374.9 265.2 816 315.5 ...
$ :'data.frame': 24 obs. of 5 variables:
..$ date : Factor w/ 24 levels "4-Jan-2017 0:00:00",..: 1 12 17 18 19 20 21 22 23 24 ...
..$ pressure: num [1:24] 0.761 3.384 8.696 3.355 9.007 ...
..$ temp : num [1:24] 42.9 94 4.7 44.9 74 ...
..$ speed : num [1:24] 199.73 223.39 128.77 56.29 6.64 ...
..$ dia : num [1:24] 832 764 389 293 686 ...
- attr(*, "dim")= int [1:2] 1 4
- attr(*, "dimnames")=List of 2
..$ : chr "wind_data"
..$ : NULL
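The same structure seems to appear without any files involved, so the difference is probably in how rbind is called rather than in read.csv. Here is a minimal sketch with two made-up data frames; a, b, frames, wrapped and manual are illustrative names and the values are arbitrary.

```r
# Two small stand-in data frames (hypothetical values)
a <- data.frame(pressure = c(1.5, 2.5), temp = c(30, 40))
b <- data.frame(pressure = c(3.5, 4.5), temp = c(50, 60))
frames <- list(a, b)

wrapped <- rbind(frames)  # same call shape as rbind(wind_data)
manual  <- rbind(a, b)    # same call shape as rbind(x1, x2)

is.data.frame(wrapped)    # FALSE
dim(wrapped)              # 1 2, one row with one cell per list element
is.data.frame(manual)     # TRUE
nrow(manual)              # 4
```

So passing the whole list as a single argument gives a 1-row list matrix, matching the "dim" attribute of 1 4 above, while passing the data frames as separate arguments stacks their rows.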
So my questions are:
What are other ways to combine many csv files into one large data frame rather than a list of data frames?
Why does lapply with read.csv, followed by rbind, not combine all the csv files into one data frame?