I am trying to strip two data frames of their data.frame structure, extract the vector each one holds, and combine the extracted vectors into a single data.frame with two columns. See the desired output below.
Problem: The output contains multiple data.frame elements instead of a single data.frame containing the vectors from the input data frames.
Each data frame holds one vector.
[EDIT: changes above and below in response to comments.]
So far I have tried various combinations of as() and unlist() to no avail...
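To illustrate why unlist() alone may not help here, a minimal sketch follows. The data frame `one` is a hypothetical stand-in for a single list element, built by hand rather than taken from my real function.

```r
# Hypothetical stand-in for one element of the list (one row, two columns)
one <- data.frame(filename = factor("bigdata/001.csv"), sumrows = 142)

# unlist() flattens the data frame into a named atomic vector; because the
# columns are not all factors, the factor is reduced to its integer code
flat <- unlist(list(one))
str(flat)
```

The file name itself is lost in the flattened vector, so the result is no longer something that can be bound back into a filename/sumrows table directly.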
I am trying to solve this problem using built-in R functions and vectorization, not plyr or loops as in: Merge several data.frames into one data.frame with a loop, Merge many data frames from csv files, Recombining a list of Data.frames into a single data frame.
Reproducible code: I was unable to replicate the error, but here is how I would like my code to work:
df1 <- data.frame(df1 = c(1, 2, 3))
df2 <- data.frame(df2 = c(2, 4, 6))
output <- cbind(df1, df2)
print(output) # Returns a single data.frame
str(output)   # whose columns are plain vectors
# In my case, however, I get a list of data.frames instead
This returns:
  df1 df2
1   1   2
2   2   4
3   3   6
Reality:
readmultiple <- function(directory = "bigdata") {
....
....
....
output <- cbind.data.frame(filename, readmultiplesum)
# This is probably where things go wrong
return(output)
}
output <- lapply(filenames, complete.cases.sum)
assign("Global.output", output, envir = .GlobalEnv)
# There is probably a better way to do this too
if (firstoutput == 1) {
Global.output <- merge(as(unlist(Global.output[1]), "vector"),
as(unlist(output[1]), "vector"))
# as, unlist... Not sure what's needed here
} else {
firstoutput <- 1
}
str(output)
return(Global.output)
}
The output looks like
[[1]]
filename result
1 142
[[2]]
filename result
1 521
[[3]]
filename result
1 324
But I wish for it to be
filename result
[1,] filename[i] 142
[2,] filename[i] 521
[3,] filename[i] 324
...where filename[i] is the i-th element of filenames.
str(output) returns
List of 2400
$ :'data.frame': 1 obs. of 2 variables:
..$ filename : Factor w/ 1 level "bigdata/001.csv": 1
..$ sumrows: num 142
$ :'data.frame': 1 obs. of 2 variables:
..$ filename : Factor w/ 1 level "bigdata/001.csv": 1
..$ sumrows: num 521
$ :'data.frame': 1 obs. of 2 variables:
..$ filename : Factor w/ 1 level "bigdata/001.csv": 1
..$ sumrows: num 324
$ :'data.frame': 1 obs. of 2 variables:
..$ filename : Factor w/ 1 level "bigdata/001.csv": 1
.....
dput(head(output)) returns
list(structure(list(filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 142), .Names = c("filename", "sumrows"), row.names = c(NA,
-1L), class = "data.frame"), structure(list(filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 521), .Names = c("filename", "sumrows"
), row.names = c(NA, -1L), class = "data.frame"), structure(list(
filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 324), .Names = c("filename", "sumrows"), row.names = c(NA,
-1L), class = "data.frame"), structure(list(filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 1896), .Names = c("filename", "sumrows"
), row.names = c(NA, -1L), class = "data.frame"), structure(list(
filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 1608), .Names = c("filename", "sumrows"
), row.names = c(NA, -1L), class = "data.frame"), structure(list(
filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"),
sumrows = 912), .Names = c("filename", "sumrows"), row.names = c(NA,
-1L), class = "data.frame"))
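One base-R way to get from a list like this to the single data frame described above might be do.call() with rbind(). The sketch below rebuilds the first three elements of the dput() output by hand. It assumes every list element has the same two columns, and it has not been tried inside my real function.

```r
# First three elements from the dput() output above, rebuilt by hand
output <- list(
  data.frame(filename = factor("bigdata/001.csv"), sumrows = 142),
  data.frame(filename = factor("bigdata/001.csv"), sumrows = 521),
  data.frame(filename = factor("bigdata/001.csv"), sumrows = 324)
)

# rbind() is called once with every list element as an argument,
# stacking the one-row data frames into a single data frame
combined <- do.call(rbind, output)
str(combined)
```

This stays within built-in functions and avoids an explicit loop, which is the constraint stated earlier.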