Merger of two data frames results in multiple data frames in R

Question

I am trying to strip two data frames of their data.frame structure, extract the elements in each data.frame and combine the extracted data from the data frames into a single data.frame. This should result in a data.frame consisting of two columns as vectors. See output (marked in bold) below.

Problem: The output contains multiple data.frame elements instead of a single data.frame containing the vectors from the input data frames.

Each data frame holds one vector.

[EDIT^v in response to comments.]

So far I have tried various combinations of as() and unlist() to no avail...

I am trying to solve this problem using built-in R functions and vectorization, not plyr or loops (see: Merge several data.frames into one data.frame with a loop; Merge many data frames from csv files; Recombining a list of Data.frames into a single data frame).

Reproducible Code: I was unable to replicate the error, but here is how I wished my code would work:

df1 <- c(1, 2, 3)
df2 <- c(2, 4, 6)

output <- cbind(df1, df2)
print(output)       # one object with the two vectors as columns
str(output)         # (in my case, however, I get a list of data.frames)

This returns:

       df1 df2
[1,]   1   2
[2,]   2   4
[3,]   3   6
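
For reference, a minimal sketch of what the snippet above produces: cbind() on two plain vectors gives a matrix, data.frame() gives a data.frame, and `[[ ]]` pulls a column back out as a plain vector. The names v1, v2 and one_col are made up for illustration and are not from the original code.

```r
# Illustrative names only (v1, v2, one_col are not from the original post).
v1 <- c(1, 2, 3)
v2 <- c(2, 4, 6)

m <- cbind(v1, v2)         # cbind() on atomic vectors returns a matrix
d <- data.frame(v1, v2)    # data.frame() returns a data.frame with two columns

is.matrix(m)               # TRUE
is.data.frame(d)           # TRUE

# A one-column data.frame can be turned back into a plain vector with [[ ]]
one_col <- data.frame(x = v1)
vec <- one_col[["x"]]
str(vec)
```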

Reality:

readmultiple <- function(directory = "bigdata") {

    ....


    ....
    ....
        output <- cbind.data.frame(filename, readmultiplesum) 
        # This is probably where things go wrong
        return(output)
    }
    output <- lapply(filenames, complete.cases.sum)
    assign("Global.output", output, envir = .GlobalEnv) 
    # There is probably a better way to do this too

    if (firstoutput == 1) {
        Global.output <- merge(as(unlist(Global.output[1]), "vector"), 
                           as(unlist(output[1]), "vector")) 
    # as, unlist... Not sure what's needed here
    } else {
        firstoutput <- 1
    }
    str(output)
    return(Global.output)
}

The output looks like

[[1]]
   filename result 
          1         142 

[[2]]
   filename result
          1        521

[[3]]
   filename result
          1         324

But I wish for it to be

     filename     result
[1,] filename[i]  142
[2,] filename[i]  521
[3,] filename[i]  324

...where filename[i] is the i-th element of the vector filenames (e.g. "001.csv", "002.csv").

str(output) returns

List of 2400
 $ :'data.frame':       1 obs. of  2 variables:
  ..$ filename   : Factor w/ 1 level "bigdata/001.csv": 1
  ..$ sumrows: num 142
 $ :'data.frame':       1 obs. of  2 variables:
  ..$ filename   : Factor w/ 1 level "bigdata/001.csv": 1
  ..$ sumrows: num 521
 $ :'data.frame':       1 obs. of  2 variables:
  ..$ filename   : Factor w/ 1 level "bigdata/001.csv": 1
  ..$ sumrows: num 324
 $ :'data.frame':       1 obs. of  2 variables:
  ..$ filename   : Factor w/ 1 level "bigdata/001.csv": 1

.....

dput(head(output)) returns

    list(structure(list(filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"), 
    sumrows = 142), .Names = c("filename", "sumrows"), row.names = c(NA, 
-1L), class = "data.frame"), structure(list(filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"), 
    sumrows = 521), .Names = c("filename", "sumrows"
), row.names = c(NA, -1L), class = "data.frame"), structure(list(
    filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"), 
    sumrows = 324), .Names = c("filename", "sumrows"), row.names = c(NA, 
-1L), class = "data.frame"), structure(list(filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"), 
    sumrows = 1896), .Names = c("filename", "sumrows"
), row.names = c(NA, -1L), class = "data.frame"), structure(list(
    filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"), 
    sumrows = 1608), .Names = c("filename", "sumrows"
), row.names = c(NA, -1L), class = "data.frame"), structure(list(
    filename = structure(1L, .Label = "bigdata/001.csv", class = "factor"), 
    sumrows = 912), .Names = c("filename", "sumrows"), row.names = c(NA, 
-1L), class = "data.frame"))

To save people's time, can you edit your question and explain at the top what you mean by *When trying to add two data frames*. I assume you are not talking about `df1 + df2`... In fact, why don't you give us a reproducible example with two small data.frames and your expected output? A lot of your code seems irrelevant to the question. — flodel, Jan 19 '13 at 13:52
Please make your code [reproducible](http://stackoverflow.com/q/5963269/1412059). We should be able to just copy your code into an R session to execute it. The answer to your question will probably involve `?do.call`. — Roland, Jan 19 '13 at 15:15
@agstudy. Apologies, I meant filename[i] where i is an index from the vector filenames. — noumenal, Jan 19 '13 at 15:17
@Roland. I'm afraid I was only able to provide reproducible code for the expected output (see edit). — noumenal, Jan 19 '13 at 15:39
I would like to request a deletion of this question. Because of the edits it does no longer resemble a real question anymore. — noumenal, Jul 18 '13 at 08:30

agstudy · Answer 1 · 2013-01-19T20:41:37.450

A general technique for turning a list into a data.frame is to use do.call with rbind:

ll <- list(c(filename=1 ,result=142 ),c(filename=2 ,result=521 ))
> do.call(rbind,ll)
     filename result
[1,]        1    142
[2,]        2    521

When I apply this to your list I get:

do.call(rbind,ll)
         filename sumrows
1 bigdata/001.csv     142
2 bigdata/001.csv     521
3 bigdata/001.csv     324
4 bigdata/001.csv    1896
5 bigdata/001.csv    1608
6 bigdata/001.csv     912
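
As a self-contained sketch of the step above, assuming the list elements are one-row data.frames like those in the question's dput() output (the values and file names here are illustrative, and stringsAsFactors = FALSE is my own choice to keep filename as character):

```r
# A list of one-row data.frames, shaped like the question's `output`.
ll <- list(
  data.frame(filename = "001.csv", sumrows = 142, stringsAsFactors = FALSE),
  data.frame(filename = "002.csv", sumrows = 521, stringsAsFactors = FALSE),
  data.frame(filename = "003.csv", sumrows = 324, stringsAsFactors = FALSE)
)

# do.call(rbind, ll) is equivalent to rbind(ll[[1]], ll[[2]], ll[[3]]):
# it dispatches to the data.frame method of rbind and stacks the rows.
combined <- do.call(rbind, ll)

str(combined)   # a single data.frame: 3 observations of 2 variables
```

Because do.call() passes the list elements as separate arguments, this works for a list of any length without an explicit loop.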

Unfortunately, you don't specify what filename[i] is.

Edit

This solution seems to work for the OP:

library(plyr)
ldply(ll)

More generally, you can use:

ldply(ll,function(x){
           ##you process the row x here
  }
 )
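
If plyr is not available, the same per-element processing can be sketched in base R with lapply() followed by do.call(rbind, ...). This is only an illustration: the helper count_complete_rows and the temporary CSV files are hypothetical, and I am assuming the per-file result is the number of complete rows, as the name complete.cases.sum in the question suggests.

```r
# Create two small CSV files in a temporary directory (illustrative data only).
dir <- file.path(tempdir(), "bigdata_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 3), b = c(1, 2, 3)),
          file.path(dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(a = c(1, 2, 3), b = c(NA, NA, 3)),
          file.path(dir, "002.csv"), row.names = FALSE)

filenames <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: one row per file, holding the bare file name and
# the number of rows without missing values.
count_complete_rows <- function(path) {
  dat <- read.csv(path)
  data.frame(filename = basename(path),
             sumrows  = sum(complete.cases(dat)),
             stringsAsFactors = FALSE)
}

# lapply() returns a list of one-row data.frames; rbind stacks them.
result <- do.call(rbind, lapply(filenames, count_complete_rows))
print(result)
```

Using basename() gives "001.csv" rather than "bigdata/001.csv", which matches what the OP said filename[i] should read.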

edited Jan 19 '13 at 20:41

answered Jan 19 '13 at 15:16

agstudy

The problem is rather that I need to coerce two data frames into vectors. Sorry for not making this clear. – noumenal Jan 19 '13 at 15:49
can you at least add the output of : `str(output)` and `dput(head(output))`? – agstudy Jan 19 '13 at 16:33
Done. Let me know if there is anything else. – noumenal Jan 19 '13 at 17:09
@DanielLabbé I will edit your question to make it simpler. – agstudy Jan 19 '13 at 17:13
Thanks a lot! That's really appreciated. Let me know if I can make it better somehow. I'm just starting out. – noumenal Jan 19 '13 at 17:15
@DanielLabbé can you please specify what filename[i] is, and add anything I am missing? – agstudy Jan 19 '13 at 17:18
filename[i] is a char and should read "001.csv", "002.csv", "003.csv" etc. (I was unable to edit the current OP, because "Your post does not have much context to explain the code sections; please explain your scenario more clearly.") – noumenal Jan 19 '13 at 17:53
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/23021/discussion-between-agstudy-and-daniel-labbe) – agstudy Jan 19 '13 at 17:57
