Combine several data.frames into one (keep all row names)

Question

I have two input data frames, Berry and Orange:

Berry = structure(list(Name = c("ACT", "ACTION", "ACTIVISM", "ACTS", 
"ADDICTION", "ADVANCE"), freq = c(2L, 2L, 1L, 1L, 1L, 1L)), .Names = c("Name", 
"freq"), row.names = c(NA, 6L), class = "data.frame")

Orange = structure(list(Name = c("ACHIEVE", "ACROSS", "ACT", "ACTION", 
"ADVANCE", "ADVANCING"), freq = c(1L, 3L, 1L, 1L, 1L, 1L)), .Names = c("Name", 
"freq"), row.names = c(NA, 6L), class = "data.frame")

Running the following code gives me the desired output:

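output = t(merge(Berry, Orange, by = "Name", all = TRUE))  # full outer join on Name, then transpose
rownames(output) = c("", "Berry", "Orange")
colnames(output) = output[1, ]   # use the Name row as column names
output = output[2:3, ]           # drop the Name row
output = data.frame(output)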

However, I now have to create output from 72 data frames similar to Berry and Orange. Since merge appears to work with only two data frames at a time, I'm not sure what the best approach would be. I tried rbind.fill, which kept the values but lost the Names. I found two related posts but couldn't figure out a solution on my own.

Here is one more data frame to make the example reproducible:

Apple = structure(list(Name = c("ABIDING", "ABLE", "ABROAD", "ACROSS", 
"ACT", "ADVANTAGE"), freq = c(1L, 1L, 1L, 4L, 2L, 1L)), .Names = c("Name", 
"freq"), row.names = c(NA, 6L), class = "data.frame")

I'm trying to figure out how to obtain output from Apple, Berry, and Orange. I am looking for a solution that works for any number of data frames, preferably without having to list the data frames by hand.
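Because merge() joins only two data frames at a time, one way to generalize the two-frame code is to fold merge() over a list with base R's Reduce(). This is a minimal sketch, assuming the frames have been gathered into a named list (`dfs` and `renamed` are hypothetical names); each freq column is renamed after its source first so that repeated merges do not produce clashing freq.x / freq.y columns.

```r
# Sample data from the question (character Name column, integer freq)
Apple <- data.frame(Name = c("ABIDING", "ABLE", "ABROAD", "ACROSS", "ACT", "ADVANTAGE"),
                    freq = c(1L, 1L, 1L, 4L, 2L, 1L), stringsAsFactors = FALSE)
Berry <- data.frame(Name = c("ACT", "ACTION", "ACTIVISM", "ACTS", "ADDICTION", "ADVANCE"),
                    freq = c(2L, 2L, 1L, 1L, 1L, 1L), stringsAsFactors = FALSE)
Orange <- data.frame(Name = c("ACHIEVE", "ACROSS", "ACT", "ACTION", "ADVANCE", "ADVANCING"),
                     freq = c(1L, 3L, 1L, 1L, 1L, 1L), stringsAsFactors = FALSE)

# Hypothetical named list holding every frame to combine
dfs <- list(Apple = Apple, Berry = Berry, Orange = Orange)

# Rename each freq column after its source so the merged columns stay distinct
renamed <- Map(function(df, nm) setNames(df, c("Name", nm)), dfs, names(dfs))

# merge() is binary; Reduce() applies it pairwise across the whole list.
# all = TRUE keeps every Name (full outer join) and fills gaps with NA.
wide <- Reduce(function(a, b) merge(a, b, by = "Name", all = TRUE), renamed)

# Transpose so each source becomes a row and each Name a column
output <- as.data.frame(t(wide[, -1]))
colnames(output) <- as.character(wide$Name)
output
```

The same call works unchanged for 72 frames, as long as they are all in `dfs`.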

You can assume that the names of the data frames to be processed are available in a character vector df_names:

 df_names = c("Apple","Berry","Orange")

Alternatively, you can assume that every data frame in the global environment needs to be processed to create output.
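Both assumptions reduce to building a named list of data frames. A minimal sketch, using small hypothetical stand-ins for the real frames: mget() looks objects up by name in the current environment (the global environment at top level), and Filter(is.data.frame, ...) drops anything that is not a data frame.

```r
# Hypothetical minimal inputs standing in for the real data frames
Apple <- data.frame(Name = c("ACROSS", "ACT"), freq = c(4L, 2L), stringsAsFactors = FALSE)
Berry <- data.frame(Name = c("ACT", "ACTION"), freq = c(2L, 2L), stringsAsFactors = FALSE)
Orange <- data.frame(Name = c("ACROSS", "ADVANCE"), freq = c(3L, 1L), stringsAsFactors = FALSE)
unrelated <- 1:10  # not a data frame, so Filter() below should skip it

df_names <- c("Apple", "Berry", "Orange")

# Case 1: names are given; mget() returns a list named by df_names
dfs <- mget(df_names)

# Case 2: every data frame in the current environment
all_dfs <- Filter(is.data.frame, mget(ls()))

names(dfs)
names(all_dfs)
```

Case 2 picks up any other data frame that happens to exist, so Case 1 is the safer choice when the names are known.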

Can you create a list of the data frames and use lapply? – Dinesh.hmn, Jan 22 '17 at 20:55

score 4 · Accepted Answer · answered Jan 22 '17 at 21:08

If you have all your data frames in an environment, you can collect them into a named list and then use the reshape2 package to reshape it. If desired, you can then set the first column as the row names.

library(reshape2)
# mget(ls()) gathers objects into a named list; melt() records each list name in column L1
dcast(melt(Filter(is.data.frame, mget(ls()))), L1 ~ Name)
#       L1 ABIDING ABLE ABROAD ACHIEVE ACROSS ACT ACTION ACTIVISM ACTS ADDICTION ADVANCE ADVANCING ADVANTAGE
# 1  Apple       1    1      1      NA      4   2     NA       NA   NA        NA      NA        NA         1
# 2  Berry      NA   NA     NA      NA     NA   2      2        1    1         1       1        NA        NA
# 3 Orange      NA   NA     NA       1      3   1      1       NA   NA        NA       1         1        NA

Note: This assumes all your data is in the global environment and that no other data frames are present except the ones to be used here.
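To avoid that caveat, the list can be built from df_names instead of ls(). The sketch below does this and, as a swapped-in technique, replaces reshape2's melt()/dcast() with base R (rbind plus tapply()) so it runs without extra packages; `dfs`, `long`, and `wide` are hypothetical names. Pairs that never occur get tapply()'s default of NA, matching the NA cells in the dcast() output.

```r
# Sample data from the question
Apple <- data.frame(Name = c("ABIDING", "ABLE", "ABROAD", "ACROSS", "ACT", "ADVANTAGE"),
                    freq = c(1L, 1L, 1L, 4L, 2L, 1L), stringsAsFactors = FALSE)
Berry <- data.frame(Name = c("ACT", "ACTION", "ACTIVISM", "ACTS", "ADDICTION", "ADVANCE"),
                    freq = c(2L, 2L, 1L, 1L, 1L, 1L), stringsAsFactors = FALSE)
Orange <- data.frame(Name = c("ACHIEVE", "ACROSS", "ACT", "ACTION", "ADVANCE", "ADVANCING"),
                     freq = c(1L, 3L, 1L, 1L, 1L, 1L), stringsAsFactors = FALSE)
df_names <- c("Apple", "Berry", "Orange")

# Only the listed frames are used, so unrelated data frames cannot leak in
dfs <- mget(df_names)

# Long format: one row per (source, Name), with the source in column L1,
# similar to the L1 column melt() adds for a list
long <- do.call(rbind, Map(function(df, src) data.frame(L1 = src, df, stringsAsFactors = FALSE),
                           dfs, names(dfs)))

# Long to wide: rows are sources, columns are Names, missing pairs are NA
wide <- tapply(long$freq, list(long$L1, long$Name), sum)
wide
```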

akrun · Answer 2 · answered Jan 23 '17 at 02:51

We can use the tidyverse:

library(dplyr)
library(tidyr)
list(Apple = Apple, Orange = Orange, Berry = Berry) %>%
  bind_rows(.id = "objName") %>%   # stack the frames; objName holds each list name
  spread(Name, freq, fill = 0)      # one column per Name; missing counts become 0
#    objName ABIDING ABLE ABROAD ACHIEVE ACROSS ACT ACTION ACTIVISM ACTS ADDICTION ADVANCE ADVANCING ADVANTAGE
#1   Apple       1    1      1       0      4   2      0        0    0         0       0         0         1
#2   Berry       0    0      0       0      0   2      2        1    1         1       1         0         0
#3  Orange       0    0      0       1      3   1      1        0    0         0       1         1         0

As you have 72 data frames, it is better not to create all these objects in the global environment. Instead, read the files into a list and then do the processing. For example, if the files are all CSV files in the working directory:

files <- list.files(pattern = "\\.csv$")   # only names ending in .csv
lst <- lapply(files, read.csv, stringsAsFactors = FALSE)

and then do the processing with bind_rows as above. Since the file names are not known, we cannot say how the 'objName' values should be created.
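If the file names themselves are suitable labels, one option is to name the list after them, so bind_rows(.id = "objName") picks those names up. A minimal sketch, assuming each file's base name (without .csv) is the desired label; the temporary folder and the three small files are hypothetical setup so the example runs on its own.

```r
# Hypothetical setup: write three small CSV files into a temporary folder
csv_dir <- file.path(tempdir(), "freq_csvs")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(Name = c("ACROSS", "ACT"), freq = c(4L, 2L)),
          file.path(csv_dir, "Apple.csv"), row.names = FALSE)
write.csv(data.frame(Name = c("ACT", "ACTION"), freq = c(2L, 2L)),
          file.path(csv_dir, "Berry.csv"), row.names = FALSE)
write.csv(data.frame(Name = c("ACROSS", "ADVANCE"), freq = c(3L, 1L)),
          file.path(csv_dir, "Orange.csv"), row.names = FALSE)

# "\\.csv$" matches only names that end in .csv
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file, then name each element after its file name minus the extension
dfs <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(dfs) <- tools::file_path_sans_ext(basename(files))
str(dfs)
```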
