3

Here's my list of data frames:

[[1]]
ID   Value
A   1
B   1
C   1

[[2]]
ID   Value
A   1
D   1
E   1

[[3]]
ID   Value
B   1
C   1

I'm after a single data frame with unique (non-redundant) IDs in the left hand column, replicates in columns, and NULL values as 0:

ID   [1]Value   [2]Value   [3]Value  
A    1          1          0
B    1          0          1
C    1          0          1
D    0          1          0
E    0          1          0

I've tried:

Reduce(function(x, y) merge(x, y, by=ID), datahere)

This provides a single list but without regards to where the original values come from, and duplicate IDs are repeated in new rows.

rbindlist(datahere, use.names=TRUE, fill=TRUE, idcol="Replicate")

This provides a single list with the [x]Value number as a new column called Replicate, but still it isn't in the structure I want as the ID column has redundancies.

PeptideWitch
  • 2,239
  • 14
  • 30
  • Possible duplicate of https://stackoverflow.com/questions/14096814/merging-a-lot-of-data-frames OR https://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list – Ronak Shah Feb 09 '18 at 05:54

3 Answers3

8

What about something like this using dplyr/purrr:

require(tidyverse);
reduce(lst, full_join, by = "ID");
#   ID Value.x Value.y Value
# 1  A       1       1    NA
# 2  B       1      NA     1
# 3  C       1      NA     1
# 4  D      NA       1    NA
# 5  E      NA       1    NA

Or with the NAs replaced with 0s:

reduce(lst, full_join, by = "ID") %>% replace(., is.na(.), 0);
#  ID Value.x Value.y Value
#1  A       1       1     0
#2  B       1       0     1
#3  C       1       0     1
#4  D       0       1     0
#5  E       0       1     0

Sample data

options(stringsAsFactors = FALSE);
lst <- list(
    data.frame(ID = c("A", "B", "C"), Value = c(1, 1, 1)),
    data.frame(ID = c("A", "D", "E"), Value = c(1, 1, 1)),
    data.frame(ID = c("B", "C"), Value = c(1, 1)))
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
1

You already have a nice answer but the typical way to do this is with tidyr::spread

Your data

A <- data.frame(ID=LETTERS[1:3], Value=1, stringsAsFactors=FALSE)
B <- data.frame(ID=LETTERS[c(1,4,5)], Value=1, stringsAsFactors=FALSE)
C <- data.frame(ID=LETTERS[c(2:3)], Value=1, stringsAsFactors=FALSE)
L <- list(A, B, C)

Solution

dplyr::bind_rows(L, .id="G") %>%
  tidyr::spread(G, Value, fill=0)

  # ID 1 2 3
# 1  A 1 1 0
# 2  B 1 0 1
# 3  C 1 0 1
# 4  D 0 1 0
# 5  E 0 1 0
CPak
  • 13,260
  • 3
  • 30
  • 48
  • Thanks for your answer CPak - this solution works really neatly, and has the added advantage of re-naming the column headers. I wish I could tick both answers right... – PeptideWitch Feb 09 '18 at 05:48
1

With base R, we need to use all = TRUE in the merge

res <- Reduce(function(...) merge(..., all = TRUE, by="ID"), lst)
replace(res, is.na(res), 0)
#    ID Value.x Value.y Value
#1  A       1       1     0
#2  B       1       0     1
#3  C       1       0     1
#4  D       0       1     0
#5  E       0       1     0

data

lst <- list(structure(list(ID = c("A", "B", "C"), Value = c(1, 1, 1)), .Names = c("ID", 
"Value"), row.names = c(NA, -3L), class = "data.frame"), structure(list(
ID = c("A", "D", "E"), Value = c(1, 1, 1)), .Names = c("ID", 
 "Value"), row.names = c(NA, -3L), class = "data.frame"), structure(list(
ID = c("B", "C"), Value = c(1, 1)), .Names = c("ID", "Value"
 ), row.names = c(NA, -2L), class = "data.frame"))
akrun
  • 874,273
  • 37
  • 540
  • 662