R - Merge list of three dataframes into single dataframe with ID in first column, next three columns show values

Question

Here's my list of data frames:

[[1]]
ID   Value
A   1
B   1
C   1

[[2]]
ID   Value
A   1
D   1
E   1

[[3]]
ID   Value
B   1
C   1

I'm after a single data frame with unique (non-redundant) IDs in the left hand column, replicates in columns, and NULL values as 0:

ID   [1]Value   [2]Value   [3]Value  
A    1          1          0
B    1          0          1
C    1          0          1
D    0          1          0
E    0          1          0

I've tried:

Reduce(function(x, y) merge(x, y, by=ID), datahere)

This provides a single list but without regards to where the original values come from, and duplicate IDs are repeated in new rows.

rbindlist(datahere, use.names=TRUE, fill=TRUE, idcol="Replicate")

This provides a single list with the [x]Value number as a new column called Replicate, but still it isn't in the structure I want as the ID column has redundancies.

Possible duplicate of https://stackoverflow.com/questions/14096814/merging-a-lot-of-data-frames OR https://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list — Ronak Shah, Feb 09 '18 at 05:54

Maurits Evers · Accepted Answer · 2018-02-09T05:34:42.663

8

What about something like this using dplyr/purrr:

require(tidyverse);
reduce(lst, full_join, by = "ID");
#   ID Value.x Value.y Value
# 1  A       1       1    NA
# 2  B       1      NA     1
# 3  C       1      NA     1
# 4  D      NA       1    NA
# 5  E      NA       1    NA

Or with the NAs replaced with 0s:

reduce(lst, full_join, by = "ID") %>% replace(., is.na(.), 0);
#  ID Value.x Value.y Value
#1  A       1       1     0
#2  B       1       0     1
#3  C       1       0     1
#4  D       0       1     0
#5  E       0       1     0

Sample data

options(stringsAsFactors = FALSE);
lst <- list(
    data.frame(ID = c("A", "B", "C"), Value = c(1, 1, 1)),
    data.frame(ID = c("A", "D", "E"), Value = c(1, 1, 1)),
    data.frame(ID = c("B", "C"), Value = c(1, 1)))

edited Feb 09 '18 at 05:34

answered Feb 09 '18 at 05:21

Maurits Evers

49,617
4
47
68

Worked perfectly, thank you. Didn't see the purrr package before. Just one thing extra before I close up. Any way to set the default NA value to, say, 0? Didn't see it on the R documentation page – PeptideWitch Feb 09 '18 at 05:28
No worries @PeptideWitch; I've updated my answer to replace the `NA`s with `0`s. – Maurits Evers Feb 09 '18 at 05:30
You can use `replace(., is.na(.), 0)` instead of `mutate_all(...` – CPak Feb 09 '18 at 05:33
Very nice. Thanks again for your help! – PeptideWitch Feb 09 '18 at 05:33
Cheers @CPak; fixed it. – Maurits Evers Feb 09 '18 at 05:35
Yep, can confirm the suggestion by @CPak - all NAs replaced by 0s. – PeptideWitch Feb 09 '18 at 05:35
`unable to find an inherited method for function ‘reduce’ for signature ‘"list"’` Could someone help with this? – Brofessor Dec 22 '21 at 05:17
@Brofessor Make sure you use `purrr`'s `reduce`. If in doubt about namespacing, use `purrr::reduce(lst, full_join, by = "ID")`. – Maurits Evers Dec 22 '21 at 11:42

score 1 · Answer 2 · answered Feb 09 '18 at 05:40

1

You already have a nice answer but the typical way to do this is with tidyr::spread

Your data

A <- data.frame(ID=LETTERS[1:3], Value=1, stringsAsFactors=FALSE)
B <- data.frame(ID=LETTERS[c(1,4,5)], Value=1, stringsAsFactors=FALSE)
C <- data.frame(ID=LETTERS[c(2:3)], Value=1, stringsAsFactors=FALSE)
L <- list(A, B, C)

Solution

dplyr::bind_rows(L, .id="G") %>%
  tidyr::spread(G, Value, fill=0)

  # ID 1 2 3
# 1  A 1 1 0
# 2  B 1 0 1
# 3  C 1 0 1
# 4  D 0 1 0
# 5  E 0 1 0

answered Feb 09 '18 at 05:40

CPak

13,260
3
30
48

Thanks for your answer CPak - this solution works really neatly, and has the added advantage of re-naming the column headers. I wish I could tick both answers right... – PeptideWitch Feb 09 '18 at 05:48

akrun · Answer 3 · 2018-02-09T05:53:51.673

1

With base R, we need to use all = TRUE in the merge

res <- Reduce(function(...) merge(..., all = TRUE, by="ID"), lst)
replace(res, is.na(res), 0)
#    ID Value.x Value.y Value
#1  A       1       1     0
#2  B       1       0     1
#3  C       1       0     1
#4  D       0       1     0
#5  E       0       1     0

data

lst <- list(structure(list(ID = c("A", "B", "C"), Value = c(1, 1, 1)), .Names = c("ID", 
"Value"), row.names = c(NA, -3L), class = "data.frame"), structure(list(
ID = c("A", "D", "E"), Value = c(1, 1, 1)), .Names = c("ID", 
 "Value"), row.names = c(NA, -3L), class = "data.frame"), structure(list(
ID = c("B", "C"), Value = c(1, 1)), .Names = c("ID", "Value"
 ), row.names = c(NA, -2L), class = "data.frame"))

edited Feb 09 '18 at 05:53

answered Feb 09 '18 at 05:41

akrun

874,273
37
540
662

Thanks for answering. However, this throws up the error: Error in merge.data.table: x has some duplicated column name(s): etc etc etc. Please remove or rename the duplicate(s) and try again. – PeptideWitch Feb 09 '18 at 05:52
@PeptideWitch I added the dataset used. It is working as expected. Do you have a `data.table` object? I am using `data.frame` – akrun Feb 09 '18 at 05:54
@PeptideWitch I also checked with data.table object. It is working fine for me – akrun Feb 09 '18 at 05:55
Yep, good spotting. I'm actually using a data.table; didn't realise 'fread' was reading my .csv files to a data.table instead of a data.frame. Cheers – PeptideWitch Feb 09 '18 at 05:58
Hmm, wonder why my data.table doesn't like it. – PeptideWitch Feb 09 '18 at 05:58
@PeptideWitch Please check the version of your data.table. – akrun Feb 09 '18 at 06:00
data.table_1.10.4-3 – PeptideWitch Feb 09 '18 at 06:01
@PeptideWitch I am using `data.table_1.10.5` – akrun Feb 09 '18 at 06:02

R - Merge list of three dataframes into single dataframe with ID in first column, next three columns show values

3 Answers3

Sample data

data