1

I have data.frame in the list, and I intend to merge specific data.frame objects conditionally where merge second, third data.frame objects without duplication, then merge it with first data.frame objects. However, I used rbind function to do this task, but my approach is not elegant. Can anyone help me out the improve the solution ? How can I achieve more compatible solution that can be used in dynamic functional programming ? How can I get desired output ? Any idea ?

reproducible example:

dfList <- list(
  DF.1 = data.frame(red=c(1,2,3), blue=c(NA,1,2), green=c(1,1,2)),
  DF.2 = data.frame(red=c(2,3,NA), blue=c(1,2,3), green=c(1,2,4)),
  DF.3 = data.frame(red=c(2,3,NA,NA), blue=c(1,2,NA,3), green=c(1,2,3,4))
)

dummy way to do it:

rbind(dfList[[1L]], unique(rbind(dfList[[2L]], dfList[[3L]])))

Apparently, my attempt is not elegant to apply in functional programming. How can make this happen elegantly ?

desired output :

red blue green
1    1   NA     1
2    2    1     1
3    3    2     2
11   2    1     1
21   3    2     2
31  NA    3     4
6   NA   NA     3

How can I improve my solution more elegantly and efficiently ? Thanks in advance

Andy.Jian
  • 417
  • 3
  • 15

2 Answers2

3

The best (easiest and fastest way) to do this is data.table::rbindlist.

It would work like this:

library(data.table)
dfList <- list(
  DF.1 = data.table(red=c(1,2,3), blue=c(NA,1,2), green=c(1,1,2)),
  DF.2 = data.table(red=c(2,3,NA), blue=c(1,2,3), green=c(1,2,4)),
  DF.3 = data.table(red=c(2,3,NA,NA), blue=c(1,2,NA,3), green=c(1,2,3,4))
)

# part 1: list element 1
dt_1 <- dfList[[1]]

# part 2: all other list elements (in your case 2 and 3)
dt_2 <- unique(rbindlist(dfList[-1]))

# use rbindlist to bind the rows together
dt_all <- rbindlist(list(dt_1, dt_2))

Comment.

My solution is pretty close to your proposed solution. I think the "ugliness" about this way is that it is an edge case to merge datasets and deattach the first element (and treat it in a different way). The best solution would probably be to step back and think about the underlying idea and solve it using an additional variable in the datasets (i.e., for df1 and then for df2_3), which I would consider the R-way.

Something along this thought would look like this:

myList2 <- list(
  DF.1 = data.table(red=c(1,2,3), blue=c(NA,1,2), green=c(1,1,2), var = "df1"),
  DF.2 = data.table(red=c(2,3,NA), blue=c(1,2,3), green=c(1,2,4), var = "other"),
  DF.3 = data.table(red=c(2,3,NA,NA), blue=c(1,2,NA,3), green=c(1,2,3,4), var = "other")
)

dt <- rbindlist(myList2)
unique(dt)
# red blue green   var
# 1:   1   NA     1   df1
# 2:   2    1     1   df1
# 3:   3    2     2   df1
# 4:   2    1     1 other
# 5:   3    2     2 other
# 6:  NA    3     4 other
# 7:  NA   NA     3 other
David
  • 9,216
  • 4
  • 45
  • 78
  • Thanks for the comment! I edited the post, does that help you more? – David Nov 10 '16 at 12:27
  • still not convinced myself casting myList to data.table, then proceed data.table solution. why can't use original list to do it? Apparently my solution can do, but it is not pretty. – Andy.Jian Nov 10 '16 at 12:30
1

A way of rbinding a list of data.frames with only base R is do.call(list, rbind) (see this question that also presents some alternatives).

If you then desire only unique rows you can follow-up with a unique

 unique(do.call(dfList, rbind))
Community
  • 1
  • 1
vincentmajor
  • 1,076
  • 12
  • 20
  • here is condition that first I need to merge second, third data.frame together without duplication, after that I am gonna merge this result with first data.frame where duplicate row is not removed. – Andy.Jian Nov 10 '16 at 15:54