5

If I have a dataframe with a key column and data columns, like this

df <- cbind(key=c("Jane", "Jane", "Sam", "Sam", "Mary"), var1=c("a", NA, "a", "a", "c"), var2=c(NA, "b", NA, "b", "d"))

key    var1 var2
"Jane" "a"  NA  
"Jane" NA   "b" 
"Sam"  "a"  NA
"Sam"  "a"  "b" 
"Mary" "c"  "d" 
"Mary" "c"  NA

And want a dataframe that merges the rows by name, overwriting NAs whenever possible, like so

key    var1 var2
"Jane" "a"  "b"
"Sam"  "a"  "b"
"Mary" "c"  "d"

How can I do this?

FXQuantTrader
  • 6,821
  • 3
  • 36
  • 67
sus
  • 335
  • 1
  • 3
  • 13
  • 1
    You should provide a better example to illustrate *whenever possible* since in your current example it is always possible. – flodel Jan 30 '14 at 02:17

2 Answers2

6
library(data.table)
dtt <- as.data.table(df)

dtt[, list(var1=unique(var1[!is.na(var1)])
         , var2=unique(var2[!is.na(var2)]))
    , by=key]

    key var1 var2
1: Jane    a    b
2: Mary    c    d
3:  Sam    a    b
Ricardo Saporta
  • 54,400
  • 17
  • 144
  • 178
3

Here's a solution using dplyr. Note that cbind() creates matrices, not data frames, so I've modified the code to do what I think you meant. I also pulled out the selection algorithm into a separate function. I think this is good practice because it allows you to change your algorithm in one place if you discover you need something different.

df <- data.frame(
  key = c("Jane", "Jane", "Sam", "Sam", "Mary"), 
  var1 = c("a", NA, "a", "a", "c"), 
  var2 = c(NA, "b", NA, "b", "d"),
  stringsAsFactors = FALSE
)

library(dplyr)

collapse <- function(x) x[!is.na(x)][1]

df %.% 
  group_by(key) %.%
  summarise(var1 = collapse(var1), var2 = collapse(var2))
# Source: local data frame [3 x 3]
# 
#  key var1 var2
# 1 Mary    c    d
# 2  Sam    a    b
# 3 Jane    a    b
hadley
  • 102,019
  • 32
  • 183
  • 245
  • 1
    I'd tried and failed to use plyr for this. I'll have to have a look at the dplyr docs to understand how this works, but thanks! – sus Jan 30 '14 at 17:56
  • @sus A plyr solution is also simple: `ddply(df, "key", summarise, var1 = collapse(var1), var2 = collapse(var2)` – hadley Jan 30 '14 at 19:02
  • How can I merge rows by ID and year if I have 20 columns that have to be merged? – ZayzayR Apr 21 '22 at 14:07