3

I am probably missing something quite easy with this problem. I can't find the proper answer anywhere - and I really need to move on. So I have oversimplified my data:

eventID <- c(2,4)
Time <- c("09:32","09:43")
df1 <- data.frame(eventID,Time)
eventID <- rep(c(1:4),rep(3,4))
Time <- rep(c("09:30",NA,"09:40",NA),rep(3,4))
df2 <- data.frame(eventID,Time)

What I would like is to combine the Time column. So the NAs in df2 should be filled up with Time values from df1 matching the eventID. My original data is quite large so a for-loop is not what I am looking for really. I was hoping this would work:

> (res1 <- merge(df1,df2, by = "eventID", all = T))
#   eventID Time.x Time.y
#1        1   <NA>  09:30
#2        1   <NA>  09:30
#3        1   <NA>  09:30
#4        2  09:32   <NA>
#5        2  09:32   <NA>
#6        2  09:32   <NA>
#7        3   <NA>  09:40
#8        3   <NA>  09:40
#9        3   <NA>  09:40
#10       4  09:43   <NA>
#11       4  09:43   <NA>
#12       4  09:43   <NA>

Desired output:

> eventID <- rep(c(1:4),rep(3,4))
> Time <- rep(c("09:30","09:32","09:40","09:43"), rep(3,4))
> (res2 <- data.frame(eventID,Time))
#   eventID  Time
#1        1 09:30
#2        1 09:30
#3        1 09:30
#4        2 09:32
#5        2 09:32
#6        2 09:32
#7        3 09:40
#8        3 09:40
#9        3 09:40
#10       4 09:43
#11       4 09:43
#12       4 09:43

Suggestions? If possible in base I would prefer that, or data.table package.

tstev
  • 607
  • 1
  • 10
  • 20

1 Answers1

4

Try

library(data.table)
setkey(setDT(df2), eventID)[df1, Time:= i.Time][]
#   eventID  Time
#1:       1 09:30
#2:       1 09:30
#3:       1 09:30
#4:       2 09:32
#5:       2 09:32
#6:       2 09:32
#7:       3 09:40
#8:       3 09:40
#9:       3 09:40
#10:      4 09:43
#11:      4 09:43
#12:      4 09:43

Or a base R option after the merge would be

 data.frame(eventID= res1[,1], 
     Time=res1[-1][cbind(1:nrow(res1),max.col(!is.na(res1[-1])))])

Update

Suppose there are some 'eventID' in the first dataset are not in the second ('df2'), then we can specify nomatch=0

 df3 <- data.frame(eventID= c(2,4,5), Time=c('09:32', '09:43', '09:45'))

 setkey(setDT(df2), eventID)[df3, Time:= i.Time,nomatch=0][]
  #    eventID  Time
  # 1:       1 09:30
  # 2:       1 09:30
  # 3:       1 09:30
  # 4:       2 09:32
  # 5:       2 09:32
  # 6:       2 09:32
  # 7:       3 09:40
  # 8:       3 09:40
  # 9:       3 09:40
  #10:       4 09:43
  #11:       4 09:43
  #12:       4 09:43
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Wow, that was easy! I knew it would be. I am trying to understand. Because i got as far as `df2[df1]` myself but it missed the `eventID` 1 and 3. How does `Time := i.Time` work then? I thought that was for copying. – tstev Apr 10 '15 at 15:47
  • 1
    @tstev If you do `setkey(setDT(df2), eventID)[df1]`, then the `Time` values for the corresponding `i.Time` are `NA` which we are replacing with `i.Time`. May be this link also give some ideas http://stackoverflow.com/questions/19553005/data-table-join-then-add-columns-to-existing-data-frame-without-re-copy – akrun Apr 10 '15 at 15:54
  • Thanks a lot for this answering this easy question! – tstev Apr 10 '15 at 16:26