0

very similar to this questions I try to populate a new variable by finding the last non missing value by group for an existing variable in a dataframe, ideally using dplyr/zoo. I want to only keep the last value though, and not merely overwrite missings, consider the following minimal example:

df1 <- data.frame(ID   = c(1, 1, 1, 2, 2,2),
                  date = c(1,2,3,1,2,3),
                  var1 = c('a', '', 'b', '','c', ''))

df2 = ## R-commands to get:
df2 <- data.frame(ID   = c(1, 1, 1, 2, 2,2),
                  date = c(1,2,3,1,2,3),
                  var1 = c('b', 'b', 'b', 'c','c', 'c'))
Sotos
  • 51,121
  • 6
  • 32
  • 66
safex
  • 2,398
  • 17
  • 40

2 Answers2

3

Using dplyr,

library(dplyr)

df1 %>% 
 group_by(ID) %>% 
 mutate(var1 = last(var1[var1 != '']))

which gives,

# A tibble: 6 x 3
# Groups:   ID [2]
     ID  date var1 
  <dbl> <dbl> <fct>
1     1     1 b    
2     1     2 b    
3     1     3 b    
4     2     1 c    
5     2     2 c    
6     2     3 c
Sotos
  • 51,121
  • 6
  • 32
  • 66
1

Here is one option with base R using ave

df1$var1 <- with(df1, ave(as.character(var1), ID, FUN = 
      function(x) tail(x[nzchar(x)], 1)))
df1$var1
#[1] "b" "b" "b" "c" "c" "c"
akrun
  • 874,273
  • 37
  • 540
  • 662