Get the latest updated value from last non-na column

Question

I have a data frame like this:

df <- data.frame(name1 = c("a" , "a", "a", "a", "c", "c", "c", "c"),
                 name2 = c(NA,"a","a",NA, NA, "c", "c", NA),
                 name3 = c(NA, "b", "b", NA, NA, "d","d",NA))

Then I made a new column based on some conditions:

library(tidyverse)
df %>% mutate(name4 = ifelse(!is.na(name3), name3, name1))

    name1 name2 name3 name4
1     a  <NA>  <NA>     a
2     a     a     b     b
3     a     a     b     b
4     a  <NA>  <NA>     a
5     c  <NA>  <NA>     c
6     c     c     d     d
7     c     c     d     d
8     c  <NA>  <NA>     c

I would like to replace a and c in name4 with b and d, respectively, without hard-coding the characters (i.e. a, b). (Would making another column also be a good option?)

Any suggestions for this?

Desired output

    name1 name2 name3 name4
1     a  <NA>  <NA>     b
2     a     a     b     b
3     a     a     b     b
4     a  <NA>  <NA>     b
5     c  <NA>  <NA>     d
6     c     c     d     d
7     c     c     d     d
8     c  <NA>  <NA>     d

I do not follow; could you clarify why a becomes b in the name4 column? — zx8754, Sep 28 '21 at 08:11
Let's say a in the name2 column is changed to b in the name3 column. Then name4 is like a final column containing the old name and the new name. — Anh, Sep 28 '21 at 08:15
Or: the character in the name3 column is my first priority, followed by the name1 column, but now I would like to standardize the old name `a` to the new name `b`. Is that clear? — Anh, Sep 28 '21 at 08:17
I think you are looking for coalesce, see https://stackoverflow.com/q/19253820/680068 — zx8754, Sep 28 '21 at 08:21
@zx8754 No, since my real data frame is very long, with different characters in the name1 column that I want to keep. coalesce will remove those names from my data frame. — Anh, Sep 28 '21 at 08:55

score 1 · Answer 1 · answered Sep 28 '21 at 09:34

Here are two possible answers:

df <- data.frame(name1 = c("a" , "a", "a", "a", "c", "c", "c", "c"),
                 name2 = c(NA,"a","a",NA, NA, "c", "c", NA),
                 name3 = c(NA, "b", "b", NA, NA, "d","d",NA))
library(tidyverse)
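# Build name4 as in the question, then hard-code the a -> b and c -> d renames.
# Note: sub() replaces the first matching substring, so longer names that
# merely contain 'a' or 'c' would also be altered.
df %>% mutate(name4 = ifelse(!is.na(name3), name3, name1), 
              name4 = sub('a', 'b', sub('c', 'd', name4)))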
#>   name1 name2 name3 name4
#> 1     a  <NA>  <NA>     b
#> 2     a     a     b     b
#> 3     a     a     b     b
#> 4     a  <NA>  <NA>     b
#> 5     c  <NA>  <NA>     d
#> 6     c     c     d     d
#> 7     c     c     d     d
#> 8     c  <NA>  <NA>     d
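# Named-vector lookup: each value of name4 is replaced by the element with
# that name; b and d map to themselves so they are not turned into NA.
df %>% mutate(name4 = ifelse(!is.na(name3), name3, name1), 
              name4 = c('a' = 'b', 'c' = 'd', 'b' = 'b', 'd' = 'd')[name4])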
#>   name1 name2 name3 name4
#> 1     a  <NA>  <NA>     b
#> 2     a     a     b     b
#> 3     a     a     b     b
#> 4     a  <NA>  <NA>     b
#> 5     c  <NA>  <NA>     d
#> 6     c     c     d     d
#> 7     c     c     d     d
#> 8     c  <NA>  <NA>     d

Created on 2021-09-28 by the reprex package (v2.0.1)
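Both versions above still spell out a, b, c and d, which the question wanted to avoid. A minimal base R sketch, assuming every rename appears at least once as a row where name2 (old name) and name3 (new name) are both filled in, derives the lookup from the data instead. The names `pairs`, `lookup` and `mapped` are illustrative only. It also assumes one new name per old name and a single rename step; a chain such as a -> b -> e would need the lookup applied repeatedly.

```r
df <- data.frame(name1 = c("a", "a", "a", "a", "c", "c", "c", "c"),
                 name2 = c(NA, "a", "a", NA, NA, "c", "c", NA),
                 name3 = c(NA, "b", "b", NA, NA, "d", "d", NA),
                 stringsAsFactors = FALSE)

# Each observed (old, new) pair, kept once (assumption: name2 = old, name3 = new)
pairs <- unique(df[!is.na(df$name2) & !is.na(df$name3), c("name2", "name3")])
lookup <- setNames(pairs$name3, pairs$name2)   # named vector: old name -> new name

# Start from the same name4 as in the question ...
name4 <- ifelse(!is.na(df$name3), df$name3, df$name1)

# ... then map old names to new ones. Indexing by a name that is not in the
# lookup gives NA, so those values (already-new or never-renamed) are kept as-is.
mapped <- unname(lookup[name4])
df$name4 <- ifelse(is.na(mapped), name4, mapped)
df
```

Because the lookup is built from the data, other names in name1 that never appear in name2 pass through unchanged.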

score 1 · Accepted Answer · answered Sep 28 '21 at 09:46

Fill the NAs within each name1 group, then use coalesce from right to left to get the latest name for the name4 column:

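df %>% 
  group_by(name1) %>% 
  # within each name1 group, fill missing name2/name3 down, then up
  fill(name2, name3, .direction = "downup") %>% 
  # take the right-most non-NA value: name3, then name2, then name1
  mutate(name4 = coalesce(name3, name2, name1))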

# A tibble: 8 x 4
# Groups:   name1 [2]
#  name1 name2 name3 name4
#  <chr> <chr> <chr> <chr>
#1 a     a     b     b    
#2 a     a     b     b    
#3 a     a     b     b    
#4 a     a     b     b    
#5 c     c     d     d    
#6 c     c     d     d    
#7 c     c     d     d    
#8 c     c     d     d
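The approach above overwrites name2 and name3 with the filled values. If those columns should stay as they are, a variation (a sketch, assuming each name1 group has at most one distinct non-NA name3) copies the first non-NA name3 of each group into a temporary column instead of filling. `latest` and `res` are hypothetical names, and only dplyr is needed.

```r
library(dplyr)

df <- data.frame(name1 = c("a", "a", "a", "a", "c", "c", "c", "c"),
                 name2 = c(NA, "a", "a", NA, NA, "c", "c", NA),
                 name3 = c(NA, "b", "b", NA, NA, "d", "d", NA),
                 stringsAsFactors = FALSE)

res <- df %>%
  group_by(name1) %>%
  # latest: first non-NA name3 in the group (NA if the group has none),
  # recycled to every row of the group
  mutate(latest = na.omit(name3)[1],
         # fall back to name1 when the group never got a new name
         name4 = coalesce(latest, name1)) %>%
  ungroup() %>%
  select(-latest)
res
```

Unlike the fill-based version, this falls back directly to name1 rather than to name2 when a group has no name3 value.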
