Imagine you have a large dataset similar to the data frame M below:

M <- data.frame(code = c("001", "001", "002", "002", "003", "003"), 
                decr = c("x", NA, "y", "y", NA, "z"))

# M

#   code decr
# 1  001    x
# 2  001 <NA>
# 3  002    y
# 4  002    y
# 5  003 <NA>
# 6  003    z

I would like to fill in the NAs within each code, so that the result looks like this:

#   code decr
# 1  001    x
# 2  001    x
# 3  002    y
# 4  002    y
# 5  003    z
# 6  003    z

How can I do this transformation efficiently?

And_R
  • Try `library(tidyverse); M %>% group_by(code) %>% fill(decr) %>% fill(decr, .direction = 'up')` – Sotos Dec 20 '17 at 14:48

1 Answer

library(tidyverse)

M <- data.frame(code = c("001", "001", "002", "002", "003", "003"), 
                decr = c("x", NA, "y", "y", NA, "z"))

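# Within each code, overwrite decr with the group's single non-NA value
# (assumes exactly one distinct non-NA decr per code)
M %>% group_by(code) %>% mutate(decr = unique(na.omit(decr)))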

This is similar to @Sotos's comment and to the answer in the dupe, but it does not assume that the missing values sit at a particular position within each group.
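To illustrate the point about NA position, here is a minimal sketch on a made-up data frame (`M2` is hypothetical, not from the question) where the missing values sit before, between, and after the known value. It assumes dplyr and tidyr are installed, that tidyr is recent enough (1.0.0 or later) to accept `.direction = "downup"`, and that each code has exactly one distinct non-NA value.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: NAs appear before and after the known value in a group
M2 <- data.frame(code = c("001", "001", "001", "002", "002"),
                 decr = c(NA, "x", NA, "y", NA),
                 stringsAsFactors = FALSE)

# Overwrite each group with its single non-NA value
by_unique <- M2 %>%
  group_by(code) %>%
  mutate(decr = unique(na.omit(decr))) %>%
  ungroup()

# Fill down, then up, within each group
by_fill <- M2 %>%
  group_by(code) %>%
  fill(decr, .direction = "downup") %>%
  ungroup()

by_unique
by_fill
```

On this data both approaches should leave no NAs and agree with each other. They stop being interchangeable once a code has more than one distinct non-NA value.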

John Paul
  • Nice. You should add this answer to the dupe as well – Sotos Dec 20 '17 at 14:54
  • Nice! I got something similar: `M %>% group_by(code) %>% mutate(decr = if_else(is.na(decr), unique(na.omit(decr)), decr))`. But I see now that the NA check is redundant if you just overwrite the whole group. – f.lechleitner Dec 20 '17 at 14:56
  • I guess this only works as expected if there's a 1 to 1 relation between both columns – talat Dec 20 '17 at 14:56
  • @docendodiscimus Yes - the relationship has to be one to one. If it is not, the `fill` method makes more sense - assuming the "good" value is always directly above or below the `NA`s – John Paul Dec 20 '17 at 14:57
  • Just checked the dupe, and this answer won't work there - there is not a one to one relationship between groups and values. – John Paul Dec 20 '17 at 15:55
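The one-to-one caveat raised in these comments can be checked before filling. The sketch below uses a hypothetical data frame `M3` (not from the question) and counts the distinct non-NA `decr` values per `code` with dplyr's `n_distinct()`. Any code whose count is not exactly 1 is either ambiguous or has no known value at all, so the `unique(na.omit(...))` approach should not be applied to it as is.

```r
library(dplyr)

# Hypothetical data: "002" has two different values, "003" has none
M3 <- data.frame(code = c("001", "001", "002", "002", "002", "003"),
                 decr = c("x", NA, "y", "w", NA, NA),
                 stringsAsFactors = FALSE)

# Number of distinct non-NA decr values for each code
check <- M3 %>%
  group_by(code) %>%
  summarise(n_values = n_distinct(decr, na.rm = TRUE))

# Codes that break the one-to-one assumption
ambiguous <- check$code[check$n_values != 1]
ambiguous
```

If `ambiguous` is empty, overwriting each group with its single value is safe. Otherwise those codes need a separate rule, such as the position-based `fill` suggested above.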