2

I have a data frame that looks like this:

zz = "Sub Item Answer
1      A   1    NA
2      A   1    0
3      A   2    NA
4      A   2    1
5      B   1    NA
6      B   1    1
7      B   2    NA
8      B   2    0"    
Data = read.table(text=zz, header = TRUE)

The desirable result is to have the value under "Answer" (0 or 1) copied to the NA cells of the same Subject and the same Item. For instance, the answer = 0 in row 2 should be copied to the answer cell in row 1, but not other rows. The output should be like this:

zz2 = "Sub Item Answer
1      A   1    0
2      A   1    0
3      A   2    1
4      A   2    1
5      B   1    1
6      B   1    1
7      B   2    0
8      B   2    0"
Data2 = read.table(text=zz2, header = TRUE)

How should I do this? I noticed that there are some previous questions that asked how to copy a cell to other cells such as replace NA value with the group value, but it was done based on the value of one column only. Also, this question is slightly different from Replace missing values (NA) with most recent non-NA by group, which aims to copy the most-recent numeric value to NAs.

Thanks for all your answers!

EvelynZ
  • 63
  • 6
  • Possible duplicate of [Replace missing values (NA) with most recent non-NA by group](https://stackoverflow.com/questions/23340150/replace-missing-values-na-with-most-recent-non-na-by-group) – Maurits Evers Feb 22 '18 at 05:33
  • @MauritsEvers This seems to be a little different in that the link you are pointing to has NAs in the final output where as this question wants no NAs in the output. They are similar I agree but perhaps different enough. – steveb Feb 22 '18 at 05:47
  • 1
    What will happen if there are multiple rows for a `Sub & Item` with `non-NA` value? Which one should be copied in that case? – MKR Feb 22 '18 at 06:41
  • @MKR In my actual data, there will be only one row with non-NA value for every combination of Sub & Item rows. Thanks! – EvelynZ Feb 22 '18 at 15:54
  • @YiwenZhang Is it possible that both rows could have `NA`? – MKR Feb 22 '18 at 15:56
  • @MKR for each combination of Sub & Item, only one cell of row `Answer` would have numeric value; other cells are `NA`. – EvelynZ Feb 22 '18 at 18:21
  • Okay. Then any of the below answer will work for you. Thanks. – MKR Feb 22 '18 at 18:37

3 Answers3

2

You can use zoo::na.locf.

library(tidyverse);
library(zoo);
Data %>% group_by(Sub, Item) %>% mutate(Answer = na.locf(Answer));
# A tibble: 8 x 3
## Groups:   Sub, Item [4]
#  Sub    Item Answer
#  <fct> <int>  <int>
#1 A         1      0
#2 A         1      0
#3 A         2      1
#4 A         2      1
#5 B         1      1
#6 B         1      1
#7 B         2      0
#8 B         2      0

Thanks to @steveb, here is an alternative without having to rely on zoo::na.locf:

Data %>% group_by(Sub, Item) %>% mutate(Answer = Answer[!is.na(Answer)]);
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
0
library(tidyverse)
Data%>%group_by(Sub,Item)%>%fill(Answer,.direction = "up")
# A tibble: 8 x 3
# Groups:   Sub, Item [4]
     Sub  Item Answer
  <fctr> <int>  <int>
1      A     1      0
2      A     1      0
3      A     2      1
4      A     2      1
5      B     1      1
6      B     1      1
7      B     2      0
8      B     2      0
Onyambu
  • 67,392
  • 3
  • 24
  • 53
0

Though it was not intention of OP but I thought of situations where there are only NA values for set of Sub, Item group OR there multiple non-NA values for a group.

The one way to handle such situations could be by taking max/min of that group and ignoring max/min values if those are Inf

A solution could be:

library(dplyr)
Data %>% group_by(Sub, Item) %>% 
    mutate(Answer = ifelse(max(Answer, na.rm=TRUE)== -Inf, NA, 
                     as.integer(max(Answer, na.rm=TRUE))))

#Result
#  Sub     Item Answer
#  <fctr> <int>  <int>
#1 A          1      0
#2 A          1      0
#3 A          2      1
#4 A          2      1
#5 B          1      1
#6 B          1      1
#7 B          2      0
#8 B          2      0
MKR
  • 19,739
  • 4
  • 23
  • 33