How to copy value of a cell to other rows based on the value of other two columns?

Question

I have a data frame that looks like this:

zz = "Sub Item Answer
1      A   1    NA
2      A   1    0
3      A   2    NA
4      A   2    1
5      B   1    NA
6      B   1    1
7      B   2    NA
8      B   2    0"    
Data = read.table(text=zz, header = TRUE)

The desirable result is to have the value under "Answer" (0 or 1) copied to the NA cells of the same Subject and the same Item. For instance, the answer = 0 in row 2 should be copied to the answer cell in row 1, but not other rows. The output should be like this:

zz2 = "Sub Item Answer
1      A   1    0
2      A   1    0
3      A   2    1
4      A   2    1
5      B   1    1
6      B   1    1
7      B   2    0
8      B   2    0"
Data2 = read.table(text=zz2, header = TRUE)

How should I do this? I noticed that there are some previous questions that asked how to copy a cell to other cells such as replace NA value with the group value, but it was done based on the value of one column only. Also, this question is slightly different from Replace missing values (NA) with most recent non-NA by group, which aims to copy the most-recent numeric value to NAs.

Thanks for all your answers!

Possible duplicate of [Replace missing values (NA) with most recent non-NA by group](https://stackoverflow.com/questions/23340150/replace-missing-values-na-with-most-recent-non-na-by-group) — Maurits Evers, Feb 22 '18 at 05:33
@MauritsEvers This seems to be a little different in that the link you are pointing to has NAs in the final output where as this question wants no NAs in the output. They are similar I agree but perhaps different enough. — steveb, Feb 22 '18 at 05:47
What will happen if there are multiple rows for a `Sub & Item` with `non-NA` value? Which one should be copied in that case? — MKR, Feb 22 '18 at 06:41
@MKR In my actual data, there will be only one row with non-NA value for every combination of Sub & Item rows. Thanks! — EvelynZ, Feb 22 '18 at 15:54
@MKR for each combination of Sub & Item, only one cell of row `Answer` would have numeric value; other cells are `NA`. — EvelynZ, Feb 22 '18 at 18:21
Okay. Then any of the below answer will work for you. Thanks. — MKR, Feb 22 '18 at 18:37

Maurits Evers · Accepted Answer · 2018-02-22T05:45:55.737

2

You can use zoo::na.locf.

library(tidyverse);
library(zoo);
Data %>% group_by(Sub, Item) %>% mutate(Answer = na.locf(Answer));
# A tibble: 8 x 3
## Groups:   Sub, Item [4]
#  Sub    Item Answer
#  <fct> <int>  <int>
#1 A         1      0
#2 A         1      0
#3 A         2      1
#4 A         2      1
#5 B         1      1
#6 B         1      1
#7 B         2      0
#8 B         2      0

Thanks to @steveb, here is an alternative without having to rely on zoo::na.locf:

Data %>% group_by(Sub, Item) %>% mutate(Answer = Answer[!is.na(Answer)]);

edited Feb 22 '18 at 05:45

answered Feb 22 '18 at 05:31

Maurits Evers

49,617
4
47
68

1

My answer is too close to yours so I am suggesting an alternative that doesn't require the `zoo` library: `Data %>% group_by(Sub, Item) %>% mutate(Answer = Answer[!is.na(Answer)])`. – steveb Feb 22 '18 at 05:35
1

@steveb That's a neat way! Never seen this. Thanks. – Maurits Evers Feb 22 '18 at 05:39
If you want to add it as an alternative, feel free to. – steveb Feb 22 '18 at 05:44
@steveb Done;-) – Maurits Evers Feb 22 '18 at 05:46
Depending on the input data, Answer may have more than 1 non-NA value per group, the max or some single value can be chosen in that case. – steveb Feb 22 '18 at 05:51

score 0 · Answer 2 · answered Feb 22 '18 at 05:53

library(tidyverse)
Data%>%group_by(Sub,Item)%>%fill(Answer,.direction = "up")
# A tibble: 8 x 3
# Groups:   Sub, Item [4]
     Sub  Item Answer
  <fctr> <int>  <int>
1      A     1      0
2      A     1      0
3      A     2      1
4      A     2      1
5      B     1      1
6      B     1      1
7      B     2      0
8      B     2      0

score 0 · Answer 3 · answered Feb 22 '18 at 07:10

Though it was not intention of OP but I thought of situations where there are only NA values for set of Sub, Item group OR there multiple non-NA values for a group.

The one way to handle such situations could be by taking max/min of that group and ignoring max/min values if those are Inf

A solution could be:

library(dplyr)
Data %>% group_by(Sub, Item) %>% 
    mutate(Answer = ifelse(max(Answer, na.rm=TRUE)== -Inf, NA, 
                     as.integer(max(Answer, na.rm=TRUE))))

#Result
#  Sub     Item Answer
#  <fctr> <int>  <int>
#1 A          1      0
#2 A          1      0
#3 A          2      1
#4 A          2      1
#5 B          1      1
#6 B          1      1
#7 B          2      0
#8 B          2      0

How to copy value of a cell to other rows based on the value of other two columns?

3 Answers3

Linked