Fill missing values by finding matches in another column

Question

# generate data
df <- data.frame(
  QuestionId = c(rep(NA, 16)),
  AltQuestionId = c(1, 2, 4, 5, 6, NA, 8, 10, NA, NA, 14, NA, 16, NA, 18, 20),
  AltTakerId = c(7, 13, 10, 15, 17, NA, 8, 11, NA, NA, 25, NA, 29, NA, 35, 29)
)
df$QuestionId[c(6, 9, 10, 12, 14)] <- c(1, 6, 2, 6, 4)
df$TakerId <- NA # a column of NAs

I have no idea how to fill the TakerID column as explained in the Figure above.

The variables QuestionID and AltQuestionID are the same. Also the variables TakerID and AltTakerID are the same.

The purpose is to associate QuestionID with TakerID.

Output wanted:

> df
   QuestionId AltQuestionId AltTakerId TakerId
1          NA             1          7      NA
2          NA             2         13      NA
3          NA             4         10      NA
4          NA             5         15      NA
5          NA             6         17      NA
6           1            NA         NA       7
7          NA             8          8      NA
8          NA            10         11      NA
9           6            NA         NA      17
10          2            NA         NA      13
11         NA            14         25      NA
12          6            NA         NA      17
13         NA            16         29      NA
14          4            NA         NA      10
15         NA            18         35      NA
16         NA            20         29      NA

When asking for help, it's better if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and the desired output for that input. Pictures of data aren't particularly helpful because we don't want to re-type it all just to test. — MrFlick, Jan 15 '18 at 16:05
@Phil pairwise equality testing won't work for this. @giulio use `match()`. — Gregor Thomas, Jan 15 '18 at 16:21
A few notes on using Stack Overflow: (1) it's good to wait a while before accepting an answer (a couple hours to a day) so you can see what else comes along. (2) Try to describe your question in the title. You probably wouldn't have drawn as many downvotes with a title like *"Fill missing values by finding matches in another column"*, and (3) share data in a copy/pasteable way. The data that Milan created in his answer (and I used) should have been created by you and posted in the question. That way potential answerers have a good place to start. (This also would have kept the downvotes away) — Gregor Thomas, Jan 15 '18 at 18:59
@Gregor thank you for the suggestions, I'm new to StackOverflow. — Giulio, Jan 15 '18 at 22:40

score 0 · Answer 1 · edited Feb 24 '22 at 01:22

I don't know how you want to deal with the fact that there are duplicate values in QuestionId (two 6s) but if

a) there are as many entries in AltQuestionID for each unique QuestionId value as there are instances of the given QuestionId value, or
b) there is only one instance of each unique QuestionId value in AltQuestionId and you want TakerId to duplicate the AltTakerId for QuestionId duplicates, then

this should work:

# generate data
df <- data.frame(
  QuestionId = c(rep(NA, 16)),
  AltQuestionId = c(1, 2, 4, 5, 6, NA, 8, 10, NA, NA, 14, NA, 16, NA, 18, 20),
  AltTakerId = c(7, 13, 10, 15, 17, NA, 8, 11, NA, NA, 25, NA, 29, NA, 35, 29)
)
df$QuestionId[c(6, 9, 10, 12, 14)] <- c(1, 6, 2, 6, 4)
df$TakerId <- NA # a column of NAs

# for each unique value of QuestionId (i) put the value of AltTakerId
# that corresponds to the row where AltQuestionId equals i into the row
# of TakerId for which QuestionId equals i
for (i in na.omit(unique(df$QuestionId))) {
  df$TakerId[which(df$QuestionId == i)] <- df$AltTakerId[which(df$AltQuestionId == i)]
}

This gives:

> df
   QuestionId AltQuestionId AltTakerId TakerId
1          NA             1          7      NA
2          NA             2         13      NA
3          NA             4         10      NA
4          NA             5         15      NA
5          NA             6         17      NA
6           1            NA         NA       7
7          NA             8          8      NA
8          NA            10         11      NA
9           6            NA         NA      17
10          2            NA         NA      13
11         NA            14         25      NA
12          6            NA         NA      17
13         NA            16         29      NA
14          4            NA         NA      10
15         NA            18         35      NA
16         NA            20         29      NA

score 0 · Accepted Answer · answered Jan 15 '18 at 17:29

Here's a one-liner with match:

df$TakerId = df$AltTakerId[match(df$QuestionId, df$AltQuestionId)]
df
#    QuestionId AltQuestionId AltTakerId TakerId
# 1          NA             1          7      NA
# 2          NA             2         13      NA
# 3          NA             4         10      NA
# 4          NA             5         15      NA
# 5          NA             6         17      NA
# 6           1            NA         NA       7
# 7          NA             8          8      NA
# 8          NA            10         11      NA
# 9           6            NA         NA      17
# 10          2            NA         NA      13
# 11         NA            14         25      NA
# 12          6            NA         NA      17
# 13         NA            16         29      NA
# 14          4            NA         NA      10
# 15         NA            18         35      NA
# 16         NA            20         29      NA

Using Milan's nicely provided data:

df <- data.frame(
  QuestionId = c(rep(NA, 16)),
  AltQuestionId = c(1, 2, 4, 5, 6, NA, 8, 10, NA, NA, 14, NA, 16, NA, 18, 20),
  AltTakerId = c(7, 13, 10, 15, 17, NA, 8, 11, NA, NA, 25, NA, 29, NA, 35, 29)
)
df$QuestionId[c(6, 9, 10, 12, 14)] <- c(1, 6, 2, 6, 4)

Nice one, although maybe mention that this is not going to work if `AltQuestionId != unique(AltQuestionId)` or if `is.na(AltQuestionId) != is.na(AltTakerId)`? — Milan Valášek, Jan 16 '18 at 09:19
Since the what should be done in those cases isn't at all clear, I'll just leave it as is. — Gregor Thomas, Jan 16 '18 at 14:03

Fill missing values by finding matches in another column

2 Answers2