How can I fill NAs in one column when that column has a relationship with another column?

Question

Consider the following data:

library(tidyverse)
GameID <- c('Bos001', 'Bos002', 'Bos003', 'Pit001', 'Pit002', 'Pit003')
Stadium <- c("Fenway Park", NA, "Fenway Park", NA, NA, "PNC Park")
GameData <- data.frame(GameID, Stadium)
GameData

GameID     Stadium
1 Bos001 Fenway Park
2 Bos002        <NA>
3 Bos003 Fenway Park
4 Pit001        <NA>
5 Pit002        <NA>
6 Pit003    PNC Park

The Stadium column has a relationship with the GameID column. In this contrived example:

All rows where GameID begins with "Bos", the Stadium column value should be "Fenway Park".
All rows where GameID begins with "Pit", the Stadium column value should be "PNC Park".

Tidied Data:

GameID     Stadium
1 Bos001 Fenway Park
2 Bos002 Fenway Park
3 Bos003 Fenway Park
4 Pit001    PNC Park
5 Pit002    PNC Park
6 Pit003    PNC Park

How can I fill in these values?

Should I use a combination of dplyr:arrange() and tidyr:fill()?

Group by the first three characters of `GameID`, you might use `substr`. Then `replace` `NA`s with non-`NA`s — markus, Mar 01 '19 at 19:25
@markus: I think you're right and that doesn't seem too complicated, but I'm unfamiliar with that code. Can you please point me in the right direction? — Jim G., Mar 01 '19 at 19:34

score 2 · Accepted Answer · answered Mar 01 '19 at 19:34

Using @markus' suggestion from comments, will happily remove answer if they submit answer:

library(tidyverse)

GameData %>%
  group_by(GamePrefix = substr(GameID, 1, 3)) %>%
  mutate(Stadium = first(Stadium[!is.na(Stadium)])) %>%
  ungroup() %>%
  select(-GamePrefix)

# A tibble: 6 x 2
  GameID Stadium    
  <fct>  <fct>      
1 Bos001 Fenway Park
2 Bos002 Fenway Park
3 Bos003 Fenway Park
4 Pit001 PNC Park   
5 Pit002 PNC Park   
6 Pit003 PNC Park

The ungroup is needed in order to remove the temporary grouping column which consists of the first 3 characters of the GameID values.

score 0 · Answer 2 · answered Mar 01 '19 at 19:31

You can use multiple if or a lookup dataframe ; something like this:

> GameData %>%
+   mutate(Stadium = ifelse(grepl("^Bos", GameID), "Fenway Park", 
+                           ifelse(grepl("^Pit", GameID), "PNC Park", NA))
+          )
  GameID     Stadium
1 Bos001 Fenway Park
2 Bos002 Fenway Park
3 Bos003 Fenway Park
4 Pit001    PNC Park
5 Pit002    PNC Park
6 Pit003    PNC Park

score 0 · Answer 3 · answered Mar 01 '19 at 19:34

0

You are on the right track and this worked for me:

gd2 <- GameData %>% 
mutate(nev =substr(GameID,1,3)) %>%
arrange(Stadium) %>% 
group_by(nev) %>% 
fill(Stadium)

answered Mar 01 '19 at 19:34

Mike

3,797
1
11
30

How can I fill NAs in one column when that column has a relationship with another column?

3 Answers3