0

Consider the following data:

library(tidyverse)
GameID <- c('Bos001', 'Bos002', 'Bos003', 'Pit001', 'Pit002', 'Pit003')
Stadium <- c("Fenway Park", NA, "Fenway Park", NA, NA, "PNC Park")
GameData <- data.frame(GameID, Stadium)
GameData

GameID     Stadium
1 Bos001 Fenway Park
2 Bos002        <NA>
3 Bos003 Fenway Park
4 Pit001        <NA>
5 Pit002        <NA>
6 Pit003    PNC Park

The Stadium column has a relationship with the GameID column. In this contrived example:

  • All rows where GameID begins with "Bos", the Stadium column value should be "Fenway Park".
  • All rows where GameID begins with "Pit", the Stadium column value should be "PNC Park".

Tidied Data:

GameID     Stadium
1 Bos001 Fenway Park
2 Bos002 Fenway Park
3 Bos003 Fenway Park
4 Pit001    PNC Park
5 Pit002    PNC Park
6 Pit003    PNC Park

How can I fill in these values?

Should I use a combination of dplyr:arrange() and tidyr:fill()?

Jim G.
  • 15,141
  • 22
  • 103
  • 166
  • 1
    Group by the first three characters of `GameID`, you might use `substr`. Then `replace` `NA`s with non-`NA`s – markus Mar 01 '19 at 19:25
  • @markus: I think you're right and that doesn't seem too complicated, but I'm unfamiliar with that code. Can you please point me in the right direction? – Jim G. Mar 01 '19 at 19:34
  • 1
    Jim, sorry. Look at zack's answer which covers the idea. – markus Mar 01 '19 at 19:36

3 Answers3

2

Using @markus' suggestion from comments, will happily remove answer if they submit answer:

library(tidyverse)

GameData %>%
  group_by(GamePrefix = substr(GameID, 1, 3)) %>%
  mutate(Stadium = first(Stadium[!is.na(Stadium)])) %>%
  ungroup() %>%
  select(-GamePrefix)

# A tibble: 6 x 2
  GameID Stadium    
  <fct>  <fct>      
1 Bos001 Fenway Park
2 Bos002 Fenway Park
3 Bos003 Fenway Park
4 Pit001 PNC Park   
5 Pit002 PNC Park   
6 Pit003 PNC Park 

The ungroup is needed in order to remove the temporary grouping column which consists of the first 3 characters of the GameID values.

zack
  • 5,205
  • 1
  • 19
  • 25
0

You can use multiple if or a lookup dataframe ; something like this:

> GameData %>%
+   mutate(Stadium = ifelse(grepl("^Bos", GameID), "Fenway Park", 
+                           ifelse(grepl("^Pit", GameID), "PNC Park", NA))
+          )
  GameID     Stadium
1 Bos001 Fenway Park
2 Bos002 Fenway Park
3 Bos003 Fenway Park
4 Pit001    PNC Park
5 Pit002    PNC Park
6 Pit003    PNC Park
Sonny
  • 3,083
  • 1
  • 11
  • 19
0

You are on the right track and this worked for me:

gd2 <- GameData %>% 
mutate(nev =substr(GameID,1,3)) %>%
arrange(Stadium) %>% 
group_by(nev) %>% 
fill(Stadium)
Mike
  • 3,797
  • 1
  • 11
  • 30