country <- c("USA","UK","Egypt","Brazil","USA", "UK")
continent <- c("North America","Europe","Africa","South America", NA, NA)
df1 <- data.frame(country, continent)
df1

How do I automatically impute those last two NAs from the earlier rows, given that "USA" goes with "North America" and "UK" goes with "Europe"?

I have a large dataset, so a quick way to do this in dplyr when there are many NAs would be super helpful.

Thank you in advance!

sergio_ag

2 Answers

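We can group by `country` and then use `fill()` from tidyr, which carries the last non-missing `continent` forward within each group: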

library(dplyr)
library(tidyr)
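# df1 is the data frame from the question: data.frame(country, continent)
df1 %>% 
     group_by(country) %>% 
     fill(continent) %>%   # default direction is "down"
     ungroup()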

Output:

# A tibble: 6 x 2
  country continent    
  <chr>   <chr>        
1 USA     North America
2 UK      Europe       
3 Egypt   Africa       
4 Brazil  South America
5 USA     North America
6 UK      Europe     
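
One caveat worth sketching: `fill()` fills downward by default, so an NA that appears before the first known value for a country would stay NA. A minimal sketch, assuming a reordered variant of the question's data (`df_rev` is a made-up name for illustration) and a tidyr version that supports `.direction = "downup"`:

```r
library(dplyr)
library(tidyr)

# Hypothetical variant of the question's data: the NA rows come first
df_rev <- data.frame(
  country   = c("USA", "UK", "USA", "UK", "Egypt"),
  continent = c(NA, NA, "North America", "Europe", "Africa"),
  stringsAsFactors = FALSE
)

filled <- df_rev %>%
  group_by(country) %>%
  # fill downward first, then upward for any NAs still left in the group
  fill(continent, .direction = "downup") %>%
  ungroup()

filled
```

Countries that never have a known continent anywhere in the data would still be NA after this step.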
akrun
  • Thank you so much, appreciate the response. Just as a follow-up, how would I input a missing value manually if there isn't a corresponding observation for `continent` and `country`? – sergio_ag Jul 27 '21 at 18:42
  • @krmo you can use `complete` to expand the data for missing cases – akrun Jul 27 '21 at 18:44
  • How would I do this for a specific row? Say, I wanted to add "Asia" to a row that currently has: "China", NA. Thank you again @akrun! – sergio_ag Jul 27 '21 at 18:58
  • @krmo Do you have a case where the value is "Asia" for `China`, or is it a new entry? – akrun Jul 27 '21 at 19:14
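
For the follow-up about a specific row, one possible approach (not necessarily what the commenters had in mind) is a conditional `mutate()`. This is a minimal sketch, assuming a small made-up data frame `df2` in which "China" has no other row to borrow a continent from:

```r
library(dplyr)

# Hypothetical data: "China" never appears with a known continent
df2 <- data.frame(
  country   = c("USA", "China", "USA"),
  continent = c("North America", NA, NA),
  stringsAsFactors = FALSE
)

patched <- df2 %>%
  # set the value by hand only where the country matches and continent is missing
  mutate(continent = if_else(country == "China" & is.na(continent),
                             "Asia", continent))

patched
```

The remaining NA for "USA" is left untouched here; a grouped fill would handle it, since that country has a known value elsewhere in the data.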

We could use `na.locf` ("last observation carried forward") from the zoo package:

library(dplyr)
library(zoo)
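# df1 is the data frame from the question: data.frame(country, continent)
df1 %>% 
    mutate(id = row_number()) %>%   # remember the original row order
    group_by(country) %>% 
    do(na.locf(.)) %>%              # carry the last non-NA value forward per country
    arrange(id) %>%                 # restore the original row order
    select(-id) %>% 
    ungroup()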

Output:

  country continent    
  <chr>   <chr>        
1 USA     North America
2 UK      Europe       
3 Egypt   Africa       
4 Brazil  South America
5 USA     North America
6 UK      Europe     
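
A possible variation on the same idea, shown here only as a sketch: calling `na.locf()` on the column inside a grouped `mutate()` keeps the rows in place, so the helper `id` column is not needed. Passing `na.rm = FALSE` is an added assumption so that a leading NA within a group stays as NA instead of being dropped, which keeps the vector length unchanged.

```r
library(dplyr)
library(zoo)

# The question's data, rebuilt so the example is self-contained
df1 <- data.frame(
  country   = c("USA", "UK", "Egypt", "Brazil", "USA", "UK"),
  continent = c("North America", "Europe", "Africa", "South America", NA, NA),
  stringsAsFactors = FALSE
)

filled <- df1 %>%
  group_by(country) %>%
  # last observation carried forward within each country
  mutate(continent = na.locf(continent, na.rm = FALSE)) %>%
  ungroup()

filled
```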
TarJae