How do I fill in missing values based on the values in other rows?

Question

country <- c("USA","UK","Egypt","Brazil","USA", "UK")
continent <- c("North America","Europe","Africa","South America", NA, NA)
data.frame(country, continent)

How do I automatically impute those last two NAs based on the previous rows since we know that "USA" goes with "North America" and "UK" goes with "Europe"?

I have a large dataset so it would be super helpful if I could find a quick way to do this in dplyr when there are many NAs.

Thank you in advance!

score 3 · Accepted Answer · answered Jul 27 '21 at 18:26


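We can group by `country` and then use `fill` from tidyr, which carries the last non-missing `continent` down within each group.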

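library(dplyr)
library(tidyr)
# d1 is the data frame from the question
d1 <- data.frame(country, continent)
d1 %>% 
     group_by(country) %>%   # one group per country
     fill(continent) %>%     # fill NAs downward within each group
     ungroup()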

Output:

# A tibble: 6 x 2
  country continent    
  <chr>   <chr>        
1 USA     North America
2 UK      Europe       
3 Egypt   Africa       
4 Brazil  South America
5 USA     North America
6 UK      Europe
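
One caveat: `fill()` fills downward by default, so the code above relies on each country's known continent appearing before its NA rows. If that order is not guaranteed, a minimal sketch using the `.direction = "downup"` argument of `fill()` could look like this. The data frame `d2` and the result name `filled` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the NA for "USA" comes *before* its known value
d2 <- data.frame(
  country   = c("USA", "UK", "USA", "UK"),
  continent = c(NA, "Europe", "North America", NA),
  stringsAsFactors = FALSE
)

filled <- d2 %>%
  group_by(country) %>%
  # fill downward first, then upward, so leading NAs are covered too
  fill(continent, .direction = "downup") %>%
  ungroup()

filled
```

This assumes each country maps to exactly one continent; if a group contained conflicting values, the fill would silently pick whichever is nearest.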

akrun

Thank you so much, appreciate the response. Just as a follow-up, how would I input a missing value manually if there isn't a corresponding observation for `continent` and `country`? – sergio_ag Jul 27 '21 at 18:42
@krmo you can use `complete` to expand the data for missing cases – akrun Jul 27 '21 at 18:44
How would I do this for a specific row? Say, I wanted to add "Asia" to a row that currently has: "China", NA. Thank you again @akrun! – sergio_ag Jul 27 '21 at 18:58
@krmo do you have a case where the value is "Asia" for `China`, or is it a new entry? – akrun Jul 27 '21 at 19:14
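
For the follow-up case where no other row supplies the value (e.g. "China", NA), one possible approach is to keep the manual corrections in a small lookup table and patch them in with `left_join()` and `coalesce()`. This is only a sketch: the names `d3`, `manual`, `continent_manual` and `patched` are hypothetical.

```r
library(dplyr)

# Hypothetical data: "China" has no other row to borrow a continent from
d3 <- data.frame(
  country   = c("USA", "China", "UK"),
  continent = c("North America", NA, "Europe"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table holding the manual corrections
manual <- data.frame(
  country          = "China",
  continent_manual = "Asia",
  stringsAsFactors = FALSE
)

patched <- d3 %>%
  left_join(manual, by = "country") %>%
  # keep the existing value; fall back to the manual one only where it is NA
  mutate(continent = coalesce(continent, continent_manual)) %>%
  select(-continent_manual)

patched
```

Existing non-missing values are never overwritten, so new corrections can be added to `manual` without touching rows that are already complete.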

score 2 · Answer 2 · answered Jul 27 '21 at 18:49

We could use `na.locf` (last observation carried forward) from the zoo package:

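library(dplyr)
library(zoo)
# df1 is the data frame from the question
df1 <- data.frame(country, continent)
df1 %>% 
    mutate(id = row_number()) %>%   # remember the original row order
    group_by(country) %>% 
    do(na.locf(.)) %>%              # carry the last observation forward within each country
    arrange(id) %>%                 # do() returns rows ordered by group, so restore the order
    select(-id) %>% 
    ungroup()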

Output:

  country continent    
  <chr>   <chr>        
1 USA     North America
2 UK      Europe       
3 Egypt   Africa       
4 Brazil  South America
5 USA     North America
6 UK      Europe
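
Since `do()` is superseded in recent dplyr versions, a variant that might be simpler is to call `na.locf` on the column inside a grouped `mutate()`. This is a sketch rather than a drop-in replacement; `na.rm = FALSE` is needed so the vector keeps its length when a group starts with an NA, and the result name `res` is made up here.

```r
library(dplyr)
library(zoo)

# Same data as in the question
df1 <- data.frame(
  country   = c("USA", "UK", "Egypt", "Brazil", "USA", "UK"),
  continent = c("North America", "Europe", "Africa", "South America", NA, NA),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(country) %>%
  # na.rm = FALSE keeps leading NAs instead of dropping them
  mutate(continent = na.locf(continent, na.rm = FALSE)) %>%
  ungroup()

res
```

Because `mutate()` preserves row order, the helper `id` column and the `arrange()` step are not needed in this version.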
