0

Situation:

I have a large dataset with a column for country code (country) and next to it a column with the country name (country_name). I need help replacing the NA values for the two observations in the country column with NA in the corresponding country_name column:

  1. EL
  2. UK

EL should have Greece in the country_name column. UK should have United Kingdom in the country_name column.

I am in the process of tidying the dataset, so the solution would be used in my cleaning for the dataset as a whole. Obviously, I would like to keep the remaining columns as is given that the country column has the correct information in the country_name column.

Reproducible Data:

    structure(list(country = c("EL", "EL", "EL", "EL", "EL", "UK", 
"UK", "UK", "UK", "UK"), country_name = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), type = c("Conventional thermal", 
"Conventional thermal", "Conventional thermal", "Nuclear", "Nuclear", 
"Conventional thermal", "Conventional thermal", "Conventional thermal", 
"Nuclear", "Nuclear")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))
Henrik
  • 65,555
  • 14
  • 143
  • 159
Eric
  • 2,699
  • 5
  • 17
  • What have you tried and where have you found issues? Also the post is tagged with both `dplyr` and `data.table`. Do you prefer either package? Would base R solution work? – s_baldur Aug 04 '20 at 13:29

1 Answers1

2

For large data sets I would recommend working with a mapping file:

country_map <- tibble(country = c("EL", "UK"), 
                      country_name2 = c("Greece", "United Kingdom"))

The mapping file contains all countries and the corresponding names. You can then join the mapping file to your data and use coalesce to update country names.

data %>% left_join(country_map, by = "country") %>%
  mutate(country_name = coalesce(country_name, country_name2)) %>%
  select(-country_name2)
Cettt
  • 11,460
  • 7
  • 35
  • 58