0

I have this series of nested statements

data$Country == 1,"Brazil",
  ifelse(data$Country == 2, "Canada",ifelse(
    data$Country == 3, "China",ifelse(
      data$Country == 4, "Ecuador",ifelse(
        data$Country == 5, "France",ifelse(
          data$Country == 6, "Germany",ifelse(
            data$Country == 7, "India",ifelse(
              data$Country == 8, "Italy",ifelse(
                data$Country == 9, "Mexico",ifelse(
                  data$Country == 10, "Nigeria",ifelse(
                    data$Country == 11, "Poland",ifelse(
                      data$Country == 12, "Russia",ifelse(
                        data$Country == 13, "South Africa",
                        ifelse(
                          data$Country == 14, "South Korea",ifelse(
                            data$Country == 15, "Singapore",
                            ifelse(
                              data$Country == 16, "Spain",
                              ifelse(
                                data$Country == 17, "Sweden",ifelse(
                                  data$Country == 18, "United Kingdom",ifelse(
                                    data$Country == 19, "United States","l"
))))))))))))))))))))

I was looking for the quickest way to convert any encoded variable into the respective Country name. Do you think is there a way to cope with this operation?

Thank you so much

12666727b9
  • 1,133
  • 1
  • 8
  • 22
  • 1
    Usually done with `merge`. – user2974951 Nov 19 '21 at 09:39
  • Can you show a quick example? – 12666727b9 Nov 19 '21 at 09:40
  • Have a look at [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right/1300618) – user2974951 Nov 19 '21 at 09:41
  • Cannot see why. The example seems regard mostly data merging – 12666727b9 Nov 19 '21 at 09:43
  • The points was that this would be much better done with a merge, rather than using such a sausage for a query. You match by merging. – user2974951 Nov 19 '21 at 09:44
  • 1
    `dplyr::case_when` removes the need for nested `ifelse` statements. or `merge` as @user2974951 mentioned. – phiver Nov 19 '21 at 09:45
  • Ok. This `dplyr::case_when` is another alternative. Since I'm at the very beginning could please sketch out a random example here following, so that would make the understanding easier – 12666727b9 Nov 19 '21 at 09:47
  • The easiest way is to create something similar to a dictionary, and then extract the country name. It is not clear however how you need to use this in your code and what you mean for "quickest". If it is an operation that is done several (e.g. thousand) of times, it is probably not the most efficient solution, since it has complexity O(n) - and not O(1) like python dictionaries. countries <- list("Italy" = 1, "France" = 2, "Germany" = 3) country.name <- names(which(countries == data$Country) – DrunkenDuck Nov 19 '21 at 09:56
  • I mean to write the same with as less rows as possible – 12666727b9 Nov 19 '21 at 10:05
  • @DrunkenDuck If you want, I kindly invite you to sketch out below the solution you reckon the easiest to apply to a case like this – 12666727b9 Nov 19 '21 at 10:07
  • 1
    `country_names = c("Brazil", "Canada", "China"); country_names[data$Country]`. – Martin Morgan Nov 19 '21 at 10:30

3 Answers3

1

I am not sure of the desired use. But maybe you can try to use a named vector. This is not the most elegant solution, though it solves the ifelse mess ;)

An example of 4 countries. China = "4"

countrys <- c("Brazil", "Canada",
              "China",
              "Ecuador")
names(countrys) <- c(2:5)

# Test data.frame
data <- data.frame(country = 4)

# Now we can get the country directly from the data$country:
# Careful! 4 is not '4'
unname(countrys[as.character(data$country)])  
Lucas
  • 409
  • 2
  • 10
1

There are 2 options:

1: case_when from dplyr

library(dplyr)
data.frame(info = letters[1:5],
           country_id = 1:5) %>% 
  mutate(country_name = case_when(country_id == 1 ~ "Brazil",
                                  country_id == 2 ~ "Canada",
                                  country_id == 3 ~ "China",
                                  country_id == 4 ~ "Ecuador",
                                  country_id == 5 ~ "France",
                                  TRUE ~ "Unknown"))

  info country_id country_name
1    a          1       Brazil
2    b          2       Canada
3    c          3        China
4    d          4      Ecuador
5    e          5       France

2: merge or join the info from a country table:

# country table
countries <- data.frame(country_id = 1:5, 
                        country_name = c("Brazil", "Canada", "China", "Ecuador", "France"))

data.frame(info = letters[1:5],
           country_id = 1:5) %>% 
  left_join(countries, by = "country_id")

  info country_id country_name
1    a          1       Brazil
2    b          2       Canada
3    c          3        China
4    d          4      Ecuador
5    e          5       France

My preference would be 2, less coding and less chance of a mistake. You can keep the country table in your database or in a file somewhere and maintain that without needing to change the code.

phiver
  • 23,048
  • 14
  • 44
  • 56
0

This is a very nice case for a switch statement, which in my opinion makes for more readable code than dplyr::case_when or a series of ifelse, and is easily extendible, if for example there are further criteria like Region, Cities etc.

get_country <- Vectorize(function(x){
  switch(as.character(x),
         "1" = "Brazil", "2" = "Canada", "3" = "China", "4" = "Ecuador",
         "5" = "France", "6" = "Germany", "7" = "India", "8" = "Italy", 
         "9" = "Mexico", "10" = "Nigeria", "11" = "Poland", "12" = "Russia",
         "13" = "South Africa", "14" = "South Korea", "15" = "Singapore",
         "16" = "Spain", "17" = "Sweden", "18" = "United Kingdom", "19" = "United States",
         NA)
})

data.frame(info = letters[1:5],
           country_id = 1:5) %>%
  mutate(country = get_country(country_id))

  info country_id  country
1    a          1  Brazil
2    b          2  Canada
3    c          3   China
4    d          4 Ecuador
5    e          5  France

But a long statement like that is a lot of work to type. Alternatively, a more dynamic approach, we can create a switch statement using a constructor function that takes vectors of input. Here I use the ISO3166 data set found in the maps package to create expression of 269 countries.

constructor <- function(ids, names){
  purrr::imap_chr(as.character(ids), ~paste(paste0("\"", .x ,"\""),
                                            paste0("\"", names[.y], "\""),
                                            sep = "=")) %>%
    paste0(collapse = ", ") %>%
    paste0("Vectorize(function(x) switch(as.character(x), ", ., ", NA))", collapse = "") %>%
    str2expression()
}
get_country <- eval(constructor(1:149, trimws(rworldmap::countryExData$Country)))

set.seed(1)
data.frame(info = sample(letters, size = 5, replace = T),
           country_id = sample.int(149, 5, replace = T)) %>%
  mutate(country = get_country(country_id))

  info country_id           country
1    y        122      Sierra Leone
2    h         39           Algeria
3    l         42           Eritrea
4    y        134 Trinidad & Tobago
5    w         24             Chile

To show the power - lets create a control flow for ~20 000 cities with just 2 more lines of code

CITIES <- maps::world.cities %>% filter(pop > 10000) %>% arrange(desc(pop))
get_city <- eval(constructor(1:nrow(CITIES), trimws(CITIES$name)))

data.frame(city_id = sample.int(23255, size = 100, replace = T),
           country_id = sample.int(269, 100, replace = T)) %>%
  mutate(country = get_country(country_id),
         city = get_city(city_id))

One benefit of this approach is you can easily optimize your control flow by making sure the input vector to the constructor function is optimized, that is, most occurring cases first, and possibly make functions that support a nested approach like get_continent(get_country(get_city))).

Donald Seinen
  • 4,179
  • 5
  • 15
  • 40