1

In order to clean some of my datas, I have tried setting up an Excel VLOOKUP equivalent tu use with a dplyr::mutate() function :

vlookup <- 
  function(theString, theMapping) 
  {
    library("magrittr")
    
    which(
        theMapping[1] == theString
      )[1] %>% 
      theMapping[.,2]
  }

Testing on a single string works fine :

test_mapping <- 
  tibble(
      c("AD", "Andorra")
    , c("AE", "United Arab Emirates")
    , c("AF", "Afghanistan")
    ) %>% 
  t %>% 
  as_tibble

> "AD" %>% vlookup(test_mapping)
# A tibble: 1 x 1
  V2     
  <chr>  
1 Andorra

However, it doesn't work when applied on a tibble with mutate() :

tibble(
  country = c("AE", "AF", "foo")
) %>% 
mutate(
    country_remap = country %>% vlookup(test_mapping)
  )

> tibble(
+   country = c("AE", "AF", "foo")
+ ) %>% 
+ mutate(
+     country_remap = country %>% vlookup(test_mapping)
+   )
# A tibble: 3 x 2
  country country_remap$V2
  <chr>   <chr>           
1 AE      NA              
2 AF      NA              
3 foo     NA 

Any idea to solve this?

Frederi ROSE
  • 297
  • 2
  • 9

2 Answers2

2

We can use deframe to create a named vector and remap more efficiently

library(dplyr)
library(tibble)
tibble(
   country = c("AE", "AF", "foo")
 ) %>% mutate(country_remap = deframe(test_mapping)[country])
# A tibble: 3 x 2
#  country country_remap       
#  <chr>   <chr>               
#1 AE      United Arab Emirates
#2 AF      Afghanistan         
#3 foo     <NA>     
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Your vlookup function is not vectorized meaning it works for one value at a time. If you add rowwise in your pipe it works as expected.

library(dplyr)
library(tibble)

tibble(country = c("AE", "AF", "foo")) %>% 
  rowwise() %>%
  mutate(country_remap = country %>% vlookup(test_mapping))

# country country_remap$V2    
#  <chr>   <chr>               
#1 AE      United Arab Emirates
#2 AF      Afghanistan         
#3 foo     NA               

However, this is not efficient. You should instead join the dataframes in R.

tibble(country = c("AE", "AF", "foo")) %>%
  left_join(test_mapping, by = c('country' = 'V1'))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213