0

I have a dataframe in an excel sheet with some character values that I'd like to change to a different set of character values in R, however I have 184 different values across several columns that need to be changed to a different set of 184 values. The conversions between these values are listed in a vertical lookup table.

I can do a mutate with case_when, however this will take a very long time to write out all 184 values, and it's likely I will have to do this again indefinitely with other datasets of similar size. I would assume there was some way to do it by creating a lookup between two vectors of the same length?

Example data frame

df <- tibble(
  Var1 = c("","Label 3", "Label 184", "Label 4", ""),
  Var2 = c("","", "Label 1", "", "Label 2"),
  Var3 = c("Label 2","Label 184", "Label 1", "", "Label 4")
)

  Var1        Var2      Var3       
  <chr>       <chr>     <chr>      
1 ""          ""        "Label 2"  
2 "Label 3"   ""        "Label 184"
3 "Label 184" "Label 1" "Label 1"  
4 "Label 4"   ""        ""         
5 ""          "Label 2" "Label 4" 

Example lookup table

Lookup_table <- tibble(
  x = c("Label 1","Label 2","Label 3","Label 4","Label 184"), 
  y = c("NewLabel 1","NewLabel 2","NewLabel 3","NewLabel 4","NewLabel 184")
)

 x         y           
  <chr>     <chr>       
1 Label 1   NewLabel 1  
2 Label 2   NewLabel 2  
3 Label 3   NewLabel 3  
4 Label 4   NewLabel 4  
5 Label 184 NewLabel 184

Intended outcome

Var1           Var2         Var3          
  <chr>          <chr>        <chr>         
1 ""             ""           "NewLabel 2"  
2 "NewLabel 3"   ""           "NewLabel 184"
3 "NewLabel 184" "NewLabel 1" "NewLabel 1"  
4 "NewLabel 4"   ""           ""            
5 ""             "NewLabel 2" "NewLabel 4"  

1 Answers1

0

You can get the data in long format, join it with Lookup_table, replace NA with blank value and get data back in wide format.

library(dplyr)
library(tidyr)

df %>%
  mutate(row = row_number()) %>%
  pivot_longer(cols = -row) %>%
  left_join(Lookup_table, by = c('value' = 'x')) %>%
  mutate(y = replace_na(y, '')) %>%
  select(-value) %>%
  pivot_wider(names_from = name, values_from = y) %>%
  select(-row)

In base R you can use lapply with match :

df[] <- lapply(df, function(x) {
  tmp <- Lookup_table$y[match(x, Lookup_table$x)]
  replace(tmp, is.na(tmp), '')
})

df
# A tibble: 5 x 3
#  Var1           Var2         Var3          
#  <chr>          <chr>        <chr>         
#1 ""             ""           "NewLabel 2"  
#2 "NewLabel 3"   ""           "NewLabel 184"
#3 "NewLabel 184" "NewLabel 1" "NewLabel 1"  
#4 "NewLabel 4"   ""           ""            
#5 ""             "NewLabel 2" "NewLabel 4"  
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • This worked with some editing. My whole data frame included other numerical variables that didn't work when it tried to pivot longer. I just selected the columns of interest applied the transformations on that and will left bind it to the original dataframe to get around that issue – David Pettifer Jan 18 '21 at 12:16