0

Using OECD data, I can retrieve a database where variables are specified by their IDs and a list with the corresponding labels. Here is a minimal example that reproduces the data structure:

df <- tibble(LOCATION=c("DEU","ITA","USA"),UNIT=c("DEU","EUR","USD"),value=c(120,140,160))
df
## A tibble: 3 x 3
#> LOCATION UNIT  value
#> <chr>    <chr> <dbl>
#> 1 DEU      DEU     120
#> 2 ITA      EUR     140
#> 3 USA      USD     160

df_labels <- list(LOCATION = data.frame(id =c("DEU","ITA","USA"), 
                                        label=c("Germany","Italy","United States")), 
                  UNIT = data.frame(id=c("USD","EUR"),
                                    label=c("Dollar","Euro")))
df_labels
#> $LOCATION
#>    id         label
#> 1 DEU       Germany
#> 2 ITA         Italy
#> 3 USA United States
#> 
#> $UNIT
#>    id  label
#> 1 USD Dollar
#> 2 EUR   Euro

What I want to do is to replace the IDs in variables LOCATION and UNIT in df with the corresponding labels provided in df_labels. I defined the following function:

get_labels <- function(x,db) {
     variable = deparse(substitute(x))
     return(factor(x,levels=db[[variable]]$id,labels=db[[variable]]$label))
     }

so that I can use it in mutate as follows:

df %>% mutate(LOCATION = get_labels(LOCATION,df_labels),
              UNIT = get_labels(UNIT,df_labels))
## A tibble: 3 x 3
#>   LOCATION      UNIT   value
#>   <fct>         <fct>  <dbl>
#> 1 Germany       Euro     120
#> 2 Italy         Euro     140
#> 3 United States Dollar   160

However, I haven't been able to use the function across multiple columns. If I try it using across:

df %>% mutate(across(where(is.character), ~get_labels(.,df_labels)))

the result is an NA in the affected columns. Apparently, the problem is with deparse(substitute(.)), which does not capture the column names. Unfortunately, looking at similar questions such as this one didn't help.

rjen
  • 1,938
  • 1
  • 7
  • 19
Massimo2013
  • 533
  • 4
  • 17

2 Answers2

1

across will not have access to column names so deparse, substitute trick will not work. Try this version :

get_labels <- function(x, y, db) {
  return(factor(x,levels=db[[y]]$id,labels=db[[y]]$label))
}

cols <- sapply(df, is.character)
df[cols] <- purrr::imap_dfc(df[cols], get_labels, db = df_labels)
df
# A tibble: 3 x 3
#  LOCATION      UNIT   value
#  <fct>         <fct>  <dbl>
#1 Germany       NA       120
#2 Italy         Euro     140
#3 United States Dollar   160
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • It works, thank you. I think I must study how ```purrr``` works, so I can understand what I am doing... – Massimo2013 Nov 08 '20 at 14:19
  • The only think I don't like much of this solution is that it cannot be used in a pipe (this is why I was trying to use ```mutate```) – Massimo2013 Nov 08 '20 at 14:21
  • You can do `imap_dfc(dplyr::select(df, where(is.character)), get_labels, db = df_labels)` but it returns only the columns that are changed hence I used `cols`. – Ronak Shah Nov 08 '20 at 14:24
1

We could do this with deframe and map

library(purrr)
library(dplyr)
library(tibble)
lst1 <- map(df_labels, deframe)
for(nm in names(lst1))  df <- df %>%
           mutate(!! nm := lst1[[nm]][!! rlang::sym(nm)])

-output

df
# A tibble: 3 x 3
#  LOCATION      UNIT   value
#  <chr>         <chr>  <dbl>
#1 Germany       <NA>     120
#2 Italy         Euro     140
#3 United States Dollar   160
akrun
  • 874,273
  • 37
  • 540
  • 662