I am forever working with collaborators in SPSS and Stata, so clear variable labels are really important to communicate what has been done to any given variable and what it records.

How do I most efficiently rename variables with their variable labels in a tidyverse context? I can do this, but it seems very unwieldy.

var1<-rnorm(100)
var2<-rnorm(100)
var3<-rnorm(100)
group_var<-sample(c("A", "B"), size=100, replace=TRUE)
other_var1<-rnorm(100)
other_var2<-rnorm(100)
df<-data.frame(var1, var2, var3, group_var, other_var1, other_var2)
library(labelled)
library(tidyverse)
df %>% 
  set_variable_labels(var1="Measure 1", 
                      var2="Measure 2",
                      var3="Measure 3",
                      group_var="Grouping Variable")->df


#Store variable labels
df %>% 
  select(starts_with("var")) %>% 
  var_label() %>% 
  unlist()->variable_labels
variable_labels<-data.frame(name=names(variable_labels), labels=variable_labels)
df %>% 
  pivot_longer(var1:var3) %>% 
  left_join(., variable_labels, by="name")
  

Is there a way to make the rename_with function work here? This does not do it.

df %>% 
  rename_with(., function(x) var_label(x),.cols=var1:var3)
user438383
spindoctor
  • I am confused about the last part. What are you trying to do with the `rename_with()` - rename the variables in `df` to their corresponding label, so that `var1` becomes `Measure 1`? – SamR Jul 13 '22 at 16:48
  • Note that you can get the `variable_labels` data set with `labelled::look_for()`. – harre Jul 13 '22 at 17:40

2 Answers

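We could use !!! with rename on a named list or vector created from the variable_labels dataset. deframe() on the reversed columns returns a vector whose names are the labels and whose values are the current column names, which is the new = old form that rename expects.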

library(dplyr)
library(tibble)
df <- df %>% 
   rename(!!! deframe(variable_labels[2:1]))

-Check the names

> names(df)
[1] "Measure 1"  "Measure 2"  "Measure 3"  "group_var"  "other_var1" "other_var2"
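
To make the splicing step easier to follow, here is a minimal sketch of what deframe() hands to rename. The small variable_labels data frame below is a stand-in typed out by hand (an assumption for illustration), not the one built from df in the question.

```r
library(tibble)

# Stand-in for the variable_labels data frame built in the question
variable_labels <- data.frame(
  name   = c("var1", "var2", "var3"),
  labels = c("Measure 1", "Measure 2", "Measure 3"),
  stringsAsFactors = FALSE
)

# Reordering the columns to (labels, name) makes deframe() use the labels
# as names and the old column names as values, i.e. new_name = old_name
lookup <- deframe(variable_labels[2:1])
lookup
```

Splicing this vector with !!! is then equivalent to typing each `Measure 1` = "var1" pair into rename by hand.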

Or, if we want to use rename_with:

df <- df %>%
  rename_with(~ variable_labels$labels, 
      .cols = all_of(variable_labels$name))

var_label is not working because it looks for the label on the column values and not on the column names, i.e. according to ?var_label

x - a vector or a data.frame

> var_label("var1")
NULL

whereas

> var_label(df$var1)
[1] "Measure 1"

If we dig into the function rename_with.data.frame, this becomes more evident

getAnywhere('rename_with.data.frame')
function (.data, .fn, .cols = everything(), ...) 
{
    .fn <- as_function(.fn)
    cols <- tidyselect::eval_select(enquo(.cols), .data)
    names <- names(.data)
    names[cols] <- .fn(names[cols], ...)
    names <- vec_as_names(names, repair = "check_unique")
    set_names(.data, names)
}

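i.e. the .fn or lambda function is applied to the column names, not to the columns themselves. When we pass var_label, it therefore receives a character vector of names (which carries no label attribute) rather than the data.frame or column it expects, so it returns NULL and the rename fails.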

-Adding print statements to a modified copy of the function shows this

rename_with_mod <- function (.data, .fn, .cols = everything(), ...) 
{
   
    cols <- tidyselect::eval_select(enquo(.cols), .data)
    print("cols")
    print(cols)
    names <- names(.data)
    print("names")
    print(names)
    .fn <- rlang::as_function(.fn)
    print(names[cols])
    .fn(names[cols], ...)
    
}

-Testing

# lambda function to return the column name
> df %>% 
+   rename_with_mod(~ .x, .cols=var1:var3)
[1] "cols"
var1 var2 var3 
   1    2    3 
[1] "names"
[1] "var1"       "var2"       "var3"       "group_var"  "other_var1" "other_var2"
[1] "var1" "var2" "var3"
[1] "var1" "var2" "var3"
# lambda function where we apply the var_label - returns NULL
> df %>% 
+   rename_with_mod(~ var_label(.x), .cols=var1:var3)
[1] "cols"
var1 var2 var3 
   1    2    3 
[1] "names"
[1] "var1"       "var2"       "var3"       "group_var"  "other_var1" "other_var2"
[1] "var1" "var2" "var3"
NULL
M--
akrun

You could also use the attributes directly:

colnames(data) <- sapply(data, function(x) attr(x, "label"))

Or, if you prefer var_label and rename_with (beware, though, that no data masking is available inside the function, so you have to refer to the data frame itself, data, rather than the .data pronoun):

data |> 
  rename_with(function(x) sapply(x, function(y) var_label(data[[y]])))
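
One caveat with both approaches above: a column without a label yields NULL, so sapply() returns a list instead of a character vector. The sketch below is one possible way around that, assuming a hypothetical helper called label_or_name (not part of labelled or dplyr) and a small stand-in data frame. It reads the "label" attribute directly rather than calling var_label().

```r
library(dplyr)

# Small stand-in data frame; only two of the three columns carry a label
df <- data.frame(var1 = 1:3, var2 = 4:6, other_var1 = 7:9)
attr(df$var1, "label") <- "Measure 1"
attr(df$var2, "label") <- "Measure 2"

# Hypothetical helper: map column names to labels, keeping the original
# name when a column has no "label" attribute.
# exact = TRUE avoids partial matching against a "labels" (value labels)
# attribute such as the one haven attaches to labelled columns.
label_or_name <- function(nms, data) {
  vapply(nms, function(nm) {
    lab <- attr(data[[nm]], "label", exact = TRUE)
    if (is.null(lab)) nm else as.character(lab)
  }, character(1), USE.NAMES = FALSE)
}

# rename_with() passes the column names to the lambda; the data frame is
# supplied explicitly because it is not available inside the function
renamed <- df %>% rename_with(~ label_or_name(.x, df))
names(renamed)
```

Unlabelled columns such as other_var1 keep their names, so the call can be applied to the whole data frame without a .cols argument.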

Example with the labelled iris data shipped with haven:

library(haven)

> path <- system.file("examples", "iris.dta", package = "haven")
> data <- read_dta(path)
> data
# A tibble: 150 × 5
   sepallength sepalwidth petallength petalwidth species
         <dbl>      <dbl>       <dbl>      <dbl> <chr>  
 1        5.10       3.5         1.40      0.200 setosa 
 2        4.90       3           1.40      0.200 setosa 
 3        4.70       3.20        1.30      0.200 setosa 
 4        4.60       3.10        1.5       0.200 setosa 
 5        5          3.60        1.40      0.200 setosa 
 6        5.40       3.90        1.70      0.400 setosa 
 7        4.60       3.40        1.40      0.300 setosa 
 8        5          3.40        1.5       0.200 setosa 
 9        4.40       2.90        1.40      0.200 setosa 
10        4.90       3.10        1.5       0.100 setosa 
# … with 140 more rows
> colnames(data) <- sapply(data, function(x) attr(x, "label"))
> data
# A tibble: 150 × 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
 1         5.10        3.5          1.40       0.200 setosa 
 2         4.90        3            1.40       0.200 setosa 
 3         4.70        3.20         1.30       0.200 setosa 
 4         4.60        3.10         1.5        0.200 setosa 
 5         5           3.60         1.40       0.200 setosa 
 6         5.40        3.90         1.70       0.400 setosa 
 7         4.60        3.40         1.40       0.300 setosa 
 8         5           3.40         1.5        0.200 setosa 
 9         4.40        2.90         1.40       0.200 setosa 
10         4.90        3.10         1.5        0.100 setosa 
# … with 140 more rows

Consider using janitor::make_clean_names afterwards to make life easier for yourself.
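
As a rough illustration of that suggestion, the sketch below runs a few label-style names through janitor::make_clean_names(). The labels vector is made up for the example and simply mirrors the labels used in the question.

```r
library(janitor)

# Labels as they might appear after renaming columns to their variable labels
labels <- c("Measure 1", "Measure 2", "Grouping Variable")

# Default settings convert to snake_case, giving syntactic R names
clean <- make_clean_names(labels)
clean
```

The cleaned names can be typed without backticks, at the cost of no longer matching the labels exactly.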

harre
  • How so harre? Part of the problem is that I'm doing the data cleaning for collaborators who work only in SPSS and Stata. I'm not inclined to change original variable names. But if you have suggestions for improved workflow, I'm all ears. – spindoctor Jul 13 '22 at 17:45
  • Variable labels can contain e.g. spaces and other characters that are horrible to work with in R if you use them directly as column names. – harre Jul 13 '22 at 17:50
  • In that use case it would probably be better to keep the variable names and labels separate and save to `.dta` or `.sav`, preserving both. – harre Jul 13 '22 at 17:56
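
A minimal sketch of that last suggestion, assuming haven is installed and using a made-up two-column data frame: the original variable names stay as they are, and the labels travel along in the "label" attribute when the file is written.

```r
library(haven)

# Stand-in data frame with labels stored in the "label" attribute
df <- data.frame(
  var1      = c(1.5, 2.5),
  group_var = c("A", "B"),
  stringsAsFactors = FALSE
)
attr(df$var1, "label") <- "Measure 1"
attr(df$group_var, "label") <- "Grouping Variable"

# Write a Stata file to a temporary path and read it back
path <- tempfile(fileext = ".dta")
write_dta(df, path)
back <- read_dta(path)

names(back)                             # original variable names
attr(back$var1, "label", exact = TRUE)  # label survives the round trip
```

write_sav() can be used in the same way for collaborators working in SPSS.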