
I am struggling with variable labels of data.frame columns. Say I have the following data frame (part of much larger data frame):

data <- data.frame(age = c(21, 30, 25, 41, 29, 33), sex = factor(c(1, 2, 1, 2, 1, 2), labels = c("Female", "Male")))

I also have a named vector with the variable labels for this data frame:

var.labels <- c(age = "Age in Years", sex = "Sex of the participant")

I want to assign the variable labels in var.labels to the columns in the data frame data using the function label from the Hmisc package. I can do them one by one like this and check the result afterwards:

> label(data[["age"]]) <- "Age in years"
> label(data[["sex"]]) <- "Sex of the participant"
> label(data)
                 age                      sex
      "Age in years" "Sex of the participant"

The variable labels are assigned as attributes of the columns:

> attr(data[["age"]], "label")
[1] "Age in years"
> attr(data[["sex"]], "label")
[1] "Sex of the participant"

Wonderful. However, with a larger data frame, say 100 or more columns, this will not be convenient or efficient. Another option is to assign them as attributes directly:

> attr(data, "variable.labels") <- var.labels

Does not help. The variable labels are not assigned to the columns:

> label(data)
age sex
 ""  ""

Instead, they are assigned as an attribute of the data frame itself (see the last component of the list):

> attributes(data)
$names
[1] "age" "sex"

$row.names
[1] 1 2 3 4 5 6

$class
[1] "data.frame"

$variable.labels
                 age                      sex
      "Age in Years" "Sex of the participant"

And this is not what I want. I need the variable labels as attributes of the columns. I tried to write the following function (and many others):

set.var.labels <- function(dataframe, label.vector){
  column.names <- names(dataframe)
  dataframe <- mapply(label, column.names, label.vector)
  return(dataframe)
}

And then execute it:

> set.var.labels(data, var.labels)

Did not help. It returns the values of the vector var.labels but does not assign the variable labels. If I try to assign it to a new object, it just contains the values of the variable labels as a vector.

coip
panman

4 Answers


You can do this by creating a list from the named vector of var.labels and assigning that to the label values. I've used match to ensure that values of var.labels are assigned to their corresponding column in data even if the order of var.labels is different from the order of the data columns.

library(Hmisc)

var.labels = c(age="Age in Years", sex="Sex of the participant")

label(data) = as.list(var.labels[match(names(data), names(var.labels))])

label(data)
                     age                      sex 
          "Age in Years" "Sex of the participant" 
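To see what the match() step does on its own, here is a minimal base R sketch. The vector shuffled.labels and its extra "height" entry are made up for illustration; they are not part of the question.

```r
# Column names as in the question's data frame
col.names <- c("age", "sex")

# Hypothetical label vector: different order, plus one name with no matching column
shuffled.labels <- c(sex = "Sex of the participant",
                     height = "Height in cm",
                     age = "Age in Years")

# For each column name, match() gives its position in names(shuffled.labels)
idx <- match(col.names, names(shuffled.labels))
idx

# Indexing by those positions puts the labels in column order
# and leaves out the entry that has no matching column
ordered.labels <- shuffled.labels[idx]
ordered.labels
```

A column with no entry in the label vector would come back as NA here, so it may be worth checking anyNA(idx) before assigning.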

Original Answer

My original answer used lapply, which isn't actually necessary. Here's the original answer for archival purposes:

You can assign the labels using lapply:

label(data) = lapply(names(data), function(x) var.labels[match(x, names(var.labels))])

lapply applies a function to each element of a list or vector. In this case the function is applied to each value of names(data) and it picks out the label value from var.labels that corresponds to the current value of names(data).
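As a rough illustration of that description, the sketch below runs only the lapply part in base R, using the data frame and label vector from the question, so you can inspect what is produced before it is handed to label(). The name label.list is just a placeholder chosen for this example.

```r
data <- data.frame(age = c(21, 30, 25, 41, 29, 33),
                   sex = factor(c(1, 2, 1, 2, 1, 2), labels = c("Female", "Male")))
var.labels <- c(age = "Age in Years", sex = "Sex of the participant")

# One list element per column name, each holding the label looked up by name
label.list <- lapply(names(data), function(x) var.labels[match(x, names(var.labels))])

# Inspect the structure: a list with one entry per column, in column order
str(label.list)
```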

Reading through a few tutorials is a good way to get the general idea, but you'll really get the hang of it if you start using lapply in different situations and see how it behaves.

eipi10
  • @eipi10: Thank you very much! It works! This is EXACTLY what I needed. I have problems understanding the indexing when working with the `apply` family of functions. Is there any guide I could read, or is it a matter of experience? – panman Dec 07 '14 at 21:39
  • For brief tutorials on `lapply`, [this](http://rollingyours.wordpress.com/category/r-programming-apply-lapply-tapply/) and [this](https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/) might be helpful. I've also added some more explanation to my answer. – eipi10 Dec 08 '14 at 05:34
  • Thanks! What I do not understand is why, after `lapply(names(var.labels), function(x) label(data[,x]) = var.labels[x])`, which already contains the label assignment, one still has to write `label(data) = ` – iago May 11 '18 at 08:14
  • You're right. It's unnecessary to do the assignment. Thanks for pointing that out. I've fixed the code and also updated the code so that it assigns the correct labels regardless of the ordering of `var.labels` and regardless of whether `names(var.labels)` includes additional elements that are not present in `names(data)`. – eipi10 May 11 '18 at 22:34
  • Actually, coming back to this after several years, I see that `lapply` wasn't even necessary. I've updated the answer accordingly. However, @avallecam's answer using the `Hmisc` function `upData` is a more convenient way of updating the labels. – eipi10 May 11 '18 at 22:49
  • @eipi10 Thanks for your reply. Is there any way to match while ignoring case (meaning the variable name may be in either capital or small letters)? I tried `label(data) = as.list(var.labels[match(names(data,ignore.case=T), names(var.labels,ignore.case=T))])` but it did not work. I'd appreciate your advice. – Mohamed Rahouma Jul 24 '23 at 05:11
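
On the case-insensitive question in the last comment: names() takes no ignore.case argument, so one possible approach (a sketch, not something from the answer itself) is to lower-case both sets of names before calling match(). The mixed-case column names below are hypothetical.

```r
# Hypothetical column names whose case differs from the label names
col.names <- c("Age", "SEX")
var.labels <- c(age = "Age in Years", sex = "Sex of the participant")

# Compare lower-cased copies so that case is ignored during matching
idx <- match(tolower(col.names), tolower(names(var.labels)))

# Pick the labels in column order and name them after the actual columns
matched <- var.labels[idx]
names(matched) <- col.names
matched
```

The resulting vector could then presumably be passed on as as.list(matched) in the label(data) assignment shown in the answer; that last step is an assumption and is not run here.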

I highly recommend using the Hmisc::upData() function.

Here is a reprex:


set.seed(22)
data <- data.frame(age = floor(rnorm(6,25,10)), 
                   sex = gl(2,1,6, labels = c("f","m")))
var.labels <- c(age = "Age in Years", 
                sex = "Sex of the participant")
dplyr::as_tibble(data) # as tibble (as.tbl() is deprecated in current dplyr) -
#> # A tibble: 6 × 2
#>     age    sex
#>   <dbl> <fctr>
#> 1    19      f
#> 2    49      m
#> 3    35      f
#> 4    27      m
#> 5    22      f
#> 6    43      m
data <- Hmisc::upData(data, labels = var.labels) # update data --------------
#> Input object size:    1328 bytes;     2 variables     6 observations
#> New object size: 2096 bytes; 2 variables 6 observations
Hmisc::label(data) # check new labels ---------------------------------------
#>                      age                      sex 
#>           "Age in Years" "Sex of the participant"
Hmisc::contents(data) # data dictionary -------------------------------------
#> 
#> Data frame:data  6 observations and 2 variables    Maximum # NAs:0
#> 
#> 
#>                     Labels Levels   Class Storage
#> age           Age in Years        integer integer
#> sex Sex of the participant      2         integer
#> 
#> +--------+------+
#> |Variable|Levels|
#> +--------+------+
#> |   sex  |  f,m |
#> +--------+------+
avallecam
  • `Hmisc::upData(data, labels = )` is awesome! Searching for this for hours. – בנימן הגלילי Aug 08 '17 at 19:01
  • If you want to export variable labels to **.dta** (stata) format, `Hmisc::upData()` shows this [issue](https://github.com/tidyverse/haven/issues/283). Instead, use `labelled::set_variable_labels()`, e.g.: `cars %>% labelled::set_variable_labels(speed = "speed in mph", dist = "stopping distance in ft") %>% haven::write_dta("cars.dta")` – avallecam Aug 20 '18 at 02:35

Instead of {Hmisc} you can use the package {labelled}:

data <- labelled::set_variable_labels(data, .labels = var.labels)
Vlad
  • Stupid question maybe, but doesn't R support variable labels out of the box? I mean, do you always need an additional package with functions to add or change the labels? – BdR Apr 09 '21 at 12:55
  • You can use the base R function `attr()`. See the top question post. The reliance on packages is because assigning attributes is not very convenient. The problem is that the way base R does this is in a *very* general fashion, rather than in a way specifically targeted to the task of just labeling variables or values. – Vlad Apr 10 '21 at 13:14
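
As a minimal sketch of the base R route mentioned in that comment, the loop below sets the "label" attribute on each column with attr(), using the data frame and var.labels from the question. It only sets the attribute; as far as I know Hmisc also adds a "labelled" class to the columns, which this sketch does not do.

```r
data <- data.frame(age = c(21, 30, 25, 41, 29, 33),
                   sex = factor(c(1, 2, 1, 2, 1, 2), labels = c("Female", "Male")))
var.labels <- c(age = "Age in Years", sex = "Sex of the participant")

# Set the "label" attribute on every column that has an entry in var.labels
for (nm in intersect(names(data), names(var.labels))) {
  attr(data[[nm]], "label") <- var.labels[[nm]]
}

# Read the labels back, one per column
sapply(data, attr, "label")
```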

If your vector of labels matches the order of your data.frame columns, but isn't a named vector (so can't be used to subset data.frame columns by name like the lapply approach in the other answer), you can use a for-loop:

for(i in seq_along(data)){
  Hmisc::label(data[, i]) <- var.labels[i]
}

Hmisc::label(data)
#>                      age                      sex 
#>           "Age in Years" "Sex of the participant"
Sam Firke