Using lapply to label the values of specific variables

Question

I would like to use lapply to label the values of specific variables. I have found an example that gets me close (here), but I can't get it to work for only certain variables in the data set.

Working example:

```
library(tibble)  # provides tribble()

df1 <- tribble(
 ~var1, ~var2, ~var3, ~var4,
 "1",   "1",   "1", "a",
 "2",   "2",   "2", "b",
 "3",   "3",   "3", "c"
)
```

Here is the code that seems like it should work, but doesn't:

```
df1["var1", "var2"] <- lapply(df1["var1", "var2"], factor,
                          levels=c(1, 
                                   2, 
                                   3), 
                          labels = c("Agree", 
                                     "Neither Agree/Disagree", 
                                     "Disagree"))
```

The code runs, but gives the following output:

```
# A tibble: 4 x 4
  var1  var2  var3  var4
* <chr> <chr> <chr> <chr>
1     1     1     1     a
2     2     2     2     b
3     3     3     3     c
4  <NA>  <NA>  <NA>  <NA>
```

If I try with just one variable, it works:

```
df1["var1"] <- lapply(df1["var1"], factor,
                          levels=c(1, 
                                2, 
                                3), 
                          labels = c("Agree", 
                                  "Neither Agree/Disagree", 
                                  "Disagree"))
```

It gives the following output (which is correct):

```
# A tibble: 3 x 4
                    var1  var2  var3  var4
                  <fctr> <chr> <chr> <chr>
1                  Agree     1     1     a
2 Neither Agree/Disagree     2     2     b
3               Disagree     3     3     c
```

I have tried a lot of different ways to change the code to get it to work, but I just can't figure it out.

just use `df1[c("var1","var2")]` or `df1[1:2]` – Onyambu, Dec 22 '17 at 02:43

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2017-12-22T04:44:40.297

Your problem is arising because you're trying to subset your data.frame incorrectly.

In a data.frame or tbl, extracting using [ works in a couple of ways.

Since the data is in a matrix-like rectangular form, you can use a [row, column] approach to get specific values. For example to get a single value, you can do something like df1[2, 1].
Since a tbl/data.frame is a special type of list, if you don't supply a comma, it assumes you want the entire list element.

Thus, when you did ["var1", "var2"], it went into matrix subsetting mode and was looking for a row named "var1", which it couldn't find, so it inserted a row of NA values in your dataset.

Here's a small set of examples for you to experiment with.

Get rows 1:4 and columns 1:4

```
df <- mtcars[1:4, 1:4]
df
#                 mpg cyl disp  hp
# Mazda RX4      21.0   6  160 110
# Mazda RX4 Wag  21.0   6  160 110
# Datsun 710     22.8   4  108  93
# Hornet 4 Drive 21.4   6  258 110
```

Extract a single value using a [row, column] approach
```
df["Mazda RX4", "mpg"]  # [row, column]
# [1] 21
```
Check whether a data.frame is a list
```
is.list(df)
# [1] TRUE
```

Convert a data.frame to a list and try to extract using [row, column].

```
L <- unclass(df)
L["Mazda RX4", "mpg"]   # A list doesn't have `dim`s.
# Error in L["Mazda RX4", "mpg"] : incorrect number of dimensions
```

Providing just one value to extract from a data.frame or a list

```
df["mpg"]               # Treats it as asking for a single value from a list
#                 mpg
# Mazda RX4      21.0
# Mazda RX4 Wag  21.0
# Datsun 710     22.8
# Hornet 4 Drive 21.4

L["mpg"]
# $mpg
# [1] 21.0 21.0 22.8 21.4
```

Providing a vector of values to extract

```
df[c("mpg", "hp")]
#                 mpg  hp
# Mazda RX4      21.0 110
# Mazda RX4 Wag  21.0 110
# Datsun 710     22.8  93
# Hornet 4 Drive 21.4 110

L[c("mpg", "hp")]
# $mpg
# [1] 21.0 21.0 22.8 21.4
# 
# $hp
# [1] 110 110  93 110
```

Since a data.frame is a special type of list with dims, using an empty [, vals] would work

```
df[, c("mpg", "hp")]
#                 mpg  hp
# Mazda RX4      21.0 110
# Mazda RX4 Wag  21.0 110
# Datsun 710     22.8  93
# Hornet 4 Drive 21.4 110
```
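
One difference between the two forms matters when the result is handed to lapply. With a base data.frame, the comma form drops a single column to a plain vector, while the list-style form keeps a one-column data.frame. A minimal sketch on the same mtcars subset (the variable names are only for illustration, and a tibble does not drop by default, so this applies to base data frames):

```r
df <- mtcars[1:4, 1:4]

one_col_list_style   <- df["mpg"]     # still a data.frame with one column
one_col_matrix_style <- df[, "mpg"]   # dropped to a numeric vector

class(one_col_list_style)
class(one_col_matrix_style)

# lapply() iterates over the columns of a data.frame, but over the
# individual elements of a plain vector
length(lapply(one_col_list_style, class))    # one result per column
length(lapply(one_col_matrix_style, class))  # one result per value

# drop = FALSE keeps the data.frame structure in the comma form
class(df[, "mpg", drop = FALSE])
```

So when selecting a single column for lapply, the list-style form (or `drop = FALSE`) is the safer choice.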

Looking for a row that is not there would return NAs

```
df["not here", ]
#    mpg cyl disp hp
# NA  NA  NA   NA NA
```
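
The same lookup explains the result in the question. A minimal sketch, assuming a base data.frame holding the question's values (a tibble may warn or behave differently depending on its version): with two indices, R looks for a row named "var1" and finds none, whereas a single index holding both names selects the two columns.

```r
# Base data.frame stand-in for the question's df1 (assumption: no tibble here)
df1 <- data.frame(
  var1 = c("1", "2", "3"),
  var2 = c("1", "2", "3"),
  var3 = c("1", "2", "3"),
  var4 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Two indices: row "var1", column "var2". No such row exists.
two_index <- df1["var1", "var2"]
is.na(two_index)

# One index holding both names: the columns var1 and var2.
one_index <- df1[c("var1", "var2")]
names(one_index)
dim(one_index)
```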

Keeping those details in mind, your best approach is to just use the following (as suggested in @www's answer):

```
df1[c("var1", "var2")]
```

Thanks so much @A5C1D2H2I1M1N2O1R2T1! This is the kind of answer that helps me better understand R and the logic behind the coding. Much appreciated. — scottsmith, Dec 22 '17 at 15:49

score 2 · Answer 2 · answered Dec 22 '17 at 02:39

You were close. We need df1[c("var1", "var2")] to specify columns.

```
df1[c("var1", "var2")] <- lapply(df1[c("var1", "var2")], factor,
                              levels=c("1", 
                                       "2", 
                                       "3"), 
                              labels = c("Agree", 
                                         "Neither Agree/Disagree", 
                                         "Disagree"))
df1
# # A tibble: 3 x 4
#                     var1                   var2  var3  var4
#                   <fctr>                 <fctr> <chr> <chr>
# 1                  Agree                  Agree     1     a
# 2 Neither Agree/Disagree Neither Agree/Disagree     2     b
# 3               Disagree               Disagree     3     c
```
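
If the same labels apply to several columns, it may help to define the levels and labels once and keep the column names in a vector. The following is a minimal base R sketch, not part of the original answer: the names `likert_cols`, `likert_levels`, and `likert_labels` are hypothetical, a plain data.frame stands in for the tibble, and the "9" value is made up to show what happens to unlisted values.

```r
df1 <- data.frame(
  var1 = c("1", "2", "3"),
  var2 = c("3", "2", "9"),   # "9" is not one of the listed levels
  var3 = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

likert_cols   <- c("var1", "var2")   # columns to label (hypothetical name)
likert_levels <- c("1", "2", "3")
likert_labels <- c("Agree", "Neither Agree/Disagree", "Disagree")

# Single index on both sides: select and replace whole columns
df1[likert_cols] <- lapply(df1[likert_cols], factor,
                           levels = likert_levels,
                           labels = likert_labels)

sapply(df1, class)   # var1 and var2 become factors, var3 is untouched
levels(df1$var2)
is.na(df1$var2)      # values outside `levels` become NA
```

Values that do not appear in `levels` are converted to NA without a warning, so it is worth checking for unexpected NAs after the conversion.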

Ah, okay. Such a simple thing. I won't make that mistake again. Thanks @www! — scottsmith, Dec 22 '17 at 15:47
