How can I access multiple variables with similar names in R?

Question

I am currently working with a data frame called df that has about 100 variables. Among these is a group of factor variables with very similar names: A_1, A_2, A_3, A_4, etc. I want to do some calculations on this subset of variables. To begin with, I want to add the same new factor level to all of them.

levels(df$A_1) <- c(levels(df$A_1), "x")
levels(df$A_2) <- c(levels(df$A_2), "x")
levels(df$A_3) <- c(levels(df$A_3), "x")
levels(df$A_4) <- c(levels(df$A_4), "x")
levels(df$A_5) <- c(levels(df$A_5), "x")
levels(df$A_6) <- c(levels(df$A_6), "x")
...

This code works pretty well. However, I was wondering whether there is a way to access all of these variables at once, since they all share the same prefix.

It sounds like a case for putting your data in a 'tidy' structure. I.e., instead of having an `id` + `A_1`-`A_100` you would reshape to 3 columns, `id`,`time` (1-100), and `A`. Here's a bit of discussion on how to reshape - https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format - and an informative paper - https://www.jstatsoft.org/index.php/jss/article/view/v059i10/v59i10.pdf . This then makes doing something like adjusting the factor levels a simple one liner on the `A` variable, rather than looping over all the `A_n` variables. — thelatemail, Jun 12 '20 at 09:00
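The reshaping suggested in the comment above could look roughly like the following in base R. This is only a sketch: the `wide` data frame, its `id` column, and the names `time` and `A` for the long-format columns are made up for illustration and are not taken from the question's data.

```r
# Hypothetical wide data: one id column plus factor columns A_1, A_2
wide <- data.frame(id  = 1:2,
                   A_1 = c("a", "b"),
                   A_2 = c("d", "e"),
                   stringsAsFactors = TRUE)

# Names of all columns of the form A_<number>
a_cols <- grep("^A_\\d+$", names(wide), value = TRUE)

# Reshape to long format: one row per id/time combination,
# with all A_n values stacked into a single column A
long <- reshape(wide,
                direction = "long",
                varying   = a_cols,
                v.names   = "A",
                timevar   = "time",
                times     = seq_along(a_cols),
                idvar     = "id")

# Adding the new level is now a single assignment on one column
levels(long$A) <- c(levels(long$A), "x")
```

The trade-off is that the stacked column `A` carries the combined levels of all the original `A_n` columns, which may or may not be what you want if each variable has its own set of categories.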

score 2 · Accepted Answer · answered Jun 12 '20 at 08:47

You can use lapply on selected columns and update their levels.

# Anchor the pattern so only names of the exact form A_<number> match
cols <- grep('^A_\\d+$', names(df))
df[cols] <- lapply(df[cols], function(x) {levels(x) <- c(levels(x), 'x');x})

str(df)
#'data.frame':  2 obs. of  3 variables:
# $ A  : int  1 2
# $ A_1: Factor w/ 3 levels "a","b","x": 1 2
# $ A_2: Factor w/ 3 levels "d","e","x": 1 2
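As a possible variation on the lapply approach, the level-adding step can be wrapped in a small helper. This is a sketch, not part of the original answer: `add_level` is a hypothetical name, and it uses `union()` so that calling it twice does not try to add the same level again. The data frame is the sample data from this answer.

```r
# Hypothetical helper: add one level to a factor, leaving existing levels alone
add_level <- function(x, new_level) {
  stopifnot(is.factor(x))
  levels(x) <- union(levels(x), new_level)
  x
}

# Sample data from the answer
df <- data.frame(A = 1:2, A_1 = c("a", "b"), A_2 = c("d", "e"),
                 stringsAsFactors = TRUE)

# Select the A_<number> columns by name and update only those
cols <- grep("^A_\\d+$", names(df), value = TRUE)
df[cols] <- lapply(df[cols], add_level, new_level = "x")

# The new level can now be assigned without producing NA
df$A_1[2] <- "x"
```

Without the extra level, the last assignment would give a warning about an invalid factor level and store NA instead.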

With dplyr we can use:

library(dplyr)

df %>%
  mutate(across(starts_with('A_'), ~{levels(.) <- c(levels(.), 'x');.}))
  #In older dplyr use `mutate_at`
  #mutate_at(vars(starts_with('A_')), ~{levels(.) <- c(levels(.), 'x');.})

data

df <- data.frame(A = 1:2, A_1 = c('a', 'b'), A_2 = c('d', 'e'), 
                 stringsAsFactors = TRUE)
str(df)
#'data.frame':  2 obs. of  3 variables:
# $ A  : int  1 2
# $ A_1: Factor w/ 2 levels "a","b": 1 2
# $ A_2: Factor w/ 2 levels "d","e": 1 2

score 0 · Answer 2 · answered Jun 12 '20 at 19:44

We can use fct_expand from forcats:

library(forcats)
library(dplyr)
# Use a lambda: passing extra arguments through across() is deprecated in dplyr >= 1.1.0
df1 <- df %>%
  mutate(across(starts_with('A_'), ~ fct_expand(.x, 'x')))

str(df1)
#'data.frame':  2 obs. of  3 variables:
# $ A  : int  1 2
# $ A_1: Factor w/ 3 levels "a","b","x": 1 2
# $ A_2: Factor w/ 3 levels "d","e","x": 1 2

data

df <- data.frame(A = 1:2, A_1 = c('a', 'b'), A_2 = c('d', 'e'), 
                 stringsAsFactors = TRUE)
