0

I'm trying to use dynamic variables in a for loop to access a tablename. Other questions on SO (eg here, here and here) seem to be about using dynamic variables to access a column name. I'm using R v4.0.3 and dplyr v1.0.2.

Basically, I'm importing from a .sav (SPSS) file, and seeking to split the 400+ columns into smaller dataframes holding the info for each survey question. That part works, but I then want to do things like add a mean column for each new dataframe. I'm currently trying to do that in the segmentation for loop, but I can't get it to work. (I'd also be happy to do it separately for each new dataframe in another for loop or a list or something, but I can't see how that would work either if I can't get the other one to work!)

Simplifying somewhat, the columns in the original file are named as QX.Y_Z, where Z are items within question Y in the X block.

Some dummy data, setting up a (sav-type) dataframe of 2 questions, each with two items:

    mydata=tibble(Q6.1_1_1=as.numeric(c(2, 1, 3, 1, 2, 3, 1, 3, 2, 1, 1, 1, 2, 2)),
              Q6.1_1_2=as.numeric(c(1, 3, 1, 1, 1, 2, 3, 3, 1, 3, 1, 1, 1, 2)),
              Q7.1_1_1=as.numeric(c(1, 2, 1, 2, 1, 3, 3, 1, 2, 3, 2, 1, 3, 2)),
              Q7.1_1_2=as.numeric(c(3, 1, 3, 1, 2, 1, 3, 2, 3, 1, 3, 1, 1, 3)),
              )
    var_label(mydata$Q6.1_1_1)<-"Rate your effort - before."
    var_label(mydata$Q6.1_1_2)<-"Rate your effort - before." 
    var_label(mydata$Q7.1_1_1)<-"Rate your enthusiasm - before." 
    var_label(mydata$Q7.1_1_2)<-"Rate your enthusiasm - after." 
    val_labels(mydata$Q6.1_1_1)<-c(Low=1, Medium=2, High=3)
    val_labels(mydata$Q6.1_1_2)<-c(Low=1, Medium=2, High=3)
    val_labels(mydata$Q7.1_1_1)<-c(Low=1, Medium=2, High=3)
    val_labels(mydata$Q7.1_1_2)<-c(Low=1, Medium=2, High=3)
mydata
# A tibble: 14 x 4
     Q6.1_1_1   Q6.1_1_2   Q7.1_1_1   Q7.1_1_2
    <dbl+lbl>  <dbl+lbl>  <dbl+lbl>  <dbl+lbl>
 1 2 [Medium] 1 [Low]    1 [Low]    3 [High]  
 2 1 [Low]    3 [High]   2 [Medium] 1 [Low]   
 3 3 [High]   1 [Low]    1 [Low]    3 [High]  
 4 1 [Low]    1 [Low]    2 [Medium] 1 [Low]   
 5 2 [Medium] 1 [Low]    1 [Low]    2 [Medium]
 6 3 [High]   2 [Medium] 3 [High]   1 [Low]   
 7 1 [Low]    3 [High]   3 [High]   3 [High]  
 8 3 [High]   3 [High]   1 [Low]    2 [Medium]
 9 2 [Medium] 1 [Low]    2 [Medium] 3 [High]  
10 1 [Low]    3 [High]   3 [High]   1 [Low]   
11 1 [Low]    1 [Low]    2 [Medium] 3 [High]  
12 1 [Low]    1 [Low]    1 [Low]    1 [Low]   
13 2 [Medium] 1 [Low]    3 [High]   1 [Low]   
14 2 [Medium] 2 [Medium] 2 [Medium] 3 [High] 

Remove the item number from the question string:

varlist<-mydata %>% 
  colnames() %>% 
  as_tibble() %>% 
  separate(value, "qno", sep="_", extra = "drop", fill="right") %>%
  unique() %>% 
  pull()
> varlist
[1] "Q6.1" "Q7.1"

Generate the subtables:

for (v in varlist) {
  assign(paste0("table", v), select(mydata, matches(v)))
} 

This gives me subtables called tableQ6.1 and tableQ7.1. So far, so good.

However, when I try to add a mean column (giving the mean of each row) for each of these subtables as they are generated, I can't find a way to tell mutate() to use the dynamic name of the table. These are a couple of the options I've tried, but all I get (with these and a LOT more) is errors, so I must be missing something obvious:

for (v in varlist) {
  assign(paste0("table", v), select(mydata, matches(v)))
  tabname<-sym(paste0("table", v))
  mutate({{tabname}}, mean=rowMeans(across(where(is.numeric)), na.rm = FALSE))
}

for (v in varlist) {
  assign(paste0("table", v), select(mydata, matches(v)))
  tabname<-"table{v}" %>%
    mutate("mean{v}":=rowMeans(across(where(is.numeric)), na.rm = FALSE))
}

Any guidance (including wider comments on whether this is the best approach) would be welcome!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
donnek
  • 221
  • 1
  • 9

1 Answers1

0

You could use split.default to split dataframe with similar column names and then take row wise mean of each subset. This would avoid you to create intermediate dataframes and variables.

sapply(split.default(mydata, sub('\\..*', '', names(mydata))), rowMeans)

#       Q6  Q7
# [1,] 1.5 2.0
# [2,] 2.0 1.5
# [3,] 2.0 2.0
# [4,] 1.0 1.5
# [5,] 1.5 1.5
# [6,] 2.5 2.0
# [7,] 2.0 3.0
# [8,] 3.0 1.5
# [9,] 1.5 2.5
#[10,] 2.0 2.0
#[11,] 1.0 2.5
#[12,] 1.0 1.0
#[13,] 1.5 2.0
#[14,] 2.0 2.5
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thanks. But I think I need to create the subtables to allow for analysis at the question level, plotting of data, and so on. – donnek Jan 09 '21 at 17:39
  • @donnek `subtables <- split.default(mydata, sub('\\..*', '', names(mydata)))` This is where your subtables are. It is better and easier to manage if you keep them in a list instead of creating objects in global environment. – Ronak Shah Jan 09 '21 at 23:52
  • Aaah, I see what you mean. I get a list of lists, the latter being the dataframes for each question. I can access those ```subtables$Q6```, ```subtables[["Q6"]]``` or ```subtables %>% pluck("Q6")```, but is there any way to pass a list of questions (eg ```c("Q6", "Q7", "Q12")``` to the subtables list and get the columns for those questions? I can't seem to get the right invocation for that - obviously a lot of learning to do on lists! – donnek Jan 11 '21 at 19:56
  • You can do `subtables[c('Q6', 'Q7', 'Q12')]` to get multiple lists. – Ronak Shah Jan 11 '21 at 23:49
  • Yes, but that just gives a subset of ```subtables``` - I was trying to get the columns in that subset in one dataframe. – donnek Jan 12 '21 at 08:59
  • You can do `lapply(subtables[c('Q6', 'Q7', 'Q12')], colnames)` Or `sapply(subtables[c('Q6', 'Q7', 'Q12')], colnames)`. Are you sure all these questions are part of your original post? – Ronak Shah Jan 12 '21 at 09:01
  • Haha! That **was** my original question: "However, when I try to add a mean column (giving the mean of each row) for each of these subtables as they are generated, I can't find a way to tell mutate() to use the dynamic name of the table." Thanks for your help. – donnek Jan 12 '21 at 14:56