
Thank you all for looking into this, I really appreciate it. A reproducible example would be:

library(dplyr)
library(ggplot2)

n <- 100  # 4 schools x 25 students
d <- data.frame(school = rep(LETTERS[1:4], times = c(25, 25, 25, 25)),
                student = rep(1:25, times = 4),
                week = seq(from = 1, to = 25, by = 5),  # 5 values, recycled to n rows
                crit1fl = sample(0:1, n, replace = TRUE),
                crit2fl = sample(0:1, n, replace = TRUE),
                crit3fl = sample(0:1, n, replace = TRUE))


d2 <- d %>% 
  count(school, week, crit1fl) %>% 
  group_by(school, week) %>% 
  mutate(prop = n/sum(n)) %>% 
  arrange(school, week)


d2 %>% 
  ggplot(aes(x=as.factor(week), y=n, fill=crit1fl)) +
  geom_bar(position='fill', stat= 'identity') + 
  facet_wrap(~school) + 
  geom_text(
    aes(label=scales::percent(prop)),
    position='fill', vjust=1.5, color='white',
    # the flags in this example are coded 0/1, so "met" is 1 (not 'Y')
    data = d2[d2$crit1fl == 1, ]) +
  labs(title= 'Proportion of subjects meeting the criteria', 
       y='Proportion', x = 'Week')
  

Then I want to loop over crit1fl through crit3fl (this is a simplified example; my real data has many more fl variables) and output one graph per variable.

Using {{ varname }} seems to work, but I'm not sure why only the last graph (the one using crit3fl) is output rather than one graph per critfl variable in the loop. Also, {{ }} doesn't work inside geom_text.

This is the loop that doesn't behave as expected:

critvars <- paste0("crit", 1:3, "fl")
for (varname in critvars) {
  d2 <- d %>% 
    count(school, week,  {{ varname }}) %>% 
    group_by(school, week) %>% 
    mutate(prop = n/sum(n)) %>% 
    arrange(school, week)
  
  
g <- d2 %>% 
    ggplot(aes(x=as.factor(week), y=n, fill= {{ varname }})) +
    geom_bar(position='fill', stat= 'identity') + 
    facet_wrap(~school) + 
    # geom_text(
    #   aes(label=scales::percent(prop)),
    #   position='fill', vjust=1.5, color='white',
    #   data = d2[d2$crit1fl =='Y', ]) +
    labs(title= 'Proportion of subjects meeting the criteria', 
         y='Proportion', x = 'Week')

g

}

Original question: I have variables named crit1fl, crit2fl, ..., crit10fl and would like to loop through the variable names to do an analysis.

for (i in 1:10) {

  d %>% 
    count(school, week, critifl)
  .
  .  
}

How do I refer to a variable by name using the for loop index?

glor
  • I have decided to mark this as a duplicate of the question I linked above: https://stackoverflow.com/a/26003971/2954547 – shadowtalker Feb 28 '23 at 00:18
  • @shadowtalker I don't think this is a duplicate of the link you provided; this is a unique question. Although it should probably be closed for needing more details/clarity – jpsmith Feb 28 '23 at 00:28

2 Answers


While it's possible (and not particularly difficult) to construct and use variable names programmatically in R, it's usually not a good idea, because it can lead to fragile and hard-to-maintain code. Yes, this matters, even in research scripts. There's nothing more frustrating than trying to use someone else's research code and not being able to make sense of it.

Your example, with Dplyr

Assuming you are using Dplyr, that library already gives us tools to work programmatically with variable names in data frames, without literally constructing R code objects ("names", a.k.a. "symbols"). These tools are the "embrace operator" {{ and the "injection operator" !!.

Their usage here is straightforward:

# Construct your desired variable names
critvars <- paste0("crit", 1:10, "fl")

# Loop over each variable, performing operations as desired
for (varname in critvars) {
    mydata %>% count(school, week, {{ varname }}) %>% ...
}
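
As the comments under this answer point out, {{ expects a bare name passed through a function argument, so looping over strings this way does not pick out the columns. A minimal sketch of a string-based alternative, assuming dplyr's `.data` pronoun and toy data shaped like the question's `d` (the values and the `counts` list are made up for illustration):

```r
library(dplyr)

# Toy data with the same column layout as the question (values are random)
set.seed(1)
n <- 100
d <- data.frame(school = rep(LETTERS[1:4], each = 25),
                week = rep(c(1, 6, 11, 16, 21), times = 20),
                crit1fl = sample(0:1, n, replace = TRUE),
                crit2fl = sample(0:1, n, replace = TRUE),
                crit3fl = sample(0:1, n, replace = TRUE))

critvars <- paste0("crit", 1:3, "fl")

# Collect one summary table per flag column, keyed by the column name
counts <- list()
for (varname in critvars) {
  counts[[varname]] <- d %>%
    count(school, week, .data[[varname]]) %>%  # string -> column lookup
    group_by(school, week) %>%
    mutate(prop = n / sum(n)) %>%
    ungroup()
}

head(counts[["crit2fl"]])
```

Storing each result in a named list keeps every iteration's table available after the loop, instead of overwriting a single `d2`.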

These operators are described in more detail in the Rlang docs here, and are also mentioned in the Dplyr docs here, but they are not prominently displayed, the terminology is extremely obtuse and confusing (including and especially to seasoned Lisp users!), and the use of !! is not even mentioned.

I think this broadly reflects a sort of attitude problem around the "Tidyverse" ecosystem, and it's why I recommend against relying on it too heavily as a beginner, if you want to actually get good at using R.

Note that this {{ operator only works inside certain specially-designed functions. In base R functions and most other R functions that are not part of the Tidyverse, {{ is just two nested pairs of braces and !! is just double logical negation.
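
To illustrate where {{ does apply, here is a small sketch of a wrapper function that forwards a bare column name to count. The helper `count_flag` and the four-row data frame are hypothetical, made up only for this example:

```r
library(dplyr)

# Hypothetical helper: `flag` is supplied as a bare column name, and
# {{ flag }} forwards it to count() as if it had been typed there directly
count_flag <- function(data, flag) {
  data %>%
    count(school, week, {{ flag }})
}

d_small <- data.frame(school = c("A", "A", "B", "B"),
                      week = c(1, 1, 1, 6),
                      crit1fl = c(0, 1, 1, 1))

res <- count_flag(d_small, crit1fl)
res
```

The call passes `crit1fl` unquoted. When all you have is a string such as "crit1fl", the embrace operator is not the right tool.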

See https://stackoverflow.com/a/26003971/2954547 for examples of using these operators to construct new variable names in a dataframe (using mutate).

Other usage, without Dplyr

When using non-Dplyr functions, this is a little easier, using the standard [[ selection operator for data frames:

for (varname in critvars) {
    y <- mydata[[varname]]
    do_something(y)
}
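
A concrete version of that loop might look like the following. This is only a sketch: the small data frame and the choice of `mean()` as the "something" are assumptions for illustration.

```r
d_small <- data.frame(crit1fl = c(0, 1, 1, 0),
                      crit2fl = c(1, 1, 1, 0),
                      crit3fl = c(0, 0, 1, 0))
critvars <- paste0("crit", 1:3, "fl")

# Proportion of rows meeting each criterion, one column at a time
props <- numeric(0)
for (varname in critvars) {
  props[varname] <- mean(d_small[[varname]] == 1)
}
props
# crit1fl crit2fl crit3fl 
#    0.50    0.75    0.25 
```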

See here for more examples and discussion.

shadowtalker
  • I don't believe `{{}}` works in the loop as you described. Have you tested it? Also, it's not clear from the OP whether the loop indices should be the names of the columns or take on a sequential numeric value in a more complicated loop. For instance, one may plot using a loop and have different values for x axes and y axes, so could store them in separate vectors and access them with `xvals[i]` and `yvals[i]`. – jpsmith Feb 28 '23 at 00:23
  • Although we can use the embrace operator `{{` inside `for` loops, the usage as described above won't work, since you are looping over strings `critvars`, while the embrace operator takes bare object names as input. To make it work we would need to loop over `lapply(critvars, as.name)`. When we have string column names and only want to access one column, then in 'dplyr' we use the `.data` pronoun as `.data[[string_col_name]]`. – TimTeaFan Feb 28 '23 at 00:26
  • @TimTeaFan I did try, but not in a loop. I'm a little surprised that makes a difference. – shadowtalker Feb 28 '23 at 01:16
  • Thank you for this answer! I updated my question. I think {{}} works, but not sure why it only works for the last loop, and doesn't work in this type of subsetting: d2[d2$ {{ varname }} == 'Y', ] – glor Feb 28 '23 at 19:38
  • @glor `[` is not a special Dplyr function, nor would I expect it to work with `$` in general. – shadowtalker Mar 01 '23 at 00:07
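
On the follow-up raised in these comments: inside a `for` loop R does not auto-print values, so a bare `g` draws nothing and the plot has to be printed explicitly. The subsetting in `geom_text` can use `[[` with the string. Below is a rough sketch of the whole loop under those assumptions; the toy data, the `plots` list and the per-variable title are illustrative additions, not part of the original code:

```r
library(dplyr)
library(ggplot2)

set.seed(1)
n <- 100
d <- data.frame(school = rep(LETTERS[1:4], each = 25),
                week = rep(c(1, 6, 11, 16, 21), times = 20),
                crit1fl = sample(0:1, n, replace = TRUE),
                crit2fl = sample(0:1, n, replace = TRUE),
                crit3fl = sample(0:1, n, replace = TRUE))

critvars <- paste0("crit", 1:3, "fl")
plots <- list()

for (varname in critvars) {
  d2 <- d %>%
    count(school, week, .data[[varname]]) %>%
    group_by(school, week) %>%
    mutate(prop = n / sum(n)) %>%
    ungroup()

  g <- ggplot(d2, aes(x = as.factor(week), y = n,
                      fill = as.factor(.data[[varname]]))) +
    geom_col(position = "fill") +
    facet_wrap(~school) +
    geom_text(
      aes(label = scales::percent(prop)),
      position = "fill", vjust = 1.5, color = "white",
      # base subsetting: look the column up by its string name with [[ ]]
      data = d2[d2[[varname]] == 1, ]) +
    labs(title = paste("Proportion of subjects meeting", varname),
         y = "Proportion", x = "Week", fill = varname)

  plots[[varname]] <- g  # keep the plot for later use
  print(g)               # explicit print, since loops do not auto-print
}
```

Keeping the plots in a list also makes it easy to save them afterwards, for example with `ggsave()` in a second loop.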

One option is to use get. You could do this several ways. It's not clear if you want to loop over the column names themselves with your loop index, or use the loop index as a sequential numeric value. I'll assume the latter, since that's how your index is currently assigned. The way I would do this is to define a vector with the column names of interest, nnames, then use get(nnames[i]) in your loop.

Below I modified your loop slightly to print out the first three rows of each result to see what it's doing (since you don't assign the result in your loop):

#create a vector of names
nnames <- names(d)[grep("crit", names(d))]

# run loop 
for(i in seq(nnames)){
  print(head(
    d %>% 
      count(school, 
            week, 
            crit = get(nnames[i])
  ), 3))
}

#   school week crit n
# 1      A    2    0 3
# 2      B    1    0 1
# 3      B    2    0 2
#   school week crit n
# 1      A    2    0 1
# 2      A    2    1 2
# 3      B    1    1 1
#   school week crit n
# 1      A    2    1 1
# 2      A    2    2 2
# 3      B    1    2 1

The advantage of this approach is that nnames does not have to follow a specific pattern (here it does, because grep is used for convenience, but it could just as well be nnames <- c("apples", "oranges", "potatoes")). In the rarer circumstance where all of your columns share the exact same pattern except for the value of i, you could simplify with the following:

for(i in 1:3){
  print(head(
    d %>% 
      count(school, 
            week, 
            crit = get(paste0("crit",i,"fl")))
    ))
}

Which gives the same output.

The data I used are:

set.seed(123)
n <- 100
d <- data.frame(school = sample(LETTERS, n, replace = TRUE),
                week = sample(1:2, n, replace = TRUE),
                crit1fl = sample(0, n, replace = TRUE),
                crit2fl = sample(0:2, n, replace = TRUE),
                crit3fl = sample(0:2, n, replace = TRUE),
                ignore = sample(LETTERS, n, replace = TRUE))
jpsmith