I want to write a loop that performs the same basic steps across a list of variables. The problem is that group_by() reads my loop variable's name verbatim instead of resolving it to the column it names, so I never see the levels of that variable. I think the root cause is that group_by() expects a column name rather than a string, but I'm having a hard time supplying one.

This code works when I look at c_2 specifically:

Data <- Actual.Data.A %>%
    filter(!is.na(c_2)) %>%
    group_by( c_2 , Year , a_2) %>%
    summarise(N = n())

which will give something like this:

c_2            Year       a_2       N
0 times        2013        Male     254
1 time         2013        Male     153
0 times        2013        Female   300
1 time         2013        Female   120
 ...           ...          ...     ...

When I put the similar code in a for loop though, it doesn't give the different levels of c_2.

For example, here's the loop I have:

question.list <- as.list(c("c_2",
                     "b_2"))

for (question in question.list) {

  Data <- Actual.Data.A %>%
    filter(!is.na(question)) %>%
    group_by( question , Year , a_2) %>%
    summarise(N = n())

}

which will give me an error saying:

Error: unknown column 'question'

so I tried using paste() like so:

question.list <- as.list(c("c_2",
                     "b_2"))


for (question in question.list) {


  Data <- Actual.Data.A %>%
    filter(!is.na(question)) %>%
    group_by( paste(question) , Year , a_2) %>%
    summarise(N = n())

}

and it would give me something like this:

paste(question)       Year       a_2       N
    b_2               2014        Male     (a value)
    b_2               2014       Female    (a value)
    b_2                ...        ...        ...
    b_2                ...        ...        ...

which is obviously not what I was going for :)
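For what it's worth, that output is consistent with `question` holding a plain character string rather than a column. A minimal base-R sketch of the difference, using a made-up two-row data frame (`toy`, a hypothetical stand-in, not the real Actual.Data.A):

```r
# Hypothetical stand-in for Actual.Data.A (not the real data)
toy <- data.frame(c_2 = c("0 times", NA), stringsAsFactors = FALSE)
question <- "c_2"

# Tests the string "c_2" itself, which is never NA, so filter() drops nothing
is.na(question)

# Returns the constant "c_2", which is recycled into a single group label
paste(question)

# Looks the column up by the name stored in `question`
is.na(toy[[question]])
```

The first two calls never touch the data, which would explain why every row ends up under one label.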

I've tried all sorts of combinations and I'm at the limit of what little I understand about loops. I've tried paste(), call(), get(), quote(), and print(), alone and in combination, and I can't figure out how to pass the question variable as a column name to group_by().

Pete
  • Offhand I would guess that `as.name` might be used but without a dataset (which is your responsibility to provide) I'm not able to test that theory. – IRTFM Oct 14 '16 at 18:21
  • Looks like you need standard evaluation such as `group_by_` and `filter_`. See examples [here](http://stackoverflow.com/a/26667781/2461552) and [here](http://stackoverflow.com/questions/21390141/specify-dplyr-column-names?noredirect=1&lq=1) for `group_by_`. – aosmith Oct 14 '16 at 18:25
  • There are over 100 items coming back from a search on `[r] dplyr as.name` – IRTFM Oct 14 '16 at 18:28
  • @Pete: If the two strategies offered so far are not satisfactory, you should add a data example and click the `[reopen]` button. – IRTFM Oct 14 '16 at 18:31
  • [Here](http://stackoverflow.com/a/31760239/2461552) is one example of using `filter_` along with `lazyeval::interp` for standard evaluation filtering. – aosmith Oct 14 '16 at 18:33
  • Thanks @aosmith. I needed to use group_by_() instead of group_by(). From there, I had to slightly change the arguments passed to group_by_() and it worked. Thank you. I would add the solution but the thread was locked. – Pete Oct 14 '16 at 19:06
  • @42- I have a solution I'd like to paste (with sample data), but I can't find the reopen button. Can you reopen it? – Pete Oct 14 '16 at 19:11
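The comments point toward standard evaluation (`group_by_`, `filter_`). In current dplyr releases the underscore verbs are deprecated, and the `.data` pronoun is one way to get a similar effect. The following is a minimal sketch, assuming dplyr 1.0 or later. The rows of `Actual.Data.A` are invented for illustration, `results` is a hypothetical name, and this is not necessarily the solution Pete arrived at.

```r
library(dplyr)

# Invented sample data; the real Actual.Data.A was never posted
Actual.Data.A <- data.frame(
  c_2  = c("0 times", "1 time", "0 times", NA, "1 time", "0 times"),
  b_2  = c("Yes", "No", NA, "Yes", "Yes", "No"),
  Year = c(2013, 2013, 2013, 2013, 2014, 2014),
  a_2  = c("Male", "Male", "Female", "Female", "Male", "Female"),
  stringsAsFactors = FALSE
)

question.list <- c("c_2", "b_2")

# Hypothetical container so each iteration's table is kept, not overwritten
results <- list()

for (question in question.list) {
  results[[question]] <- Actual.Data.A %>%
    # .data[[question]] looks up the column whose name is stored in `question`
    filter(!is.na(.data[[question]])) %>%
    group_by(.data[[question]], Year, a_2) %>%
    summarise(N = n(), .groups = "drop")
}

# One summary table per question, indexed by the question's name
results[["c_2"]]
```

Storing each table under its question name avoids the original loop's side effect of leaving only the last iteration's result in `Data`.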

0 Answers