How do I subset a column of the current group in dplyr?

Question

I am having difficulty parameterizing some code I wrote. It works fine outside a function, like this:

new_df = df %>%
         group_by(groupby1, groupby2) %>%
         mutate(new_value=
            slider_helper(
                slide(cur_data()[, c('string1', 'string2')], ~.x, .before = Inf, .after = -1),
                cur_data()$string2,
                'string1',
                beta))

But when I try to create a function where you can pass strings for the variables to group by and the variables to slide over like so:

my_fun <- function(df, groupby1, groupby2, string1, string2, beta) {
    return(df %>%
        group_by({{groupby1}}, {{groupby2}}) %>%
        mutate(new_value=
            slider_helper(
                slide(cur_data()[, c({{string1}}, {{string2}})], ~.x, .before = Inf, .after = -1),
                cur_data()[[{{string2}}]],
                {{string1}},
                beta)))
}

I get this vague stack trace:

The error occurred in group 1: "groupby1" = "groupby1", "groupby2" = "groupby2".
Caused by error in `.subset()`:
! invalid subscript type 'closure'

What is the proper way to parameterize dplyr functions to work with column names passed as strings?

EDIT:

Here is a reproducible example:

library(dplyr)   # group_by(), mutate(), cur_data(), %>%
library(slider)  # slide()

slider_helper <- function(left, right, string1, beta) {

    cbind_helper <- function(left, right) {
        todaysDate = rep(right, nrow(left))
        return(cbind(left, todaysDate))
    }

    date_helper <- function(today, date) {
        return(1/as.integer(today - date))
    }

    df = data.frame(t(mapply(cbind_helper, left, right)))
    df$val1 = mapply(date_helper, df[,'todaysDate'], df[, date])
    df$val1_product = mapply('%*%', df$val1, df[[target]]) / sapply(df$val1, FUN=sum, na.rm=T)
    df$val2 = 1/seq(nrow(df), 1)
    df$val2_product = sapply(mapply('*', df$val2, df[[target]]), FUN=sum, na.rm=T) / sum(df$val2, na.rm=T)
    w_sum = beta * df$val2_product + (1-beta) * df$val2_product
    return(w_sum)
}

my_fun <- function(df, groupby1, groupby2, string1, string2, beta) {
    return(df %>%
        group_by({{groupby1}}, {{groupby2}}) %>%
        mutate(new_value=
            slider_helper(
                slide(cur_data()[, c({{string1}}, {{string2}})], ~.x, .before = Inf, .after = -1),
                cur_data()[[{{string2}}]],
                {{string1}},
                beta)))
}

df = data.frame(sample(1:2, 20, replace=T), sample(1:2, 20, replace=T), seq(from=-1, to=.9, by = .1), seq.Date(from=as.Date('2011-01-01'), to=as.Date('2011-01-20'), by = 1))
colnames(df) = c('groupby1', 'groupby2', 'string1', 'string2')
my_fun(df, 'groupby1', 'groupby2', 'string1', 'string2', 0.5)

[See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with. — camille, Jan 10 '23 at 04:15

score 0 · Answer 1 · answered Jan 10 '23 at 03:34

Without sample data and more context, this is something of a guess.

But if the column names are passed as plain strings, you need to remove the {{}}. Base subsetting with [ and [[ takes the strings directly, so the function does not need those brackets.

my_fun <- function(df, groupby1, groupby2, string1, string2, beta) {
    return(df %>%
        group_by(groupby1, groupby2) %>%
        mutate(new_value=
            slider_helper(
                slide(cur_data()[, c(string1, string2)], ~.x, .before = Inf, .after = -1),
                cur_data()[[string2]],
                string1,
                beta)))
}
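
For the grouping step, the strings still have to be resolved to columns. A minimal sketch of one way to do that, assuming dplyr 1.0 or later: the function name group_mean_by and the toy data are hypothetical, and a plain mean stands in for slider_helper. Newer dplyr versions also offer pick() for selecting columns of the current group.

```r
library(dplyr)

# Hypothetical stand-in for my_fun: every column name arrives as a string.
group_mean_by <- function(df, group_cols, value_col) {
  df %>%
    # all_of() selects the columns whose names are stored in group_cols
    group_by(across(all_of(group_cols))) %>%
    # .data[[value_col]] is that column, restricted to the current group
    mutate(new_value = mean(.data[[value_col]])) %>%
    ungroup()
}

# Toy data, made up for illustration
toy <- data.frame(
  g1 = c(1, 1, 2, 2),
  g2 = c("a", "a", "a", "b"),
  x  = c(1, 3, 5, 7)
)

res <- group_mean_by(toy, c("g1", "g2"), "x")
res
```

Passing the grouping columns as one character vector keeps the function signature short; two separate string arguments could be combined with c(groupby1, groupby2) in the same way.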

answered Jan 10 '23 at 03:34 by sconfluentus

I am working on creating an example others can try, but with dplyr if you have a column user_id, you can refer to it directly with user_id, no quotes needed. So if a function takes a parameter user, and you pass the string 'user_id' as the argument for it, dplyr will look for a column called user, the variable name, and will not evaluate the variable and look for the string it evaluates to. – Jage Jan 10 '23 at 15:48
I have added an MRE which you can copy and paste into R to see how dplyr's grammar complicates things here. – Jage Jan 10 '23 at 16:05
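
The behaviour described in the comments can be checked on a toy data frame. A small sketch, with data and a variable named user made up for illustration: because no column is called user, group_by(user) falls back to the variable's value and groups by a new constant column named user, not by user_id. Wrapping the string in all_of() resolves it to the column it names.

```r
library(dplyr)

# Toy data, made up for illustration
toy <- data.frame(user_id = c("a", "a", "b"), visits = 1:3)
user <- "user_id"  # hypothetical variable holding a column name

# Data masking finds no column `user`, so the string "user_id" is recycled
# into a new constant column called `user`, giving a single group.
by_variable <- group_by(toy, user)
group_vars(by_variable)
n_groups(by_variable)

# all_of() looks the name up, so the grouping uses the user_id column.
by_column <- group_by(toy, across(all_of(user)))
group_vars(by_column)
n_groups(by_column)
```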
