I have a data.frame called dat.

    colnames(dat)
    [1] "variable"  "weight" 

When I run `aggregate(weight ~ variable, dat, sum)`, the function runs without error and returns the values I would expect.

However, when I embed `aggregate()` within a custom function as follows:

    bins <- function(df, var, wt, n) {
        tmp <- aggregate(wt ~ var, df, sum)

        # ... other code not shown ...

        return(tmp)
    }

And then run `out <- bins(df=dat, var=variable, wt=weight, n=5)`, I get the following error message:

    Error in eval(expr, envir, enclos) : object 'weight' not found

I tried using `with()` as well, without success.

drumminactuary
  • See http://stackoverflow.com/questions/34888027/how-to-pass-strings-as-arguments-in-aggregate-function-for-the-subset-paramete, should solve the problem. – m-dz Jan 18 '17 at 16:14

2 Answers

Might not be the exact thing you are looking for, but I find it much easier to work with strings wherever possible:

    dat <- data.frame(
      variable = sample(letters[1:5], 100, replace = TRUE),
      variable2 = sample(letters[1:5], 100, replace = TRUE),
      weight = rnorm(100)
    )

    bins <- function(df, var, wt, n) {
      tmp <- aggregate(
        as.formula(
          paste(
            wt,
            paste(var, collapse = '+'),
            sep = '~')),
        df, sum)
      return(tmp)
    }

    bins(df = dat, var = 'variable', wt = 'weight', n = 5)

    bins(df = dat, var = c('variable', 'variable2'), wt = 'weight', n = 5)

Results:

      variable    weight
    1        a  3.962502
    2        b -0.137942
    3        c -2.435460
    4        d  1.557121
    5        e -0.471481

       variable variable2      weight
    1         a         a  0.15849141
    2         b         a  2.31792997
    3         c         a -2.67871600
    4         d         a  1.29191822
    5         e         a  0.93714161
    6         a         b  0.58574200
    7         b         b  1.78097554
    8         c         b  0.41522095
    9         d         b  0.32981119
    10        e         b -0.95515100
    11        a         c  1.66244525
    12        b         c -1.92009677
    13        c         c -2.53845106
    14        d         c -1.03501447
    15        e         c -0.53367121
    16        a         d  0.27701130
    17        b         d -0.54682389
    18        c         d  3.28828483
    19        d         d  1.58885843
    20        e         d  0.02646149
    21        a         e  1.27881159
    22        b         e -1.76992683
    23        c         e -0.92179907
    24        d         e -0.61845273
    25        e         e  0.05373811
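
As an alternative to pasting the formula together by hand, base R's `reformulate()` can build the same kind of formula from character vectors. The following is only a minimal sketch under the same string-based interface as above; `bins_reformulate` and the toy data `dat_small` are hypothetical names made up for illustration, not part of the original answer:

```r
# Toy data, made up for illustration only
dat_small <- data.frame(
  variable  = c("a", "b", "a", "b"),
  variable2 = c("x", "x", "y", "y"),
  weight    = c(1, 2, 3, 4)
)

# Hypothetical variant of bins(): var and wt are column names given as strings
bins_reformulate <- function(df, var, wt) {
  # reformulate() turns c("variable", "variable2") and "weight" into
  # weight ~ variable + variable2
  fml <- reformulate(termlabels = var, response = wt)
  aggregate(fml, data = df, FUN = sum)
}

res1 <- bins_reformulate(dat_small, var = "variable", wt = "weight")
res2 <- bins_reformulate(dat_small, var = c("variable", "variable2"), wt = "weight")
```

Here `res1` holds one row per level of `variable`, and `res2` one row per observed combination of `variable` and `variable2`.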
m-dz
  • This took care of my problem, though I still don't understand why what I was originally doing didn't work. – drumminactuary Jan 18 '17 at 16:38
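  • (I may get this wrong): The issue is related to function scope. The formula is taken literally, so inside `bins()` `aggregate()` looks for columns named `wt` and `var` in `df` rather than `weight` and `variable`. When it does not find them, it falls back to the function's arguments, which refer to objects called `weight` and `variable` that do not exist outside `dat`, hence the error. Called from the global environment with `weight ~ variable`, it finds those columns in `dat` directly. Add `str(wt ~ var)` as the first line of `bins()` and you will see why the error occurred. – m-dz Jan 18 '17 at 17:13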
  • This might help: [Environments in Advanced R by Hadley Wickham](http://adv-r.had.co.nz/Environments.html) – m-dz Jan 18 '17 at 17:16
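
The scoping point in the comments above can be sketched in a few lines. This is only an illustration under assumptions: `dat_small`, `show_formula_vars` and `bins_nse` are hypothetical names, and capturing unquoted arguments with `substitute()` is one possible workaround, not something proposed in the thread:

```r
# Toy data, made up for illustration only
dat_small <- data.frame(variable = c("a", "b", "a"), weight = c(1, 2, 3))

# The formula keeps the literal symbols wt and var, not the names passed in
show_formula_vars <- function(df, var, wt) {
  all.vars(wt ~ var)
}
vars_seen <- show_formula_vars(dat_small, variable, weight)
# vars_seen is c("wt", "var")

# Hypothetical workaround: recover the unquoted argument names with
# substitute() and build the formula from them as strings
bins_nse <- function(df, var, wt) {
  fml <- as.formula(paste(deparse(substitute(wt)), "~", deparse(substitute(var))))
  aggregate(fml, data = df, FUN = sum)
}
res_nse <- bins_nse(dat_small, var = variable, wt = weight)
```

With this variant the original call style with unquoted column names works, at the cost of making the function harder to call programmatically with names stored in variables.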

You can replace the bare column name with `df[, column]` and pass the column name as a string:

    bins <- function(df, var, wt, n) {
      tmp <- aggregate(df[,wt] ~ df[,var], df, sum)

      # ... other code not shown ...

      return(tmp)
    }

An example using the cars dataset:

    bins <- function(df, var, wt, n) {
      tmp <- aggregate(df[,wt] ~ df[,var], df, sum)
      return(tmp)
    }

    bins(cars, 'speed', 'dist')

This post might also help you.
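
One side effect of the formula above is that the result columns are labelled after the expressions `df[, var]` and `df[, wt]` rather than the actual column names. A possible alternative, sketched here under the assumption that string column names are acceptable, is the non-formula interface of `aggregate()`; `bins_by` is a hypothetical name used only for illustration:

```r
# Hypothetical variant: var and wt are column names given as strings
bins_by <- function(df, var, wt) {
  # Single-bracket indexing with a character vector returns a data frame,
  # so the original column names carry over to the result
  aggregate(df[wt], by = df[var], FUN = sum)
}

res_cars <- bins_by(cars, var = "speed", wt = "dist")
names(res_cars)  # "speed" "dist"
```

Note that this interface does not apply the formula method's default `na.action`, so rows with missing values may be handled differently; `cars` has none, so the totals agree here.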

Paulo MiraMor
  • I want to reference the arguments passed to the function. When I try `tmp <- aggregate(df$wt ~ df$var, data=df, FUN=sum)` I get the following error: `Error in model.frame.default(formula = df$wt ~ df$var, data = df) : invalid type (NULL) for variable 'df$wt'` – drumminactuary Jan 18 '17 at 16:13
  • @drumminactuary OK. I edited the post to follow your requests. – Paulo MiraMor Jan 18 '17 at 16:36
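
The `invalid type (NULL)` error quoted in the comment above comes from how `$` works: it takes the name after it literally instead of looking up the value of a variable. A minimal sketch, with `dat_small` as made-up toy data:

```r
# Toy data, made up for illustration only
dat_small <- data.frame(variable = c("a", "b", "a"), weight = c(1, 2, 3))
wt <- "weight"

# `$` looks for a column literally named "wt"; none exists, so the result is NULL
via_dollar <- dat_small$wt
is.null(via_dollar)        # TRUE

# `[[` evaluates wt first and then looks up the column named "weight"
via_brackets <- dat_small[[wt]]
via_brackets               # 1 2 3
```

This is why string-valued arguments need `df[[wt]]` or `df[, wt]` rather than `df$wt` inside a function.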