
I have run into a strange problem: aggregate behaves oddly when I call it inside a custom function. It seems to completely override the subset function.

To show the problem, I will break it into two parts. 1. Without a custom function:

    c <- data.frame(A = c("carr","bike","truck","carr","truck","bike","bike","carr","truck","carr","truck","truck","carr","truck","truck"),
                    B = c(10,20,30,23,45,56,78,44,10,20,30,10,20,30,67),
                    D = c(1,2,3,1,2,3,2,3,2,3,2,2,3,2,1))

    c_subset <- subset(c, (A == "carr") | (A == "bike"))

    dg <- aggregate(B ~ D + A, c_subset, max)

The value of dg is:

D   A   B           
2   bike    78
3   bike    56
1   carr    23
3   carr    44

Which is exactly how it should be.

2. With a custom function:

 rtk <- function(datam,inc_coll,inc_vall,lb,ld){
  datam_subset <- subset(c,inc_coll %in% inc_vall)
  dg1<- aggregate(lb ~ ld + inc_coll,datam_subset,max)

  return(dg1)
}

c_ans <- rtk(c,c$A,c("carr","bike"),c$B,c$D)

The answer is:

ld  inc_coll    lb

2   bike    78
3   bike    56
1   carr    23
3   carr    44
1   truck   67
2   truck   45
3   truck   30

Now I want to know why the aggregate function returns "truck", even though I passed it datam_subset, which has been subset to contain only the "carr" and "bike" rows.

Maybe I am missing something very basic. I would be grateful for your help. Thanks.

Psidom
Sana Ali
  • You're passing the actual columns of your original data frame to your function `rtk` instead of just the column names. So the `subset` works, but then the `aggregate` is done on the columns of `c`. – Lamia May 14 '17 at 21:46
  • As no one has really mentioned it: http://stackoverflow.com/questions/9860090/why-is-better-than-subset gives some thoughts on using subset in a function. Also, aggregate has a subset argument, which may make things easier. – user20650 May 14 '17 at 22:42
  • The *datam* parameter is never used inside the function, even though you pass `c` into it. – Parfait May 15 '17 at 03:49
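
The subset argument of aggregate mentioned in the comments can be sketched roughly as follows. This is only an illustration: the data frame is named `df` here (an assumption, chosen to avoid masking base `c`), and the filter is written out directly rather than wrapped in a function.

```r
# Sample data from the question (called `c` there; `df` is a hypothetical name)
df <- data.frame(
  A = c("carr","bike","truck","carr","truck","bike","bike","carr",
        "truck","carr","truck","truck","carr","truck","truck"),
  B = c(10,20,30,23,45,56,78,44,10,20,30,10,20,30,67),
  D = c(1,2,3,1,2,3,2,3,2,3,2,2,3,2,1)
)

# Filter and aggregate in one call; `subset` is evaluated inside `df`,
# so only the "carr" and "bike" rows reach max()
dg <- aggregate(B ~ D + A, data = df,
                subset = A %in% c("carr", "bike"), FUN = max)
print(dg)
```

Because the filter and the formula both refer to columns of the same data frame, there is no intermediate subset that the formula could bypass.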

3 Answers


This happens because your aggregate call draws on two data frames.

This:

dg1<- aggregate(lb ~ ld + inc_coll, datam_subset, max)

actually reads like:

dg1<- aggregate(c$B ~ c$D + c$A, datam_subset, max)

so the formula variables are taken directly from c, and the datam_subset you pass as the data argument is effectively ignored.
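
A small sketch of this lookup behaviour, using a cut-down version of the data. The names `df`, `df_subset`, `lb` and `inc_coll` at top level are assumptions made for illustration; they stand in for the vectors passed into rtk.

```r
df <- data.frame(A = c("carr", "bike", "truck", "carr"),
                 B = c(10, 20, 30, 23))
df_subset <- subset(df, A != "truck")   # keeps 3 of the 4 rows

# Hypothetical stand-ins for the full columns handed to rtk()
lb <- df$B
inc_coll <- df$A

# `lb` and `inc_coll` are not columns of df_subset, so the formula falls
# back to the full-length vectors above and the subset has no effect
n_vectors <- nrow(model.frame(lb ~ inc_coll, data = df_subset))

# Column names are resolved inside df_subset, so the filter is respected
n_columns <- nrow(model.frame(B ~ A, data = df_subset))

print(c(n_vectors, n_columns))
```

The first count matches the unfiltered data and the second matches the subset, which is the same effect that brings "truck" back in the question.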

B Williams

Actually, there are two problems. First, as others have pointed out, you are subsetting c rather than datam, but fixing that alone does not solve the problem. Second, datam_subset has no columns named lb, ld, or inc_coll. So your function should look like:

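rtk <- function(datam, inc_coll, inc_vall, lb, ld){
  # Subset the data frame passed in, not the global c
  datam_subset <- subset(datam, inc_coll %in% inc_vall)
  # Rename the columns to match the formula (assumes column order A, B, D)
  names(datam_subset) <- c("inc_coll", "lb", "ld")
  dg1 <- aggregate(lb ~ ld + inc_coll, datam_subset, max)
  return(dg1)
}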

> c_ans <- rtk(c,c$A,c("carr","bike"),c$B, c$D)
> c_ans
  ld inc_coll lb
1  2     bike 78
2  3     bike 56
3  1     carr 23
4  3     carr 44

You can restore the original column names on c_ans with colnames(c_ans) <- c("D", "A", "B")

Yannis Vassiliadis

Passing column names to a function is a frequently asked question, as it can be counterintuitive. See this question: Pass a data.frame column name to a function. A better way to write your function would be to pass rtk the column names instead of the columns themselves, and then use them as needed:

rtk <- function(datam, inc_coll, inc_vall, lb, ld){
  ## Access the column using df[[colname]] to do the subset
  datam_subset <- subset(datam, datam[[inc_coll]] %in% inc_vall)
  ## Define the formula you will use in the aggregate function
  f <- as.formula(paste0(lb, "~", ld, "+", inc_coll))
  ## Perform the aggregation
  dg1 <- aggregate(f, datam_subset, max)
  return(dg1)
}

Then call it appropriately using column names:

c_ans <- rtk(c,"A",c("carr","bike"),"B","D")

Which gives you:

D    A  B
1 2 bike 78
2 3 bike 56
3 1 carr 23
4 3 carr 44
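
As an alternative sketch of the same idea, the formula can be built with reformulate() from the stats package rather than by pasting strings, and the rows can be selected with bracket indexing. The names `rtk2` and `df` are hypothetical and used only for this illustration.

```r
# Hypothetical variant of rtk(); column names are passed as strings
rtk2 <- function(datam, inc_coll, inc_vall, lb, ld) {
  # Plain bracket indexing on the data frame that was passed in
  datam_subset <- datam[datam[[inc_coll]] %in% inc_vall, , drop = FALSE]
  # Builds the formula lb ~ ld + inc_coll from the column-name strings
  f <- reformulate(c(ld, inc_coll), response = lb)
  aggregate(f, data = datam_subset, FUN = max)
}

# Sample data from the question (called `c` there)
df <- data.frame(
  A = c("carr","bike","truck","carr","truck","bike","bike","carr",
        "truck","carr","truck","truck","carr","truck","truck"),
  B = c(10,20,30,23,45,56,78,44,10,20,30,10,20,30,67),
  D = c(1,2,3,1,2,3,2,3,2,3,2,2,3,2,1)
)

res <- rtk2(df, "A", c("carr", "bike"), "B", "D")
print(res)
```

Since the formula is built from real column names, every variable is resolved inside datam_subset and the filtered rows stay filtered.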
Lamia
  • Thanks, Lamia, for explaining it in detail. @Yannis's solution also works, but yours is more configurable. – Sana Ali May 15 '17 at 04:28