I am looking to create a function that aggregates sales data by many different variables. I am running into a snag with the `by` argument of `aggregate()`. Here is my function thus far:

func <- function(x, x2, statfunc) {

  PT <- c(1,5,3,5,4,8,3,1,5,6,1,5,5,6,1,2,3,1,5,1)
  SH <- c(7,7,3,1,1,1,1,4,4,6,6,7,7,1,1,1,3,2,1,3)
  SaleRatio <- c(0.85, 0.92, 0.89, 0.88, 0.86, 1.08, 1.15, 1.03, 0.95, 1.01, 1.36, 0.96, 1.03, 0.95, 0.90, 1.01, 0.96, 0.95, 0.81, 1.29)

  study <- data.frame(PT, SH, SaleRatio)

  study <- select(study, x2, SaleRatio)

  study <- aggregate(study,
              by = list(x),
              FUN = statfunc)
  print(study)
}

When I attempt to call my function with:

func(x = "study$PT", x2 = "PT", statfunc = median)

I get the error:

Error in aggregate.data.frame(study, by = list(x), FUN = statfunc) : 
  arguments must have same length 

I am expecting this:

  Group.1 PT SaleRatio
1       1  1     0.990
2       2  2     1.010
3       3  3     0.960
4       4  4     0.860
5       5  5     0.935
6       6  6     0.980
7       8  8     1.080

The results above come from the same code, run with the arguments typed in directly instead of being passed through the function.

Eventually this function will be applied with many different grouping variables and user-supplied aggregate functions, on a much larger data set.

Can someone assist?

PSJupiter2
  • There are some issues in the function as well as in arguments, `"study$PT"` is not evaluated as intended – akrun Nov 30 '18 at 14:56
  • Hi and welcome to SO! Thanks for providing a well formatted question that shows effort on your part. Your issue is that the arguments you're passing to the function are strings, not the objects themselves. So they are evaluated as a character vector of length 1. – qdread Nov 30 '18 at 14:56
  • Another thing that would help is not to define the data inside the function. You should define it outside, then pass the data to the function as argument(s). – qdread Nov 30 '18 at 14:57
  • You can use `eval` and `parse`. Take a look at this thread https://stackoverflow.com/questions/1743698/evaluate-expression-given-as-a-string – Marius Nov 30 '18 at 15:04
  • Thank you all, the solution provided by @akrun worked perfectly with my larger data set, and with user provided functions as the "statfunc". I appreciate your help. – PSJupiter2 Nov 30 '18 at 18:42
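
To illustrate the comments above: inside `func`, `x` is the one-element character vector `"study$PT"`, so `list(x)` has length 1 while `study` has 20 rows, which is what `aggregate` complains about. Below is a minimal base-R sketch, assuming the data frame is built outside the function and the columns are passed by name. The helper `agg_by` and its argument names are hypothetical, not from the question.

```r
# Sample data from the question, defined outside the function.
study <- data.frame(
  PT = c(1,5,3,5,4,8,3,1,5,6,1,5,5,6,1,2,3,1,5,1),
  SH = c(7,7,3,1,1,1,1,4,4,6,6,7,7,1,1,1,3,2,1,3),
  SaleRatio = c(0.85, 0.92, 0.89, 0.88, 0.86, 1.08, 1.15, 1.03, 0.95, 1.01,
                1.36, 0.96, 1.03, 0.95, 0.90, 1.01, 0.96, 0.95, 0.81, 1.29)
)

# A string is never evaluated as code: it is one element long, not 20.
length("study$PT")   # 1
length(study$PT)     # 20

# Hypothetical helper: aggregate the column named in `value_col`,
# grouped by the column named in `group_col`.
agg_by <- function(data, group_col, value_col, statfunc) {
  aggregate(data[value_col],
            by = data[group_col],   # one-column data frame, i.e. a named list
            FUN = statfunc)
}

result <- agg_by(study, "PT", "SaleRatio", median)
print(result)
```

Because `by` is given as a named one-column data frame, the grouping column should come back as `PT` rather than as the `Group.1` and `PT` pair shown in the question.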

1 Answer

We can try this with the tidyverse, passing both column names as strings:

library(dplyr)
func <- function(x, x2, statfunc) {

  PT <- c(1,5,3,5,4,8,3,1,5,6,1,5,5,6,1,2,3,1,5,1)
  SH <- c(7,7,3,1,1,1,1,4,4,6,6,7,7,1,1,1,3,2,1,3)
  SaleRatio <- c(0.85, 0.92, 0.89, 0.88, 0.86, 1.08, 1.15, 1.03, 0.95,
        1.01, 1.36, 0.96, 1.03, 0.95, 0.90, 1.01, 0.96, 0.95, 0.81, 1.29)

  study <- data.frame(PT, SH, SaleRatio)

  study %>%
    select(x2, SaleRatio) %>%
    group_by_at(x) %>%
    summarise_all(statfunc)

}


func("PT", "PT", median)
# A tibble: 7 x 2
#     PT SaleRatio
#  <dbl>     <dbl>
#1     1     0.99 
#2     2     1.01 
#3     3     0.96 
#4     4     0.86 
#5     5     0.935
#6     6     0.98 
#7     8     1.08 
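
In dplyr 1.0.0 and later, `group_by_at()` and `summarise_all()` are superseded by `across()`. Here is a sketch of the same idea under that assumption, with the data passed in as an argument as suggested in the comments. The helper `agg_by` and its argument names are hypothetical and not part of the original answer.

```r
library(dplyr)

# Sample data from the question, defined outside the function.
study <- data.frame(
  PT = c(1,5,3,5,4,8,3,1,5,6,1,5,5,6,1,2,3,1,5,1),
  SH = c(7,7,3,1,1,1,1,4,4,6,6,7,7,1,1,1,3,2,1,3),
  SaleRatio = c(0.85, 0.92, 0.89, 0.88, 0.86, 1.08, 1.15, 1.03, 0.95, 1.01,
                1.36, 0.96, 1.03, 0.95, 0.90, 1.01, 0.96, 0.95, 0.81, 1.29)
)

# Hypothetical helper: column names arrive as strings, so all_of()
# is used to select them explicitly.
agg_by <- function(data, group_col, value_col, statfunc) {
  data %>%
    group_by(across(all_of(group_col))) %>%
    summarise(across(all_of(value_col), statfunc), .groups = "drop")
}

result <- agg_by(study, "PT", "SaleRatio", median)
print(result)
```

Any function that reduces a vector to a single value, such as `mean` or a user-defined one, can be passed as `statfunc` in the same way.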
akrun