I would like to write a function using ddply that outputs summary statistics for two columns of the data.frame mat, given the names of those columns.

  • mat is a large data.frame with columns named "metric", "length", "species", "tree", ..., "index"

  • index is a factor with 2 levels, "Short" and "Long"

  • "metric", "length", "species", "tree" and the others are all continuous variables

Function:

summary1 <- function(arg1,arg2) {
    ...

    ss <- ddply(mat, .(index), function(X) data.frame(
        arg1 = as.list(summary(X$arg1)),
        arg2 = as.list(summary(X$arg2)),
        .parallel = FALSE)

    ss
}

I expect the output to look like this after calling summary1("metric","length"):

Short metric.Min. metric.1st.Qu. metric.Median metric.Mean metric.3rd.Qu. metric.Max. length.Min. length.1st.Qu. length.Median length.Mean length.3rd.Qu. length.Max.

....

Long metric.Min. metric.1st.Qu. metric.Median metric.Mean metric.3rd.Qu. metric.Max. length.Min. length.1st.Qu. length.Median length.Mean length.3rd.Qu. length.Max.

....

At the moment the function does not produce the desired output. What modification should be made here?

Thanks for your help.


Here is a toy example:

mat <- data.frame(
    metric = rpois(10,10), length = rpois(10,10), species = rpois(10,10),
    tree = rpois(10,10), index = c(rep("Short",5),rep("Long",5))
)
Marek
Tony
  • This would be easier to answer if you provided sample data (preferably with `dput`). – Richie Cotton Apr 19 '11 at 10:11
  • @Richie- Here is a toy example `mat<-data.frame(metric=rpois(10,10),length=rpois(10,10),species=rpois(10,10),tree=rpois(10,10),index=c(rep("Short",5),rep("Long",5)))`- Thanks – Tony Apr 19 '11 at 10:23
  • You could edit the question to add sample data instead of writing a comment (I did it for you ;)). – Marek Apr 19 '11 at 11:19
  • I would suggest making your function more generalizable by passing in additional parameters for the `data.frame` and the variable to split by as well. That way your function will work when you need to use it on a data.frame named `Mat` or `MAT` or `MyOtherData`, etc. – Chase Apr 19 '11 at 13:58
  • There should be a generic R function for that, even one supporting any number of arguments. Is there such a function? – userJT Nov 28 '12 at 14:39

2 Answers

As Nick wrote in his answer, you can't use $ to reference a column whose name is passed as a character string. When you write X$arg1, R searches for a column literally named "arg1" in the data.frame X. You can reference the column instead with either X[,arg1] or X[[arg1]].
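The difference can be seen on a small data.frame in base R. This is only an illustrative sketch: the data.frame X, its values, and the variable names by_dollar, by_bracket and by_matrix are made up for the example.

```r
# Hypothetical three-row data.frame, used only for this illustration
X <- data.frame(metric = c(3, 5, 8), length = c(10, 12, 9))
arg1 <- "metric"            # the column name is held in a variable

by_dollar  <- X$arg1        # looks for a column literally named "arg1"
by_bracket <- X[[arg1]]     # uses the value stored in arg1, i.e. "metric"
by_matrix  <- X[, arg1]     # same result when selecting a single column

is.null(by_dollar)                   # TRUE: there is no column called "arg1"
identical(by_bracket, X$metric)      # TRUE
identical(by_matrix, by_bracket)     # TRUE
```

Because X$arg1 silently returns NULL, summary() inside the original function was being applied to NULL rather than to the intended column.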

If you want nicely named output, I propose the solution below:

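library(plyr)

summary1 <- function(arg1, arg2) {
    # arg1 and arg2 are column names passed as character strings
    ss <- ddply(mat, .(index), function(X) data.frame(
        # name each list of statistics after its column, so that
        # data.frame() builds prefixes such as "metric." and "length."
        setNames(
            list(as.list(summary(X[[arg1]])), as.list(summary(X[[arg2]]))),
            c(arg1, arg2)
            )), .parallel = FALSE)

    ss
}
summary1("metric","length")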

Output for toy data is:

  index metric.Min. metric.1st.Qu. metric.Median metric.Mean metric.3rd.Qu.
1  Long           5              7            10         8.6             10
2 Short           7              7             9         8.8             10
  metric.Max. length.Min. length.1st.Qu. length.Median length.Mean length.3rd.Qu.
1          11           9             10            11        10.8             12
2          11           4              9             9         9.0             11
  length.Max.
1          12
2          12
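The comments suggest passing the data.frame and the grouping variable as arguments and supporting any number of columns. A minimal sketch of that idea follows, reusing the setNames() approach above. It is written in base R with split() and lapply() instead of ddply so that it runs without plyr. The helper name summary_by and its parameters data, group and cols are hypothetical, and the sketch assumes the columns contain no missing values, so that summary() returns the same six statistics for every group.

```r
# Hypothetical helper: summarise any number of columns within each group.
#   data  - a data.frame
#   group - name of the grouping column, as a character string
#   cols  - character vector of column names to summarise
summary_by <- function(data, group, cols) {
    pieces <- split(data, data[[group]])          # one data.frame per level
    rows <- lapply(names(pieces), function(level) {
        X <- pieces[[level]]
        # one named list of statistics per requested column
        stats <- lapply(cols, function(col) as.list(summary(X[[col]])))
        cbind(setNames(data.frame(level), group), # grouping column first
              data.frame(setNames(stats, cols)))  # e.g. metric.Min., ...
    })
    do.call(rbind, rows)                          # stack the one-row results
}

# Toy data from the question, with a seed so the run is repeatable
set.seed(1)
mat <- data.frame(
    metric = rpois(10, 10), length = rpois(10, 10), species = rpois(10, 10),
    tree = rpois(10, 10), index = c(rep("Short", 5), rep("Long", 5))
)
res <- summary_by(mat, "index", c("metric", "length", "tree"))
names(res)
```

The result has one row per level of index and six summary columns per requested variable, named in the same way as the output above.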
Marek

Is this more like what you want?

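summary1 <- function(arg1, arg2) {
    # index the columns by the character names held in arg1 and arg2
    ss <- ddply(mat, .(index), function(X) {
        data.frame(
            arg1 = as.list(summary(X[, arg1])),
            arg2 = as.list(summary(X[, arg2])))
    }, .parallel = FALSE)  # .parallel is an argument of ddply, not data.frame
    ss
}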
Gavin Simpson
Nick Sabbe