I just discovered how powerful plyr is for building frequency tables with several variables in R, but I am still struggling to understand how it works. I hope someone here can help me.

I would like to create a table (data frame) in which I can combine frequencies and summary stats but without hard-coding the values.

Here is an example dataset:

require(datasets)

d1 <- sleep
# I classify the variable extra to calculate the frequencies 
extraClassified <- cut(d1$extra, breaks = 3, labels = c('low', 'medium', 'high') )
d1 <- data.frame(d1, extraClassified)

The result I am looking for should look like this:

  require(plyr)

  ddply(d1, "group", summarise,  
  All = length(ID), 

  nLow    = sum(extraClassified  == "low"),
  nMedium = sum(extraClassified  == "medium"),      
  nHigh =  sum(extraClassified  == "high"),

  PctLow     = round(sum(extraClassified  == "low")/ length(ID), digits = 1),
  PctMedium  = round(sum(extraClassified  == "medium")/ length(ID), digits = 1),      
  PctHigh    = round(sum(extraClassified  == "high")/ length(ID), digits = 1),

  xmean    = round(mean(extra), digits = 1),
  xsd    =   round(sd(extra), digits = 1))

My question: how can I do this without hard-coding the values?

For the record: I tried this code, but it does not work.

ddply (d1, "group", 
   function(i) c(table(i$extraClassified),     
   prop.table(as.character(i$extraClassified))),
   )

Thanks in advance

user1043144
  • Why not just write your own function, rather than using `summarise`? – joran Aug 09 '12 at 18:18
  • Thanks Joran. The truth is: I have no idea what this function would have to look like. I tried several ways of using the table function, to no avail. FYI: the data I work with have several factors. – user1043144 Aug 09 '12 at 18:32

2 Answers


Here's an example to get you started:

foo <- function(x,colfac,colval){
    tbl <- table(x[,colfac])
    res <- cbind(n = nrow(x),t(tbl),t(prop.table(tbl)))
    colnames(res)[5:7] <- paste(colnames(res)[5:7],"Pct",sep = "")
    res <- as.data.frame(res)
    res$mn <- mean(x[,colval])
    res$sd <- sd(x[,colval])
    res
}

ddply(d1,.(group),foo,colfac = "extraClassified",colval = "extra")

Don't take anything in that function foo as gospel. I just wrote that off the top of my head. Surely improvements/modifications are possible, but at least it's something to start with.
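To make the split-apply-combine idea concrete, here is a minimal base R sketch of roughly what `ddply` does with such a function: split the data by group, build one summary row per piece, and bind the rows together. The helper name `summarise_piece` is made up for illustration, and this is only an approximation of the idea, not how plyr is implemented.

```r
# Example data, prepared as in the question
d1 <- datasets::sleep
d1$extraClassified <- cut(d1$extra, breaks = 3,
                          labels = c("low", "medium", "high"))

# Hypothetical helper: returns a one-row data frame for one piece of the data
summarise_piece <- function(x, colfac, colval) {
  counts <- table(x[[colfac]])                      # frequencies per level
  shares <- setNames(as.vector(prop.table(counts)), # proportions per level
                     paste0(names(counts), "Pct"))
  row <- c(n = nrow(x), counts, shares,
           mn = mean(x[[colval]]), sd = sd(x[[colval]]))
  as.data.frame(as.list(row))
}

# Split by group, apply the helper to each piece, combine the rows
pieces <- split(d1, d1$group)
rows   <- lapply(pieces, summarise_piece,
                 colfac = "extraClassified", colval = "extra")
result <- cbind(group = names(pieces), do.call(rbind, rows))
rownames(result) <- NULL
result
```

Because the level names come from `table()` itself, nothing about the factor levels is hard-coded; a factor with different or more levels would simply produce different columns.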

joran

Thanks to Joran. I slightly modified your function to make it more generic (without reference to the position of the variables).

require(plyr)
foo <- function(x,colfac,colval)
{
  # table with frequencies
  tbl    <- table(x[,colfac])
  # table with percentages
  tblpct <- t(prop.table(tbl))
  colnames(tblpct) <- paste(colnames(t(tbl)), 'Pct', sep = '')

  # put the first part together
  res <- cbind(n = nrow(x), t(tbl), tblpct)
  res <- as.data.frame(res)

  # add summary statistics
  res$mn <- mean(x[,colval])
  res$sd <- sd(x[,colval])
  res
}

ddply(d1,.(group),foo,colfac = "extraClassified",colval = "extra")

And it works!

P.S.: I still do not understand what `.(group)` stands for, but
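On the `.(group)` notation: as far as I understand plyr, `.()` simply quotes variable names so they can be written without quotation marks, and `ddply` accepts either that form or a character vector for its grouping argument. A small sketch, assuming plyr is installed, that compares the two spellings on the `sleep` data:

```r
library(plyr)

d1 <- datasets::sleep

# Grouping variable given as a character string
by_string <- ddply(d1, "group", summarise, n = length(extra))

# Same grouping variable given through plyr's quoting helper .()
by_quoted <- ddply(d1, .(group), summarise, n = length(extra))

# Compare the two results
all.equal(by_string, by_quoted)
```

The quoted form is mostly a convenience; with several grouping variables one would write `.(a, b)` instead of `c("a", "b")` (the names `a` and `b` here are placeholders, not columns of `sleep`).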

user1043144