I would like to summarize the pass/fail status of my data, shown below. In other words, I want to count the number of pass and fail cases for each product/type combination.

library(plyr)  # provides ddply() and summarise()
product=c("p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2")
type=c("t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2","t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2")
skew=c("s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2")
color=c("c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3")
result=c("pass","pass","fail","pass","pass","pass","fail","pass","fail","pass","fail","pass","fail","pass","fail","pass","pass","pass","pass","fail","fail","pass","pass","fail")
df = data.frame(product, type, skew, color, result)

The following command returns the total number of cases (pass plus fail), but I want separate columns for pass and fail:

dfSummary <- ddply(df, c("product", "type"), summarise, N=length(result))

Result is:

  product type N
1      p1   t1 6
2      p1   t2 6
3      p2   t1 6
4      p2   t2 6

The desired result would be:

  product type Pass Fail
1      p1   t1    5    1
2      p1   t2    3    3
3      p2   t1    4    2
4      p2   t2    3    3

I have attempted something like:

 dfSummary <- ddply(df, c("product", "type"), summarise, Pass=length(df$product[df$result=="pass"]), Fail=length(df$product[df$result=="fail"]) )

but this is obviously wrong, since the results are the grand totals of pass and fail across the whole data set rather than per product/type.

Thanks in advance for your advice! Regards, Riad.


2 Answers

Try:

dfSummary <- ddply(df, c("product", "type"), summarise, 
                   Pass=sum(result=="pass"), Fail=sum(result=="fail") )

This gives the result:

  product type Pass Fail
1      p1   t1    5    1
2      p1   t2    3    3
3      p2   t1    4    2
4      p2   t2    3    3
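
The difference from the attempt in the question is the missing df$ prefix: inside summarise, a bare result refers to the column of the current piece, whereas df$result always refers to the full data frame. A minimal base R sketch of that difference, rebuilding the relevant columns of the question's data and picking out the p1/t1 piece by hand (the names piece, pass_in_piece and pass_overall are only illustrative):

```r
# Rebuild the relevant columns of the question's data.
product <- rep(c("p1", "p2"), each = 12)
type    <- rep(rep(c("t1", "t2"), each = 6), times = 2)
result  <- c("pass","pass","fail","pass","pass","pass",
             "fail","pass","fail","pass","fail","pass",
             "fail","pass","fail","pass","pass","pass",
             "pass","fail","fail","pass","pass","fail")
df <- data.frame(product, type, result)

# One piece, roughly as ddply would see it for product "p1" and type "t1".
piece <- df[df$product == "p1" & df$type == "t1", ]

# A comparison yields a logical vector; sum() counts its TRUE values.
pass_in_piece <- sum(piece$result == "pass")  # count within the piece: 5
pass_overall  <- sum(df$result == "pass")     # count over all rows: 15
```

The second value is what the original attempt reported for every group, because df$result ignores the split.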

Explanation:

  1. You pass the data set df to the ddply function.
  2. ddply splits the data on the variables "product" and "type".
    • This results in up to length(unique(product)) * length(unique(type)) pieces (here 2 * 2 = 4), i.e. subsets of df, one for each combination of the two variables that occurs in the data.
  3. To each piece, ddply applies the function you provide. In this case, summarise counts the rows where result=="pass" and the rows where result=="fail".
  4. ddply now holds a result for each piece: the variables you split on (product and type) and the values you requested (Pass and Fail).
  5. It combines the results from all the pieces and returns them as a single data frame.
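
The steps above can be sketched by hand in base R. This is only an illustration of the split-apply-combine idea, not how plyr is implemented internally; the names pieces, counts and manual_summary are made up for the example.

```r
# Step 1: the data set (relevant columns of the question's data).
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass","pass","fail","pass","pass","pass",
              "fail","pass","fail","pass","fail","pass",
              "fail","pass","fail","pass","pass","pass",
              "pass","fail","fail","pass","pass","fail")
)

# Step 2: split into one piece per product/type combination present in the data.
pieces <- split(df, list(df$product, df$type), drop = TRUE)

# Step 3: apply the counting function to each piece.
counts <- lapply(pieces, function(piece) {
  data.frame(
    product = piece$product[1],
    type    = piece$type[1],
    Pass    = sum(piece$result == "pass"),
    Fail    = sum(piece$result == "fail")
  )
})

# Steps 4-5: combine the per-piece results into a single data frame,
# sorted by product and type to match the ddply output.
manual_summary <- do.call(rbind, counts)
manual_summary <- manual_summary[order(manual_summary$product, manual_summary$type), ]
rownames(manual_summary) <- NULL
```

Printing manual_summary should show the same four rows as the ddply call.
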
ialm

You could also use reshape2::dcast.

library(reshape2)
dcast(product + type ~ result, data = df, fun.aggregate = length, value.var = 'result')
##   product type fail pass
## 1      p1   t1    1    5
## 2      p1   t2    3    3
## 3      p2   t1    2    4
## 4      p2   t2    3    3
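
Using length as the aggregate simply counts the rows in each product/type/result cell, so the same counts can be cross-checked without any extra package. A minimal base R sketch, assuming the question's data; pasting product and type into a single group label is just a convenience for this example:

```r
# Relevant columns of the question's data.
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass","pass","fail","pass","pass","pass",
              "fail","pass","fail","pass","fail","pass",
              "fail","pass","fail","pass","pass","pass",
              "pass","fail","fail","pass","pass","fail")
)

# Cross-tabulate product/type combinations (rows) against result (columns).
tab <- table(group = paste(df$product, df$type), result = df$result)
```

Here tab["p1 t1", "pass"] is 5 and the columns come out in the same fail, pass order as the dcast output.
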
mnel