R programming: plyr how to count values from a column with ddply

Question

I would like to summarize the pass/fail status for my data as below. In other words, I would like to tell the number of pass and fail cases for each product/type.

library(ggplot2)
library(plyr)
product=c("p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2")
type=c("t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2","t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2")
skew=c("s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2")
color=c("c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3")
result=c("pass","pass","fail","pass","pass","pass","fail","pass","fail","pass","fail","pass","fail","pass","fail","pass","pass","pass","pass","fail","fail","pass","pass","fail")
df = data.frame(product, type, skew, color, result)

The following command returns the total number of pass + fail cases, but I want separate columns for pass and fail:

dfSummary <- ddply(df, c("product", "type"), summarise, N=length(result))

Result is:

  product type N
1      p1   t1 6
2      p1   t2 6
3      p2   t1 6
4      p2   t2 6

The desired result would be:

  product type Pass Fail
1      p1   t1    5    1
2      p1   t2    3    3
3      p2   t1    4    2
4      p2   t2    3    3

I have attempted something like:

 dfSummary <- ddply(df, c("product", "type"), summarise, Pass=length(df$product[df$result=="pass"]), Fail=length(df$product[df$result=="fail"]) )

but obviously it's wrong, since the results are the grand totals for fail and pass.

Thanks in advance for your advice ! Regards, Riad.

ialm · Accepted Answer · 2013-11-20T18:23:52.040

Try:

dfSummary <- ddply(df, c("product", "type"), summarise, 
                   Pass=sum(result=="pass"), Fail=sum(result=="fail") )

Which gives me result:

  product type Pass Fail
1      p1   t1    5    1
2      p1   t2    3    3
3      p2   t1    4    2
4      p2   t2    3    3
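A small base-R sketch of why the bare column name matters. It assumes that with() is a fair stand-in for how summarise evaluates its expressions (with the columns of the current piece in scope); the names piece, per_piece and whole_data are made up for this illustration, and only the columns needed from the question's data are rebuilt.

```r
# Rebuild the relevant columns of the question's data frame.
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass", "pass", "fail", "pass", "pass", "pass",
              "fail", "pass", "fail", "pass", "fail", "pass",
              "fail", "pass", "fail", "pass", "pass", "pass",
              "pass", "fail", "fail", "pass", "pass", "fail")
)

# One piece, like the subset ddply would build for product "p1", type "t1".
piece <- df[df$product == "p1" & df$type == "t1", ]

# Bare column name: looked up inside the piece, so the count is per piece.
per_piece <- with(piece, sum(result == "pass"))

# df$result: always the full column of df, whichever piece is in scope,
# so every group ends up with the same grand total.
whole_data <- with(piece, sum(df$result == "pass"))

print(c(per_piece = per_piece, whole_data = whole_data))
```

Here per_piece is 5 (the Pass value for p1/t1 above), while whole_data is 15, the number of passes in the entire data set.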

Explanation:

You are giving the data set df to the ddply function.
ddply splits it on the variables "product" and "type".
- This results in at most length(unique(product)) * length(unique(type)) pieces (i.e. subsets of df), one for each combination of the two variables that actually occurs in the data. Here that is 2 * 2 = 4 pieces.
To each piece, ddply applies the function you provide. In this case, summarise counts the rows where result=="pass" and the rows where result=="fail"; because result is evaluated within the piece, the counts are per piece rather than for the whole of df.
ddply is then left with a result for each piece: the variables you split on (product and type) and the values you requested (Pass and Fail).
It combines all of these into a single data frame and returns it.
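The split/apply/combine steps can be sketched by hand in base R. This is only an illustration of the idea, not how plyr is implemented internally; the names pieces, counts and manual_summary are invented for the example, and only the columns needed from the question's data are rebuilt.

```r
# Rebuild the relevant columns of the question's data frame.
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass", "pass", "fail", "pass", "pass", "pass",
              "fail", "pass", "fail", "pass", "fail", "pass",
              "fail", "pass", "fail", "pass", "pass", "pass",
              "pass", "fail", "fail", "pass", "pass", "fail")
)

# Split: one sub-data-frame per product/type combination present in the data.
pieces <- split(df, list(df$product, df$type), drop = TRUE)

# Apply: count passes and fails within each piece.
counts <- lapply(pieces, function(piece) {
  data.frame(
    product = piece$product[1],
    type    = piece$type[1],
    Pass    = sum(piece$result == "pass"),
    Fail    = sum(piece$result == "fail")
  )
})

# Combine: stack the one-row results into a single data frame,
# then sort by the grouping variables and drop the row names.
manual_summary <- do.call(rbind, counts)
manual_summary <- manual_summary[order(manual_summary$product,
                                       manual_summary$type), ]
rownames(manual_summary) <- NULL
print(manual_summary)
```

The printed data frame should contain the same four rows as the ddply output above.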

Perfect, that's what I needed! Thanks for the prompt answer! – Riad, Nov 20 '13 at 18:11

score 4 · Answer 2 · answered Nov 21 '13 at 00:51

You could also use reshape2::dcast.

library(reshape2)
dcast(df, product + type ~ result, fun.aggregate = length, value.var = "result")
##   product type fail pass
## 1      p1   t1    1    5
## 2      p1   t2    3    3
## 3      p2   t1    2    4
## 4      p2   t2    3    3

mnel


Much faster than ddply. Thanx :) – AnksG Mar 13 '19 at 13:46
