I have a data.frame X with a single column and a data.frame C with M binary (0/1) columns. Both data.frames have N rows (examples). I would like to average X for each case (0 or 1) of each of the M columns of C. In the plot I expect M*2 bars, where the x-axis shows the column names of C and red/blue indicates whether category m (out of M) is 0 or 1.

Can this be done using ggplot2? Any other quick way to do that without for loops?

Result sketch:

      *
*     *           *
*     *     *     *
m1=0, m1=1, m2=0, m2=1 ,....

Thanks, Hanan

Data sample below. Note that aggregate(X, by = as.list(C), FUN=mean) aggregates over every combination of the columns of C, which is not what I want. I want X aggregated for each value of each column of C independently.

X<-structure(list(V1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), 
.Names = "V1", class = "data.frame", row.names = c(NA, -100L))


C<-structure(list(V1 = c(1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 
0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 
0L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 
1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 
1L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 
1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 
1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L), V2 = c(1L, 0L, 1L, 0L, 
1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 
1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 
1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 
0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 
0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L
), V3 = c(1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 
1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 
0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 
1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 
1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 
0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 
1L, 1L, 0L, 0L, 1L, 1L, 1L)), 
.Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, -100L))
Hanan Shteingart
  • It would be much easier to help you if you created a proper [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample data that we could use to test possible solutions. – MrFlick Jun 13 '14 at 13:53
  • gave you some data example – Hanan Shteingart Jun 14 '14 at 19:01

1 Answer

Here is one way to transform your data, built up in incremental steps:

dd <- do.call(rbind, 
  Map(function(a,b) cbind(C=a, b), names(C), 
    lapply(
      lapply(
        lapply(C, table, X[[1]], dnn=c("CV","X")), 
     as.data.frame), 
   subset, X==1)
))

So here we use table() to get the counts of each X value for each C value. Then we turn each table into a data.frame and keep only the counts for X=1. Finally we add the name of the corresponding C column and stack all the data.frames into one large data.frame with rbind().
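The nested lapply() calls can be hard to read, so here is a minimal sketch of the same three steps applied to a single column. The vectors x and cv are small made-up toy data, not the X and C from the question.

```r
# Toy data (hypothetical, not the question's X and C)
x  <- c(0L, 1L, 1L, 0L, 1L, 0L)   # plays the role of X[[1]]
cv <- c(0L, 0L, 1L, 1L, 1L, 1L)   # plays the role of one column of C

# Step 1: cross-tabulate the C column against X
tab <- table(cv, x, dnn = c("CV", "X"))

# Step 2: long format, one row per (CV, X) pair with a Freq column
long <- as.data.frame(tab)

# Step 3: keep only the counts where X == 1
ones <- subset(long, X == 1)
print(ones)
```

The answer's one-liner does this for every column of C and then tags each result with the column name before stacking them.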

Then we can plot that with

library(ggplot2)

# One dodged pair of bars per column of C; fill distinguishes CV = 0 vs CV = 1
ggplot(dd, aes(x=C, y=Freq, fill=CV)) + 
  geom_bar(position="dodge", stat="identity")

So the columns of C are listed along the x-axis and the values of C are represented by the color of the bar. The counts of X=1 in each of the groups are the heights of the bars.
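The question asks for the average of X rather than the count of X=1. One possible loop-free way to get group means is sketched below with tapply(); this is an alternative, not part of the approach above. It uses small made-up data frames in place of the question's X and C, and the names means, dm and Mean are introduced here only for illustration.

```r
# Toy data (hypothetical, not the question's X and C)
X <- data.frame(V1 = c(0L, 1L, 1L, 0L, 1L, 0L))
C <- data.frame(V1 = c(0L, 0L, 1L, 1L, 1L, 1L),
                V2 = c(1L, 1L, 0L, 0L, 1L, 0L))

# For each column of C, average X within each value (0 and 1) of that column
means <- lapply(C, function(cv) tapply(X[[1]], cv, mean))

# Stack into one long data.frame: one row per (column of C, value) pair
dm <- do.call(rbind, Map(function(nm, m) {
  data.frame(C = nm, CV = names(m), Mean = as.vector(m))
}, names(means), means))
print(dm)
```

A data.frame shaped like dm should work with the same ggplot() call, using y=Mean in place of y=Freq. Since X is 0/1 here, each mean is simply the proportion of X=1 within the group.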

MrFlick