6

I'm trying to make a cross tabulation in R, and having its output resemble as much as possible what I'd get in an Excel pivot table. The objective is to replace a report made manually with Excel and Word with one automated with R Markdown; data wrangling and charts have been already taken care of but some tables are missing. So, given this code:

set.seed(2)
df<-data.frame("ministry"=paste("ministry ",sample(1:3,20,replace=T)),"department"=paste("department ",sample(1:3,20,replace=T)),"program"=paste("program ",sample(letters[1:20],20,replace=F)),"budget"=runif(20)*1e6)
library(tables)
library(dplyr)
arrange(df,ministry,department,program)
tabular(ministry*department~((Count=budget)+(Avg=(mean*budget))+(Total=(sum*budget))),data=df)

which yields (actual data is much more complicated):

                                 Avg    Total  
 ministry    department    Count budget budget 
 ministry  1 department  1 5     479871 2399356
             department  2 1     770028  770028
             department  3 1     184673  184673
 ministry  2 department  1 2     170818  341637
             department  2 1     183373  183373
             department  3 3     415480 1246440
 ministry  3 department  1 0        NaN       0
             department  2 5     680102 3400509
             department  3 2     165118  330235

This is as close as I could get to Excel results. I need to display subtotals, like this (generated in Excel using the exact same data):

Excel pivot table results with subtotals

Is it possible at all to get something like this in R (without manually coding the table cell-by-cell)?

Thanks!

s_a
  • 885
  • 3
  • 9
  • 22
  • Actually at some point for some reason I changed the seed number and it's not the exact same data as in the screenshot, but it doesn't affect the question at all. ¯\_(ツ)_/¯ – s_a May 14 '15 at 14:28

1 Answers1

3

Replace the left hand side with:

ministry * (department + 1) + 1

That is, try this:

tabular(ministry * (department + 1) + 1 ~
           ((Count = budget) + (Avg = (mean * budget)) + (Total = (sum * budget))), 
        data = df)

giving:

                                 Avg    Total  
 ministry    department    Count budget budget 
 ministry  1 department  1  5    479871 2399356
             department  2  1    770028  770028
             department  3  1    184673  184673
             All            7    479151 3354057
 ministry  2 department  1  2    170818  341637
             department  2  1    183373  183373
             department  3  3    415480 1246440
             All            6    295242 1771449
 ministry  3 department  1  0       NaN       0
             department  2  5    680102 3400509
             department  3  2    165118  330235
             All            7    532963 3730744
             All           20    442813 8856250

Update: correction.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Can't believe it was so easy, thanks!. Do you know how to hide the NaN row too? (ministry 3 dept 1) – s_a May 14 '15 at 14:42
  • (They really need to work on the vignettes, they're not helpful at all for us beginners at R.) – s_a May 14 '15 at 14:45
  • 2
    If `tab` is the result shown above then `tab[,2] <- lapply(tab[,2], function(x) if (is.nan(x)) 0 else x)` will replace the `NaN` with `0`. – G. Grothendieck May 14 '15 at 14:47
  • I was looking for `tab[tab[,1]!=0,]` (hide the rows with zero frequency), but thanks, I learned to do it from your example! (Though I still don't understand the object.) – s_a May 14 '15 at 14:55
  • @s_a still easier than proc tabulate :} – rawr May 14 '15 at 15:06
  • @rawr if that's a linux reference, I've never used the command. `str` would be the equivalent, I think, but it doesn't help much. Internally a `tabular` object is a single straight list, and everything else is stored as attributes. – s_a May 14 '15 at 16:02
  • @s_a proc tabulate is a sas procedure which was kinda the motivation for this package, see the intro [here](http://cran.r-project.org/web/packages/tables/vignettes/tables.pdf) – rawr May 14 '15 at 16:07