12

I am writing a report that requires the generation of a number of pivot tables in Excel. I would like to think there is a way to do this in R so that I can avoid Excel. I would like output like the screenshot below (teacher names redacted). As far as I can tell, I could use the reshape package to calculate the aggregate values, but I'd need to do that a number of times and somehow get all of the data in the correct order. At that point, I should just be doing it in Excel. Does anyone have any suggestions or package recommendations? Thank you!

(EDIT) The data starts as a list of students, their teacher, school, and growth. This data is then aggregated to get a list of teachers with their average class growth. Please note the teachers are then grouped by school. The largest problem I foresee doing this with R as of now is how do you get the subtotal and total rows (BSA1 Total, Grand Total, etc) in there since they are not the same type of observation as the others? Do you just manually have to calculate them and try to get them in the correct order so they appear at the bottom of that group?

example
(source: imgh.us)

Glorfindel
  • 21,988
  • 13
  • 81
  • 109
Jeff Erickson
  • 3,783
  • 8
  • 36
  • 43
  • The question as it stands now is too vague to be answered reasonably on the site here in my opinion. I can only say: learn latex if you don't know that already, look at this question : http://stackoverflow.com/questions/5465314/tools-for-making-latex-tables-in-r and get the dataframe in the right format using eg `plyr` or `reshape` or base functions like `cast` etc. Getting the output you have is quite easy in R, but you'll get it in text format. The layout should be done elsewhere. – Joris Meys Jul 12 '11 at 15:55
  • You should look into R Data Frames. – ATMathew Jul 12 '11 at 15:56
  • 1
    There are certainly a number of easy ways to do this in R but it would be much better if you could provide what the data start out as in addition to the result you want you'll get better and more applicable code. – John Jul 12 '11 at 15:59
  • it seems like there's some issue with ordering, but you don't say what it is. What's the problem? – JD Long Jul 12 '11 at 16:06
  • Thank you all for comments. Please read my edit to the question. – Jeff Erickson Jul 12 '11 at 16:12
  • @Jeff That's indeed a valid question for the site. It would be nice if you could give us an example of some data to illustrate. How to do this, see : http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Joris Meys Jul 12 '11 at 16:17
  • @Joris If I were to worry about formatting, would you recommend using Sweave for this particular project? Do you have any other recommendations? Thank you. – Jeff Erickson Jul 12 '11 at 18:13
  • @jeff do you know LaTex already? If you do then Sweave makes some sense. If not, it's got a bit of a learning curve. While it's a bit sacrilegious, if your org is highly wedded to Excel you could use Excel as your presentation layer using the XLConnect package. For internal use many folks where I work prefer Excel. XLConnect allows controlling formatting and creation of Excel files... even from Linux, which I like. – JD Long Jul 12 '11 at 18:22
  • @JD Great. I will look into both. I have some LaTeX experience and will give it a shot. – Jeff Erickson Jul 12 '11 at 18:27
  • it would be interesting to see the solution in dplyr – userJT Jul 10 '14 at 16:46

4 Answers4

19

Here's a swag at the calculation bits:

set.seed(1)
school  <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
growth <- rnorm(100, 5, 3)

myDf <- data.frame(school, teacher, growth)

require(reshape2)

aggregate(growth ~ school + teacher, data =myDf, FUN=mean)

myDf.melt <- melt(myDf, measured="growth")
dcast(myDf.melt, school + teacher ~ ., fun.aggregate=mean, margins=c("school", "teacher"))

I've not addressed output formatting, only calculation. The resulting data frame should look like this:

   school teacher       NA
1    BSA1    Dick 4.663140
2    BSA1   Harry 4.310802
3    BSA1     Tom 5.505247
4    BSA1   (all) 4.670451
5    BSA2    Dick 6.110988
6    BSA2   Harry 5.007221
7    BSA2     Tom 4.337063
8    BSA2   (all) 5.196018
9    HSA1    Dick 4.508610
10   HSA1   Harry 4.890741
11   HSA1     Tom 4.721124
12   HSA1   (all) 4.717335
13  (all)   (all) 4.886576

That example uses the reshape2 package to handle the subtotals.

I think R is the right tool for the job here. I can totally understand not being sure how to get started on this analysis. I came to R from Excel a few years ago and it can be tough to grok at first. Let me point out four pro tips to help you get better answers in Stack Overflow:

1) provide data, even if simulated: you can see I simulated some data at the beginning of my answer. If you had provided that simulation it would have a) saved me time b) gotten you an answer that used your own data structure, not one I dreamed up and c) other people would have answered. I often skip questions with no data because I've grown tired of guessing about the data them being told my answer sucked because I guessed wrong.

2) Ask one clear question. "How do I do my work" is not a single clear question. "How do I take this example data and create subtotals in the aggregation like this example output" is a single specific question.

3) keep asking! We all get better with practice. You're trying to do more in R and less in Excel so you're clearly of above average intelligence. Keep using R and keep asking questions. It will all get easier in time.

4) Be careful with your words when you describe things. You say in your edited question you have a "list" of things. A list in R is a specific data structure. I'm suspicious you actually have a data frame and are using the term "list" in a generic sense. This can make for some confusion. It also illustrates why you want to provide your own data.

JD Long
  • 59,675
  • 58
  • 202
  • 294
  • 2
    If you were to then address formatting, would you recommend using xtable and Sweave? Or something else you recommend? Thank you again. – Jeff Erickson Jul 12 '11 at 18:19
10

Using JD Long's simulated data, and adding the sd and counts:

   library(reshape)  # not reshape2
   cast(myDf.melt, school + teacher ~ ., margins=TRUE , c(mean, sd, length))
   school teacher     mean       sd length
1    BSA1    Dick 4.663140 3.718773     14
2    BSA1   Harry 4.310802 1.430594      9
3    BSA1     Tom 5.505247 4.045846      4
4    BSA1   (all) 4.670451 3.095980     27
5    BSA2    Dick 6.110988 2.304104     15
6    BSA2   Harry 5.007221 2.908146      9
7    BSA2     Tom 4.337063 2.789244     14
8    BSA2   (all) 5.196018 2.682924     38
9    HSA1    Dick 4.508610 2.946961     11
10   HSA1   Harry 4.890741 2.977305     13
11   HSA1     Tom 4.721124 3.193576     11
12   HSA1   (all) 4.717335 2.950959     35
13  (all)   (all) 4.886576 2.873637    100
IRTFM
  • 258,963
  • 21
  • 364
  • 487
1

Below are several different ways of generating this using the relatively new pivottabler package.

Disclosure: I'm the package author.

For more information see the package page on CRAN and the various package vignettes available on that page.

Sample Data (same as above)

set.seed(1)
school  <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
growth <- rnorm(100, 5, 3)
myDf <- data.frame(school, teacher, growth)

Quick pivot table output to console as plain text

library(pivottabler)
# arguments:  qhpvt(dataFrame, rows, columns, calculations, ...)
qpvt(myDf, c("school", "teacher"), NULL,  
     c("Average Growth"="mean(growth)", "Std Dev"="sd(growth)",
       "# of Scholars"="n()"),
     formats=list("%.1f", "%.1f", "%.0f"))

Console Output:

              Average Growth  Std Dev  # of Scholars  
BSA1   Dick              4.7      3.7             14  
       Harry             4.3      1.4              9  
       Tom               5.5      4.0              4  
       Total             4.7      3.1             27  
BSA2   Dick              6.1      2.3             15  
       Harry             5.0      2.9              9  
       Tom               4.3      2.8             14  
       Total             5.2      2.7             38  
HSA1   Dick              4.5      2.9             11  
       Harry             4.9      3.0             13  
       Tom               4.7      3.2             11  
       Total             4.7      3.0             35  
Total                    4.9      2.9            100  

Quick pivot table output as a html widget

library(pivottabler)
qhpvt(myDf, c("school", "teacher"), NULL,  
     c("Average Growth"="mean(growth)", "Std Dev"="sd(growth)",
       "# of Scholars"="n()"),
     formats=list("%.1f", "%.1f", "%.0f"))

HTML Widget Output:

enter image description here

Generating pivot table using more verbose syntax

This has more options, e.g. renaming totals.

library(pivottabler)
pt <- PivotTable$new()
pt$addData(myDf)
pt$addRowDataGroups("school", totalCaption="(all)")
pt$addRowDataGroups("teacher", totalCaption="(all)")
pt$defineCalculation(calculationName="c1", caption="Average Growth", 
   summariseExpression="mean(growth)", format="%.1f")
pt$defineCalculation(calculationName="c2", caption="Std Dev", 
   summariseExpression="sd(growth)", format="%.1f")
pt$defineCalculation(calculationName="c3", caption="# of Scholars", 
   summariseExpression="n()", format="%.0f")
pt  # to output to console as plain text
pt$renderPivot() # to output as a html widget

HTML Widget Output:

enter image description here

cbailiss
  • 1,304
  • 11
  • 21
  • I have used this "Quick pivot table output to console as plain text" method to create a pivot. Is there a way to save the output in a data frame. I can getting the following error "cannot coerce class ‘c("PivotTable", "R6")’ to a data.frame" – Rohan Bali Jun 08 '20 at 10:16
  • See: http://www.pivottabler.org.uk/articles/v05-outputs.html#results-as-an-r-data-frame – cbailiss Jun 08 '20 at 10:25
0

Sorry for autopromotion but take a look at my package expss.

Code for generating output below:

set.seed(1)
school  <- sample(c("BSA1", "BSA2", "HSA1"), 100, replace=T)
teacher <- sample(c("Tom", "Dick", "Harry"), 100, replace=T)
growth <- rnorm(100, 5, 3)

myDf <- data.frame(school, teacher, growth)

library(expss)
myDf %>% 
    # 'tab_cells' - variables on which statistics will be calculated
    # "|" is needed to suppress 'growth' in row labels
    tab_cells("|" = growth) %>%  
    # 'tab_cols' - variables for columns. Can be ommited
    tab_cols(total(label = "")) %>% 
    # 'tab_rows' - variables for rows.
    tab_rows(school %nest% list(teacher, "(All)"), "|" = "(All)") %>% 
    # 'method = list' is needed for statistics labels in column
    tab_stat_fun("Average Growth" = mean, 
                 "Std Dev" = sd, 
                 "# of scholars" = length, 
                 method = list) %>% 
    # finalize table
    tab_pivot()

Code above gives object inherited from data.frame which can be used with standard R operations (subsetting with [ and etc.). But there is a special print method for this object. Console output:

 |       |       | Average Growth | Std Dev | # of scholars |
 | ----- | ----- | -------------- | ------- | ------------- |
 |  BSA1 |  Dick |            4.7 |     3.7 |            14 |
 |       | Harry |            4.3 |     1.4 |             9 |
 |       |   Tom |            5.5 |     4.0 |             4 |
 |       | (All) |            4.7 |     3.1 |            27 |
 |  BSA2 |  Dick |            6.1 |     2.3 |            15 |
 |       | Harry |            5.0 |     2.9 |             9 |
 |       |   Tom |            4.3 |     2.8 |            14 |
 |       | (All) |            5.2 |     2.7 |            38 |
 |  HSA1 |  Dick |            4.5 |     2.9 |            11 |
 |       | Harry |            4.9 |     3.0 |            13 |
 |       |   Tom |            4.7 |     3.2 |            11 |
 |       | (All) |            4.7 |     3.0 |            35 |
 | (All) |       |            4.9 |     2.9 |           100 |

Output via htmlTable in knitr, RStudio viewer or Shiny:

Gregory Demin
  • 4,596
  • 2
  • 20
  • 20