0

I would like to take an .xlsx file of data in columns, and turn it into a pivot table in which a subset of column data is summed based on row groups.

I have a table that looks like this:

| Class   | Instructor  | ContactHours|
| --------| ------------| ------------|
| Class A | Instructor A| 7.2         |
| Class B | Instructor A| 5           |
| Class C | Instructor A| 3           |
| Class D | Instructor B| 3           |
| Class E | Instructor B| 3           |
| Class F | Instructor C| 7.4         |
| Class G | Instructor D| 7.4         |
| Class H | Instructor D| 5           |

And want to total the Contact Hours (CH) based on rows of Instructors, like this: Example pivot table

| Instructor   | Class  | CH  | 
| -------------| -------| --- |
| Instructor A |        | 15.2|
|              | Class A| 7.2 |
|              | Class B| 5   |
|              | Class C| 3   |
| Instructor B |        | 6   |
|              | Class D| 3   |
|              | Class E| 3   |
| Instructor C |        | 7.4 |
|              | Class F| 7.4 | 
| Instructor D |        | 12.4|
|              | Class G| 7.4 | 
|              | Class H| 5   |

I have tried pivottablr and other packages and have gotten the furthest with pivottablr.

With this code I am able to get all but the column of Total CH. Instead I get the column but it is full of zeroes.

pat <- read_csv("test.csv", skip=2, col_names = TRUE)
pt <- PivotTable$new()
pt$addData(pat)
pt$addRowDataGroups("Instructor", addTotal=TRUE)
pt$addRowDataGroups("Class", addTotal=FALSE)
pt$addRowDataGroups("ContactHours", addTotal=FALSE)
pt$defineCalculation(calculationName="TotalCH", summariseExpression="sum()")
pt$renderPivot()

pivottabler output

| Instructor   | Class  | CH  | TotalCH |
| -------------| -------| --- |---------|
| Instructor A | Class A| 7.2 | 0       |
|              | Class B| 5   | 0       |
|              | Class C| 3   | 0       |
| Instructor B | Class D| 3   | 0       |
|              | Class E| 3   | 0       |
| Instructor C | Class F| 7.4 | 0       | 
| Instructor D | Class G| 7.4 | 0       | 
|              | Class H| 5   | 0       |

But I have also gotten (mostly) what I want with dplyr

pat %>%
  group_by(Class) %>%
  group_by(Instructor) %>%
  mutate(sum_ContactHours =  sum(ContactHours)) %>%
  ungroup()

Which gives me this:

# A tibble: 8 × 4
  Class   Instructor   ContactHours sum_ContactHours
  <chr>   <chr>               <dbl>            <dbl>
1 Class A Instructor A          7.2             15.2
2 Class B Instructor A          5               15.2
3 Class C Instructor A          3               15.2
4 Class D Instructor B          3                6  
5 Class E Instructor B          3                6  
6 Class F Instructor C          7.4              7.4
7 Class G Instructor D          7.4             12.4
8 Class H Instructor D          5               12.4
Mark
  • 7,785
  • 2
  • 14
  • 34
Joneson
  • 1
  • 2
  • you can get a bit more of the way there by using `pivot$defineCalculation(summariseExpression="sum(ContactHours)")`. caption is meant to change the name of it, but it doesn't seem to do anything – Mark Aug 20 '23 at 09:22
  • Thank-you @Mark. When I add it as the following (I had to add the name of the calculation) I still get the column of zeroes. pt$defineCalculation(calculationName="TotalCH", summariseExpression="sum(ContactHours)") – Joneson Aug 20 '23 at 15:16

1 Answers1

0

You could achieve your desired result using vanilla dplyr by summarizing your data and binding it to the original dataset using bind_rows. The rest is some re-arranging and getting rid of the the duplicated Instructor labels:

library(dplyr, warn = FALSE)
pat |>
  summarise(
    Class = "",
    ContactHours = sum(ContactHours), .by = Instructor
  ) |>
  bind_rows(pat) |>
  arrange(Instructor, Class) |>
  mutate(
    Instructor = c(first(Instructor), rep("", n() - 1)),
    .by = Instructor
  )
#>      Instructor   Class ContactHours
#> 1  Instructor A                 15.2
#> 2               Class A          7.2
#> 3               Class B          5.0
#> 4               Class C          3.0
#> 5  Instructor B                  6.0
#> 6               Class D          3.0
#> 7               Class E          3.0
#> 8  Instructor C                  7.4
#> 9               Class F          7.4
#> 10 Instructor D                 12.4
#> 11              Class G          7.4
#> 12              Class H          5.0

DATA

pat <- data.frame(
  Class = c(
    "Class A", "Class B", "Class C",
    "Class D", "Class E", "Class F", "Class G", "Class H"
  ),
  Instructor = c(
    "Instructor A", "Instructor A",
    "Instructor A", "Instructor B", "Instructor B",
    "Instructor C", "Instructor D", "Instructor D"
  ),
  ContactHours = c(7.2, 5, 3, 3, 3, 7.4, 7.4, 5)
)
stefan
  • 90,330
  • 6
  • 25
  • 51
  • Thank-you @stefan! This is fantastic and solves the problem. I will work to familiarize myself more with binding, mutating, and arranging. – Joneson Aug 27 '23 at 13:58