Recreating Excel pivot tables in R - tabular or qpvt

Question

Here is a sample dataset of patient data (the actual one has ~26k rows).

library(pivottabler)
library(tables)
library(tidyverse)

df <- tribble(
  ~Patient, ~Physician, ~Specialty, ~Duration, ~CostWeight,
  1, "A", "Family Medicine", 5, 1.4215,
  2, "A", "Family Medicine", 7, 1.6463,
  3, "B", "Pediatrics", 1, 1.0214,
  4, "B", "Pediatrics", 3, 1.2345,
  5, "B", "Pediatrics", 6, 6.3243, 
  6, "C", "Neurology", 10, 2.5104,
  7, "D", "Dermatology", 0, 1.0424,
  8, "E", "Family Medicine", 2, 1.5234
)

Picture of data.

Basically, I'm trying to recreate Excel pivot tables in R since this will be on a monthly basis with a different dataset, and it would be great to just plug in the file and run the code to get the pivot tables rather than doing it by hand.

I want a table that looks like this.

Desired characteristics:

Need the total row for each Specialty grouping
Need the grand total row
Table that can be customized using kable()/kableExtra package

Attempt 1

I've tried using this example.

tabular(Specialty * (Physician + 1) + 1 ~
          (
            (Count = Patient) +
            (avgDuration = mean(Duration)) +
            (avgCostWeight = mean(CostWeight))
          ), 
        data = df)

However, I keep getting an error saying

Error in term2table(rows[[i]], cols[[j]], data, n) : Duplicate values: Specialty and Physician".

Attempt 2

I have also tried using the pivottabler package using this post.

pt <- df %>%
  qpvt(c("Specialty", "Physician"), NULL, c("Discharges" = "n()", 
                                            "avgDuration" = "mean(Duration)", 
                                            "avgCostWeight" = "mean(CostWeight)")
       )

pt$evaluatePivot()
df <- pt$asDataFrame()

df %>%
  kbl()

However, the output is less than desirable since it concatenates the Specialty and Physician columns into one.

Does anybody have any clues?

Daniel_j_iii · Answer 1 · 2020-10-28T02:42:56.710

0

You should look into the library DT for a long term tool to use, just using a few lines of code I was able to build a minimal example of what you are looking for, you may need to install it first.

library(DT)
library(tidyverse)

df <- tribble(
  ~Patient, ~Physician, ~Specialty, ~Duration, ~CostWeight,
  1, "A", "Family Medicine", 5, 1.4215,
  2, "A", "Family Medicine", 7, 1.6463,
  3, "B", "Pediatrics", 1, 1.0214,
  4, "B", "Pediatrics", 3, 1.2345,
  5, "B", "Pediatrics", 6, 6.3243, 
  6, "C", "Neurology", 10, 2.5104,
  7, "D", "Dermatology", 0, 1.0424,
  8, "E", "Family Medicine", 2, 1.5234
)

#getting count
df %>% 
  group_by(Specialty) %>% 
  count()

#creating table
datatable(df_grouped, filter = 'top', options = list(
  pageLength = 5, autoWidth = TRUE
))

The count filter gets the count like in your pivot table, and then the DT table is dynamic and filter capable

edited Oct 28 '20 at 02:42

answered Oct 27 '20 at 02:53

Daniel_j_iii

3,041
2
11
27

But how do I get the group total row and the grand total row? I want a column for total counts, average duration, and average cost weight for the groupings and a grand total like shown in the pivot table screenshot. – tonybot Oct 27 '20 at 17:45
the pivot table has a lot of duplicated rows. I think you could make a table for `group_by()` with total count by physician and then another table of patients. – Daniel_j_iii Oct 27 '20 at 18:54
Or you make a *long* dataset – Daniel_j_iii Oct 27 '20 at 19:06
I mean, the pivot table for the sample dataset has duplicated rows, but my actual dataset has >26,000 observations, so the pivot table for that wouldn't have the duplicated rows. I think I'll just do these by hand in Excel. – tonybot Oct 27 '20 at 22:08
How are you counting average duration, number_of_patients/durations? I understand the you want the pivot table, but I am not sure if the way the pivot table is displayed it a actually a great way to tabulate the data, if the column has physician AND specialty within the same column, how can you properly analyze that? that's why I suggest two tables, i updated my answer to get each `group_by()` – Daniel_j_iii Oct 28 '20 at 02:41
Average duration is the total duration of all the patients for a physician divided by the number of patients for that physician. I actually found a solution by calculating the group total row and grand total row, using rbind(), and then using the flextable package with the horizontal and vertical merging. – tonybot Oct 28 '20 at 17:17
Please post your answer code, I haven’t looked into flextable yet. It would be a great learning opportunity. Thank you – Daniel_j_iii Oct 28 '20 at 17:46

Recreating Excel pivot tables in R - tabular or qpvt

1 Answers1