Complex Crosstabulations in R

Question

Is there a package in R to generate a tabulation similar to the SPSS output in the attached picture. The crosstabulation involve 4 variables. 1 variable in the row and 3 other variables in the columns at the same level. Only total is tabulated in counts while others are in percentages. I have also attached the example of the data structure. q0003 is in the row while q0001, q0004 and q0005 are in the columns. And below is the sample data to play with

q0001 <- c(1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
q0003 <- c(1,4,1,1,2,1,1,4,1,3,1,3,1,1,2,1,1,2,1,2)
q0004 <- c(1,4,8,5,2,1,1,4,1,3,1,3,6,1,2,1,7,2,1,8)
q0005 <- c(1,4,1,1,2,5,1,4,1,3,1,3,5,1,2,1,6,2,1,6)
q0001 <- ordered(q0001, levels=c(1:2), labels=c("Consultant gynaecologist", "Senior Registrar in gynaecology"))
q0003 <- ordered(q0003, levels=c(1:4), labels=c("Has interest in endoscopy but yet to achieve competence in diagnostic procedures", "Can perform diagnostic procedures but not operative", "Can perform minor and intermediate operative procedures","Can perform advanced procedures"))
q0004 <- ordered(q0004, levels=c(1:8), labels=c("None","Basic Certificate Course in Nigeria","Basic Certificate Course outside Nigeria (e.g. FMAS)","Advanced Certificate Course in Nigeria","Advanced certificate Course outside Nigeria (e.g. DMAS)", "Euro Bachelors in Endoscopy", "Masters in Endoscopy (e.g. MMAS)","Post Fellowship training in Endoscopy or Reproductive surgery"))
q0005 <- ordered(q0005, levels=c(1:6), labels=c("Less than 5", "6-10","11-20", "21-30", "31-40", "More than 40"))
data<- data.frame(q0001,q0003,q0004, q0005)

And I finally have the complete data with Dput below

   dataEx <- structure(list(ID = 1:141, q0003 = structure(c(1L, 4L, 3L, 1L, 
   2L, 1L, 3L, 2L, 1L, 2L, 2L, 3L, 3L, 1L, 3L, 3L, 2L, 3L, 3L, 2L, 
   3L, 3L, 2L, 3L, 3L, 2L, 2L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 
   3L, 4L, 1L, 3L, 2L, 2L, 3L, 4L, 3L, 2L, 4L, 2L, 4L, 2L, 3L, 1L, 
   2L, 1L, 3L, 2L, 3L, 2L, 4L, 3L, 2L, 4L, 2L, 3L, 1L, 3L, 2L, 2L, 
   3L, 1L, 4L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 2L, 2L, 2L, 
   3L, 3L, 1L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 3L, 2L, 4L, 2L, 4L, 
   3L, 4L, 3L, 4L, 4L, 3L, 2L, 3L, 3L, 3L, 3L, 1L, 4L, 4L, 3L, 2L, 
   2L, 3L, 2L, 2L, 3L, 3L, 4L, 2L, 3L, 3L, 2L, 2L, 3L, 1L, 1L, 4L, 
   3L, 2L, 4L, 3L, 2L, 2L, 4L, 2L, 2L), .Label = c("Can perform advanced procedures", 
   "Can perform diagnostic procedures but not operative", "Can perform minor and intermediate operative procedures", 
   "Has interest in endoscopy but yet to achieve competence in diagnostic procedures"
   ), class = "factor"), q0001 = structure(c(2L, 2L, 2L, 2L, 3L, 
   2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
   2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
   2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 
   2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 
   2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
   2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 
   2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 
   2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
   3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "Consultant gynaecologist", 
   "Senior Registrar in gynaecology"), class = "factor"), q0004 = structure(c(6L, 
   7L, 3L, 8L, 7L, 3L, 4L, 7L, 6L, 7L, 7L, 3L, 4L, 8L, 5L, 3L, 4L, 
   3L, 4L, 4L, 5L, 4L, 5L, 5L, 2L, 7L, 4L, 4L, 8L, 6L, 3L, 4L, 3L, 
   4L, 4L, 4L, 4L, 7L, 3L, 5L, 7L, 4L, 3L, 4L, 6L, 4L, 4L, 3L, 7L, 
   4L, 2L, 3L, 3L, 8L, 3L, 4L, 2L, 7L, 7L, 1L, 7L, 7L, 7L, 3L, 8L, 
   5L, 5L, 4L, 5L, 3L, 4L, 4L, 4L, 3L, 3L, 5L, 5L, 8L, 8L, 2L, 3L, 
   3L, 4L, 3L, 3L, 7L, 2L, 4L, 3L, 5L, 3L, 2L, 4L, 3L, 4L, 2L, 7L, 
   7L, 4L, 7L, 3L, 4L, 3L, 4L, 4L, 8L, 4L, 2L, 2L, 8L, 3L, 3L, 4L, 
   4L, 5L, 7L, 5L, 3L, 2L, 4L, 3L, 5L, 7L, 3L, 5L, 3L, 4L, 7L, 3L, 
   6L, 8L, 7L, 4L, 2L, 4L, 4L, 4L, 4L, 4L, 7L, 7L), .Label = c("", 
   "Advanced Certificate Course in Nigeria", "Advanced certificate Course outside Nigeria (e.g. DMAS)", 
   "Basic Certificate Course in Nigeria", "Basic Certificate Course outside Nigeria (e.g. FMAS)", 
   "Euro Bachelors in Endoscopy", "None", "Post Fellowship training in Endoscopy or Reproductive surgery"
   ), class = "factor"), q0005 = structure(c(2L, 5L, 3L, 4L, 4L, 
   4L, 5L, 4L, 2L, 5L, 5L, 4L, 4L, 3L, 4L, 5L, 5L, 5L, 4L, 2L, 5L, 
   4L, 5L, 5L, 5L, 5L, 5L, 2L, 2L, 3L, 4L, 5L, 4L, 5L, 5L, 5L, 5L, 
   5L, 5L, 4L, 5L, 5L, 5L, 5L, 4L, 5L, 5L, 5L, 5L, 5L, 2L, 4L, 5L, 
   4L, 4L, 5L, 2L, 5L, 5L, 3L, 5L, 1L, 4L, 5L, 2L, 5L, 5L, 5L, 5L, 
   2L, 5L, 5L, 5L, 4L, 4L, 5L, 2L, 6L, 4L, 5L, 4L, 5L, 5L, 5L, 5L, 
   5L, 4L, 5L, 5L, 1L, 5L, 5L, 4L, 4L, 5L, 5L, 1L, 5L, 4L, 5L, 2L, 
   1L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 5L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 
   5L, 5L, 4L, 4L, 4L, 1L, 5L, 2L, 5L, 5L, 4L, 5L, 3L, 4L, 5L, 2L, 
   5L, 5L, 4L, 5L, 1L, 5L, 5L, 5L), .Label = c("", "11-20", "21-30", 
   "6-10", "Less than 5", "More than 40"), class = "factor")), class = "data.frame", row.names = c(NA, 
   -141L))

If you give us some toy data to play with, I'm sure at least one person will have a go. Personally, I suspect a combination of `dplyr` and 'knitr` will you do what you want. — Limey, Jun 07 '20 at 09:46
I have added the first 20 rows as picture, I don't know if that would help. — Dare, Jun 07 '20 at 10:00
Can you put the data into `dput` and paste it in the question? — Magnus Nordmo, Jun 07 '20 at 10:02
I'm afraid it doesn't. +1 to @MagnusNordmo. dput is the way toi fo. That way we can import your data and start to use it to give you what you want. See [this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) for help on creating a simple, self-contained example. — Limey, Jun 07 '20 at 10:11
I have added some sample data with the data frame, hope this helps? — Dare, Jun 07 '20 at 10:39

Len Greski · Answer 1 · 2020-06-08T11:49:53.197

A variety of R packages can be used to create presentation quality tables, including:

condformat
DT
flexible
formattable
gt (part of the tidyverse)
huxtable
knitr
kableExtra
pander
pixie dust
reactable
stargazer
tangram
tables
xtable
ztable

Reference: gt package - how gt fits with other packages that generate display tables.

Since each of these packages is built for a specific purpose, further investigation is required to determine whether it fulfills the specific requirements from the original post.

a `tables` solution

In the introduction to the tables package, Duncan Murdoch writes that the package "...is inspired by his 20 year old memories of SAS PROC TABULATE." For those of us who were involved in data science before it was called data science, this brings back memories of complex tables printed on wide format line printers.

The tables package includes features that allow us to meet the requirements of the OP's question, including the ability to compare a variable on the row dimension of a table with one or more variables on the column dimension. Back in the day we used to call these types of reports "banner tables," where the banner (column) dimension consisted of a set of demographic questions that would be compared to one or more questions in a survey.

We will now demonstrate how these types of tables can be produced with tables::tabulate(). Absent reproducible data in the OP, here is an example table.

data <-data.frame(province=c("a","b"),
                      food=c("yes","no","no","yes","yes","no"),
                      shelter_type=c("unfinished","permanent","transitional"))
library(tables)
tabular(1*(n=1) + (Factor(province,"Province")*( colPct = Percent("col")))  ~ 
             (Factor(food) + Factor(shelter_type)),data = data )

...and the output:

                     food        shelter_type                        
                     no    yes   permanent    transitional unfinished
          All n       3.00  3.00  2            2            2        
 Province a   colPct 33.33 66.67 50           50           50        
          b   colPct 66.67 33.33 50           50           50

Using data from the original post...

In the comments I was asked to use the dput() that had been uploaded to replicate the table image that was posted with the original question. We can replicate it completely, including the percentage signs with the following R Markdown code.

---
title: "multiQuetionCrosstabs"
author: "lg"
date: "6/7/2020"
output:
  html_document: 
    keep_md: yes
  pdf_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r echo=FALSE}

data <- read.csv("../data/multiQuestionCrosstabs.csv",header = TRUE,
                 stringsAsFactors = FALSE)
data$q0001[data$q0001 == ""] <- "Unknown"
data$q0003[data$q0003 == ""] <- "Unknown"
data$q0004[data$q0004 == ""] <- "Unknown"
data$q0005[data$q0005 == ""] <- "Unknown"
## create table 
library(tables,quietly = TRUE)

fmt <- function(x,digits=0,...){
     sprintf("%.0f%%",x)
}
tab <- tabular(1*(n=1) + (Factor(q0003,"Experience")*Format(fmt(digits=0))*(Percent("col")))  ~ 
             (Factor(q0001,"Position") + Factor(q0004,"Additional training") + 
                   Factor(q0005,"Monthly hysteroscopic procedures")),
        data = data )
library(knitr)
library(kableExtra)
toKable(tab)

```

Note that I wrote the dput() to a CSV, reloaded the data, and converted the factors to character variables in order to eliminate the blank values from the factors.

When we knit to HTML and view the markdown file in a markdown viewer the output table looks like this:

Can you reproduce this with the Dput data in the example? the same format, with the percentages sign and not in decimals. — Dare, Jun 07 '20 at 11:40
@Dare - at this point I have replicated everything except the percentage signs, and I'm working on the custom format function for the percentage signs. — Len Greski, Jun 07 '20 at 21:35
@akrun - without the `dput()`, definitely. With the `dput()` it's more debatable. OTOH an argument could be made for a close based on opinion based answers since there are multiple packages that support tabular output like SPSS Tables or SAS PROC TABULATE. I'd be interested in your opinion on how you decide when to answer vs. close as duplicate, because the decision seems pretty arbitrary to me. The funny thing about the JHU rprog assignment 1 dupes is that the Coursera site has a lot of content on that problem (some of which I have written) that students fail to read before they come to SO. — Len Greski, Jun 07 '20 at 21:45
@LenGreski For me, that decision is very simple. I don't believe in dupe tagging because that is a waste of time. I joined the SO in 2013 may be. At that time, daily I see lots of questions that were previously asked. In 2014, daily, the questions from 2013 were also included, ..., in 2020, it now includes all the questions from 2013:2020 along with previous years. For me, it is easier to answer than to check for dupes and I believe you know that as a software developer, most important thing is to practise. Where do you get the practise if all the questions are dupe tagged :=) — akrun, Jun 07 '20 at 21:50

Complex Crosstabulations in R

1 Answers1

a tables solution

Using data from the original post...

a `tables` solution