
In an rmarkdown document, I'm creating a LaTeX table of regression coefficients with standard errors to compare several regression models in a single table. I'd like to align the coefficients for each model so that their decimal points line up vertically down each column.

I'm using texreg to create the table. The coefficients aren't decimal-aligned by default (instead, each string is centered within its column), and I'm looking for a way to get them decimal-aligned. I'm not wedded to texreg, so if you have a solution using xtable, pander, stargazer or any other method, I'd be interested in that as well. Ideally, I'd like a solution that can be implemented programmatically within the rmarkdown document, rather than by tweaking the LaTeX markup after rendering the document to a .tex file.

As a bonus, I'd also like to be able to put line breaks in table headings. For example, in texreg you can use the `custom.model.names` argument to set the column names for each regression model. In the example below, I'd like to have "Add Horsepower and AM" split over two lines so that the column doesn't need to be so wide. I tried "Add Horsepower \newline and AM", but that just adds "ewline" to the final column header, because R reads the "\n" as a newline escape, which is then ignored.
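A quick base-R check shows what these header strings actually contain. This is a sketch: `broken`, `escaped` and `two_line` are hypothetical names, `\shortstack` is a standard LaTeX command, and the commented-out `texreg()` call assumes texreg passes `custom.model.names` to LaTeX without escaping it (not verified here).

```r
# In an R string "\n" is the newline escape, so "\newline" becomes a
# newline character followed by the letters "ewline"; LaTeX never sees it.
broken  <- "Add Horsepower \newline and AM"
# "\\" is one literal backslash, so this string really contains \newline
escaped <- "Add Horsepower \\newline and AM"

# \newline only breaks lines in paragraph-mode cells (p{}/m{} columns).
# texreg's l/c/D columns are not paragraphs, which is consistent with
# "\\newline" giving "Horsepowerand AM". \shortstack stacks its lines in
# any cell; "\\\\" in R is LaTeX's \\ (line break).
two_line <- "\\shortstack{Add Horsepower\\\\and AM}"
writeLines(two_line)  # \shortstack{Add Horsepower\\and AM}

# Assumed usage (not run): only works if texreg does not escape the name
# texreg(list(m1, m2), custom.model.names = c("Base Model", two_line))
```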

Here's a reproducible example:

````rmd
---
title: "Regression Table"
author: "eipi10"
date: "August 15, 2016"
header-includes:
    - \usepackage{dcolumn}
output: pdf_document
---

```{r, echo=FALSE, message=FALSE, results="asis"}
library(texreg)

m1 = glm(mpg ~ wt + factor(cyl), data=mtcars)
m2 = glm(mpg ~ wt + factor(cyl) + hp + factor(am), data=mtcars)

texreg(list(m1,m2),
       single.row=TRUE, 
       custom.model.names=c("Base Model", "Add Horsepower and AM"),
       custom.coef.names=c("Intercept", "Weight","Cyl: 6", "Cyl: 8", "Horsepower","AM: 1"))
```
````

And here's what the output table looks like:

[Image: rendered output table]

eipi10
  • you probably need some extra backslashes when inserting newlines, e.g. `"\\newline"` – Ben Bolker Aug 16 '16 at 00:14
  • That doesn't work either. I get "Add Horsepowerand AM". I also tried `\\\`, which just adds an extra space between "Horsepower" and "and". – eipi10 Aug 16 '16 at 00:18
  • hmm, started looking in tex.stackexchange.com but didn't get that far. – Ben Bolker Aug 16 '16 at 00:41
  • Have you tried the ``dcolumn = TRUE`` argument? This provides decimal point alignment using the ``dcolumn`` package. You may want to use it in combination with the argument ``use.packages = FALSE`` to avoid any preamble commands right before the table float. This is all documented in the help files and in the JSS article. – Philip Leifeld Aug 17 '16 at 11:40
  • I get a Pandoc error with this method: ``! Misplaced \noalign. \hline ->\noalign {\ifnum 0=`}\fi \hrule \@height \arrayrulewidth \futurelet... l.403 D\{)\}\{)\}\{11)3\} \} \hline``. – eipi10 Aug 17 '16 at 15:28
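Because the YAML header already loads dcolumn, one hand-rolled alternative to `texreg(..., dcolumn = TRUE)` is to emit the tabular yourself with `D{.}{.}{-1}` columns, which align entries on the "." character. A minimal sketch: `dcolumn_tabular` is a hypothetical helper (not a texreg or xtable function), it assumes a numeric matrix with no missing values, and I haven't checked whether it avoids the Pandoc error above.

```r
# Hypothetical helper (not part of texreg/xtable): write a LaTeX tabular whose
# numeric columns use dcolumn's D{.}{.}{-1} type, which aligns on the ".".
# Assumes \usepackage{dcolumn} in the preamble and a matrix with no NAs.
dcolumn_tabular <- function(mat, digits = 2) {
  # Same number of decimals in every cell
  body <- matrix(formatC(as.vector(mat), format = "f", digits = digits),
                 nrow = nrow(mat))
  # Text in a D column must be wrapped in \multicolumn{1}{c}{...};
  # otherwise dcolumn tries to typeset it as a number
  header <- paste0("\\multicolumn{1}{c}{", colnames(mat), "}")
  rows <- paste0(rownames(mat), " & ",
                 apply(body, 1, paste, collapse = " & "), " \\\\")
  c(paste0("\\begin{tabular}{l", strrep("D{.}{.}{-1}", ncol(mat)), "}"),
    "\\hline",
    paste0(" & ", paste(header, collapse = " & "), " \\\\"),
    "\\hline",
    rows,
    "\\hline",
    "\\end{tabular}")
}

m1 <- glm(mpg ~ wt + factor(cyl), data = mtcars)
m2 <- glm(mpg ~ wt + factor(cyl) + hp + factor(am), data = mtcars)
shared <- c("(Intercept)", "wt", "factor(cyl)6", "factor(cyl)8")
est <- cbind("Base Model" = coef(m1)[shared], "Add HP and AM" = coef(m2)[shared])
rownames(est) <- c("Intercept", "Weight", "Cyl: 6", "Cyl: 8")

# In a chunk with results="asis", this writes the raw LaTeX into the document
cat(dcolumn_tabular(est), sep = "\n")
```

Terms that appear in only one model would need an empty cell instead of `NA`, which this sketch does not handle.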

2 Answers


Here is an attempt using broom. You'll still need to clean up the labels though.

```r
library(broom)
library(dplyr)
library(pander)
library(tidyr)

m1 = glm(mpg ~ wt + factor(cyl), data=mtcars)
m2 = glm(mpg ~ wt + factor(cyl) + hp + factor(am), data=mtcars)

# Coefficient estimates in long format: one row per (term, model)
base <- tidy(m1) %>% select(term, estimate) %>% mutate(type = "base_model")
with_am_hp <- tidy(m2) %>% select(term, estimate) %>% mutate(type = "Add_Horsepower_and_AM")
models <- bind_rows(base, with_am_hp)

# Wide format: one column per model; terms a model lacks become NA
formatted_models <- models %>% spread(type, estimate)

# Model-level statistics from glance(), one column per model
glance_table <- data.frame("Add_Horsepower_and_AM" = unlist(glance(m2)),
                           "base_model" = unlist(glance(m1))) %>%
  mutate(term = row.names(.))

full_results <- bind_rows(formatted_models, glance_table)
pandoc.table(full_results, justify = "left")
```
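For the label clean-up, one option is a named lookup vector from glm's raw term names to display labels, leaving anything not in the lookup (such as the `glance()` statistic names) unchanged. A sketch; `term_labels` and `relabel` are hypothetical names.

```r
# Hypothetical lookup: raw glm term names -> display labels
term_labels <- c("(Intercept)"  = "Intercept",
                 "wt"           = "Weight",
                 "factor(cyl)6" = "Cyl: 6",
                 "factor(cyl)8" = "Cyl: 8",
                 "hp"           = "Horsepower",
                 "factor(am)1"  = "AM: 1")

# Replace known terms; keep unknown ones (e.g. "AIC", "BIC") as they are
relabel <- function(x, labels) {
  out <- labels[x]
  unname(ifelse(is.na(out), x, out))
}

m2 <- glm(mpg ~ wt + factor(cyl) + hp + factor(am), data = mtcars)
relabel(names(coef(m2)), term_labels)
relabel(c("wt", "AIC"), term_labels)
```

With the code above, this would be applied as `full_results$term <- relabel(full_results$term, term_labels)`.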
Maiasaura

This took quite a bit of wrangling, but I think it gets you close to what you want. I used xtable. The main idea is to create two columns for each model: one right-aligned (coefficients) and one left-aligned (standard errors). Because every coefficient is printed with the same number of decimal places, right-aligning them lines up the decimal points. So a table with two models has five columns: one for the row labels plus two per model. The headers and the summary statistics are placed in cells that span the two columns of each model.
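A base-R illustration of why right-aligning works, assuming (as with xtable's default `digits = 2`) that every coefficient is printed with two decimals: right-justifying equal-precision strings to a common width puts the decimal point at the same character position. In the PDF the same holds because LaTeX fonts generally give all digits the same width (true of the default Computer Modern; an assumption for other fonts).

```r
m2 <- glm(mpg ~ wt + factor(cyl) + hp + factor(am), data = mtcars)
est <- unname(coef(m2))

# Every value with exactly two decimals...
txt <- sprintf("%.2f", est)
# ...then right-justified (left-padded) to a common width, like an R{} column
txt <- formatC(txt, width = max(nchar(txt)))
writeLines(txt)

# The "." now sits at the same character position in every string
dot_pos <- regexpr(".", txt, fixed = TRUE)
print(unique(as.integer(dot_pos)))
```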

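First, header.tex (adapted from p. 27 of the xtable vignette), which defines fixed-width left-aligned, centered, and right-aligned column types `L`, `C` and `R` (the `P` type isn't used below):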

```latex
\usepackage{array}
\usepackage{tabularx}
\newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcolumntype{R}[1]{>{\raggedleft\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcolumntype{P}[1]{>{\raggedright\tabularxbackslash}p{#1}}
```

Next, the .Rmd file. I learnt about `add.to.row` from another answer on this site.

````rmd
---
title: "Regression Table"
author: "eipi10"
date: "August 15, 2016"
header-includes:
    - \usepackage{dcolumn}
output: 
  pdf_document:
    includes:
      in_header: header.tex
---

```{r, echo=FALSE, message=FALSE, results="asis"}
library(xtable)
library(broom)   

m1 = glm(mpg ~ wt + factor(cyl), data=mtcars)
m2 = glm(mpg ~ wt + factor(cyl) + hp + factor(am), data=mtcars)

# Significance stars: p < 0.001 -> "***", < 0.01 -> "**", < 0.05 -> "*"
p_val <- c(0, 0.001, 0.01, 0.05, 1)
stars <- sapply(3:0, function(x) paste0(rep("*", x), collapse=""))

make_tbl <- function(model) {
  # One row per term: the numeric coefficient (xtable prints it with a
  # fixed number of decimals) and a "(SE)stars" string
  coefs <- summary(model)$coefficients
  coef_col <- round(coefs[,1], 2)
  se_col <- round(coefs[,2], 2)
  # rightmost.closed = TRUE maps p = 1 to "" instead of NA
  star_col <- stars[findInterval(coefs[,4], p_val, rightmost.closed = TRUE)]
  tbl <- data.frame(coef=coef_col)
  tbl$se <- sprintf("(%0.2f)%s", se_col, star_col)
  tbl
}

make_addtorow <- function(row.name, terms) {
  # xtable allows the addition of custom rows. This function
  # makes a row with one column (used for the row
  # names of the model statistics), 
  # followed by cells that each span two columns.
  paste0(row.name, 
  paste0('& \\multicolumn{2}{C{3cm}}{', 
         terms, 
         '}', 
        collapse=''), 
  '\\\\')
}

tbl1 <- make_tbl(m1)
tbl2 <- make_tbl(m2)
# merge() sorts rows by term name, and that order depends on the locale,
# so label the rows by looking up the raw names instead of assuming an order
combo <- merge(tbl1, tbl2, by = "row.names", all = TRUE)
coef_labels <- c("(Intercept)" = "Intercept", "factor(am)1" = "AM: 1",
                 "factor(cyl)6" = "Cyl: 6", "factor(cyl)8" = "Cyl: 8",
                 "hp" = "Horsepower", "wt" = "Weight")
rownames(combo) <- coef_labels[as.character(combo$Row.names)]
combo <- combo[, -1]   # drop the Row.names column
sum_stats <- round(rbind(glance(m1), glance(m2)), 2)

addtorow <- list()
addtorow$pos <- list(0, 6, 6, 6, 6, 6)
addtorow$command <- c(
  make_addtorow("", c("Base model", "Add Horsepower and AM")),
  make_addtorow("\\hline AIC", sum_stats$AIC), # Draw a line after coefficients
  make_addtorow("BIC", sum_stats$BIC),
  make_addtorow("Log Likelihood", sum_stats$logLik),
  make_addtorow("Deviance", sum_stats$deviance),
  make_addtorow("Num. obs.", sum_stats$df.null + 1)
  )

# add.to.row, include.colnames and comment are print() arguments, not xtable() ones
xtbl <- xtable(combo)
# Specify column alignment for the tabularx environment, using the
# custom column types defined in header.tex;
# @{\hskip ...} sets the space between a coefficient and its SE
align(xtbl) <- c("L{2.5cm}", "R{1.5cm}@{\\hskip 0.1cm}", "L{1.5cm}", 
                 "R{1.5cm}@{\\hskip 0.1cm}", "L{1.5cm}")

print(xtbl, 
      tabular.environment = "tabularx", # tabularx takes two arguments:
      width = ".60\\textwidth",         # width, and alignment (specified above)
      add.to.row = addtorow, 
      include.colnames = FALSE,
      comment = FALSE)
```
````
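The five-entry `align()` vector follows a simple pattern (a label column, then a right/left pair per model), so it can be generated for any number of models. A sketch; `make_align` and its width defaults are hypothetical, chosen so that `make_align(2)` reproduces the two-model vector used in the answer.

```r
# Hypothetical helper: xtable align() vector for k models, using the
# L/R column types from header.tex
make_align <- function(k, label_width = "2.5cm", col_width = "1.5cm",
                       gap = "0.1cm") {
  # Coefficients right-aligned, then a small gap, then left-aligned SEs
  pair <- c(sprintf("R{%s}@{\\hskip %s}", col_width, gap),
            sprintf("L{%s}", col_width))
  c(sprintf("L{%s}", label_width), rep(pair, k))
}

make_align(2)
make_align(3)  # 7 entries: label column plus three (R, L) pairs
```

With more models, the hard-coded `6` in `addtorow$pos` would also need to become `nrow(combo)`.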

[Image: rendered output table]

Weihuang Wong
  • `This took quite a bit of wrangling`...I was afraid it would be something like that. Thanks for your answer. For the latex-challenged among us (including me) can you add some text explaining the logic behind all the latex markup and how it all fits together in the `xtable` code? – eipi10 Aug 16 '16 at 15:34
  • Any section in particular that's unclear? The `header.tex` bit I took right out of the `xtable` vignette, and to be honest I'm not quite sure what's going on there myself, so I'm open to being enlightened! I'll add some comments to the code. – Weihuang Wong Aug 16 '16 at 15:39
  • It's not a matter of clarity. I was just hoping for some general comments on the logic behind what you did. I'll work through the code as I try to implement this in my actual regression table and come back with any specific questions. – eipi10 Aug 16 '16 at 15:43
  • One feature of `tabularx` column types is that they wrap their contents given the width of the column (I edited the answer to reflect this). So this may take care of your line break issue too. – Weihuang Wong Aug 16 '16 at 16:29
  • Thanks for your work on this. I was able to implement it in my actual use case and the table looks good. – eipi10 Aug 19 '16 at 14:52
  • You're welcome! It's a setup that I can now use for my own work too. – Weihuang Wong Aug 19 '16 at 14:54
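On the header line-break question: the `L`, `C` and `R` types in header.tex run `\let\newline\\` inside a paragraph (`m{}`) column, so besides automatic wrapping, an explicit `\newline` should break a header line inside those cells. A sketch reusing the answer's `make_addtorow()` (copied so the block runs on its own); note the doubled backslash needed in the R string.

```r
# Copy of make_addtorow() from the answer above
make_addtorow <- function(row.name, terms) {
  paste0(row.name,
         paste0('& \\multicolumn{2}{C{3cm}}{', terms, '}', collapse = ''),
         '\\\\')
}

# "\\newline" in R is the LaTeX command \newline; inside a C{} cell,
# header.tex makes it a centered line break
header_row <- make_addtorow("", c("Base model", "Add Horsepower\\newline and AM"))
writeLines(header_row)
```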