
I want to perform a regression analysis with R, using a difference contrast for a nominal independent variable. However, the contrast produces coefficient labels (built from the variable name and the factor level names) that are not suitable for publication, so I want to change them. The problem is that I cannot work out how.

I first looked at the labelled package, but that did not fix the problem: I build the table with the function tbl_regression from the gtsummary package, and adding labels did not change anything. Here is some sample code:

# create data
set.seed(345)
depvar <- rnorm(300,300,60) #baseline area
indepvar <- rep(c("A","B", "C"), times=100)
data <- data.frame(depvar, indepvar)

# set indep to factor
data$indepvar<- as.factor(data$indepvar)

# model without contrast
## model 1
m1 <- lm(depvar ~ indepvar, data = data)

## create table
library(gtsummary)
tbl_regression(m1)

# model with contrast
## create contrast 
library(MASS)
contrasts(data$indepvar) <- contr.sdif

## model 2
m2 <- lm(depvar ~ indepvar, data = data)

## create table
tbl_regression(m2)

I want to change indepvarB-A into something like B minus A. Below is some code to inspect the data.

## inspect data structure and attributes
head(data$indepvar)
str(data)
attributes(data$indepvar) 

The options I see: either add or change value labels, or change the attributes. Or maybe there is a different or better way to create the contrast. Any advice on how to do this is much appreciated.

2 Answers


A quick and dirty solution would be this.

lm(depvar ~ B_minus_A_, data=transform(data, B_minus_A_=indepvar)) |>
  gtsummary::tbl_regression()


jay.sf
  • thanks, I don't mind quick and dirty, whatever works :-) Unfortunately, this is not exactly what I was looking for. Currently, the label is a concatenation of the variable name (indepvar) and the comparison set by the contrast based on the factor level names (B-A). I would like to remove the variable name from the value label entirely, and replace the default comparison based on the factor level names with my own value label. Thanks – maurice vergeer Jul 27 '23 at 14:39
  • @mauricevergeer I have no clue. You could try `..` instead of `B_minus_A_`, but that isn't a real beauty either. Hack the HTML directly, file a feature request at their [GitHub](https://github.com/ddsjoberg/gtsummary/issues), or use something different. Cheers! – jay.sf Jul 27 '23 at 16:17
  • thanks for your suggestions. yes, hacking html would be possible, but I want it as pdf. yes, I will try the official github page. again thanks – maurice vergeer Jul 27 '23 at 17:53
  • @mauricevergeer Aha, for .pdf you would need LaTeX. A quick web search will turn up enough to create very nice custom tables, or look on https://tex.stackexchange.com/. Check [Overleaf](https://www.overleaf.com/), they might even have some templates. Definitely recommended are the dcolumn and siunitx packages; you will see that they are cited a lot. For personal use I'm happiest with the console, for publishing only manual LaTeX. – jay.sf Jul 27 '23 at 18:05
  • thanks for the sources. Because I was bored with creating documents manually, I started using quarto in rstudio. Would these sources help me, given that quarto renders/knits the document to a pdf? Alternatively - worst case scenario - I could use a pdf-editor to fix the issue. – maurice vergeer Jul 27 '23 at 18:18
  • @mauricevergeer overleaf is like having LaTeX without need to install it, all on-line, free for academic use, just download pdf if rendered satisfactorily. Coding LaTeX tables is not much harder than writing MathJax. And no more waiting for feature requests being answered ;) – jay.sf Jul 27 '23 at 18:25

Essentially, what R is doing behind the scenes is a form of one-hot encoding. You can do this yourself:

data$A <- rep(c(1,0,0), 100)
data$B <- rep(c(0,1,0), 100)
data$C <- rep(c(0, 0, 1), 100)
m2 <- lm(depvar ~ A+B+C, data = data)

In fact you can leave off the C: with an intercept in the model the three dummies are collinear, so lm() reports NA for the last one anyway. The remaining coefficients are comparisons against the omitted level. That may look more like what you want?
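
As a rough check of what the dummy coding above estimates, the sketch below compares it with R's default treatment contrasts and with the successive-difference contrasts from the question. It is only an illustration: the object names (d, m_dummy, m_treat, m_sdif, means) are made up here, and the data are simulated in the same way as in the question.

```r
library(MASS)  # provides contr.sdif

# Simulated data, same recipe as in the question
set.seed(345)
d <- data.frame(
  depvar   = rnorm(300, 300, 60),
  indepvar = factor(rep(c("A", "B", "C"), times = 100))
)

# Hand-made dummies; A is left out, so A becomes the reference level
d$B <- as.numeric(d$indepvar == "B")
d$C <- as.numeric(d$indepvar == "C")
m_dummy <- lm(depvar ~ B + C, data = d)

# Default treatment contrasts: same estimates, only the names differ
m_treat <- lm(depvar ~ indepvar, data = d)

# Successive-difference contrasts on a copy of the data
d_sdif <- d
contrasts(d_sdif$indepvar) <- contr.sdif(3)
m_sdif <- lm(depvar ~ indepvar, data = d_sdif)

# Group means, for comparison with the coefficients
means <- tapply(d$depvar, d$indepvar, mean)

coef(m_dummy)  # intercept = mean of A; slopes = B - A and C - A
coef(m_sdif)   # intercept = mean of the group means; slopes = B - A and C - B
diff(means)    # B - A and C - B, computed directly
```

So manual dummies reproduce the treatment coding (every level against A), whereas the difference contrast compares each level with the previous one; the two sets of coefficients answer different questions.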

Cal Lee
  • Thanks, but I think there is a misunderstanding. If I am correct, your solution refers to creating dummy variables based on a categorical variable. What I want is purely aesthetic: changing the labels of the contrast categories/levels – maurice vergeer Jul 27 '23 at 16:01
  • It's the same effect, no? Now you've removed the whole B_minus_A part that looks ugly. – Cal Lee Jul 27 '23 at 22:53
  • No, the reference changes. See https://cran.r-project.org/web/packages/faux/vignettes/contrasts.html#difference . Furthermore, the variable name in the value label is redundant. That's why I want to remove it, and change the label to ease interpretation. – maurice vergeer Jul 28 '23 at 15:24
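
One possible route to the relabelling discussed in this thread, sketched under assumptions rather than offered as a tested gtsummary recipe: lm() builds coefficient names by pasting the variable name onto the column names of the contrast matrix, so renaming those columns before fitting changes the comparison part of the label. The names d, cm and m are illustrative only, and the data are simulated as in the question.

```r
library(MASS)  # provides contr.sdif

# Simulated data, same recipe as in the question
set.seed(345)
d <- data.frame(
  depvar   = rnorm(300, 300, 60),
  indepvar = factor(rep(c("A", "B", "C"), times = 100))
)

# Build the successive-difference contrast matrix explicitly ...
cm <- contr.sdif(nlevels(d$indepvar))

# ... and replace its column names with publication-friendly labels
colnames(cm) <- c("B minus A", "C minus B")
contrasts(d$indepvar) <- cm

m <- lm(depvar ~ indepvar, data = d)

# Coefficient names are the variable name pasted onto the new column names
names(coef(m))
```

This changes only the comparison part of the label; the variable name is still prefixed by lm(). Removing that prefix would probably have to happen at the table level, for example by editing the label column through gtsummary's modify_table_body(), which has not been tried here.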