`str_replace_all()` on html output (from `huxtable()`)

Question

My R code generates some HTML output, and I'd like to make two very simple "find and replace" adjustments to it:

Instead of R2 in the HTML, I'd like R<sup>2</sup>.
Instead of [number] *** in the HTML, I'd like [number]<sup>***</sup>, i.e. removing the space and adding superscript.

I've been trying to do this with str_replace_all(). If I can solve my problem within the tidyverse that would be excellent.

For a reproducible example, I'll use mtcars to generate the html from huxtable::huxreg(), which is the same function that generates output in my real-life problem.

library(huxtable)
library(tidytext)

fit1 <- lm(mpg ~ disp, data = mtcars)

huxreg(fit1) %>% 
  quick_html()

which gives output that is the html version of this:

─────────────────────────────────────────────────
                                   (1)           
                        ─────────────────────────
  (Intercept)                        29.600 ***  
                                     (1.230)     
  disp                               -0.041 ***  
                                     (0.005)     
                        ─────────────────────────
  N                                  32          
  R2                                  0.718      
  logLik                            -82.105      
  AIC                               170.209      
─────────────────────────────────────────────────
  *** p < 0.001; ** p < 0.01; * p < 0.05.        

Column names: names, model1

So I tried str_replace_all() on the R2 and the ***, but my output seems unchanged. Is there a simple way for me to make this replacement?

huxreg(fit1) %>% 
  quick_html() %>% 
  str_replace_all(pattern = "R2", replacement = "R<sup>2</sup>") %>% 
  str_replace_all(pattern = " ***", replacement = "<sup>***</sup>")

alistaire · Accepted Answer · 2020-06-09T04:18:48.407

quick_html() returns NULL, not the text of the HTML it produces, which it saves to a file (huxtable-output.html, by default). You can read that file back in and run regex on it:

library(huxtable)
library(stringr)

fit1 <- lm(mpg ~ disp, data = mtcars)
filepath <- 'huxtable-output.html'

huxreg(fit1) %>% 
    quick_html(file = filepath, open = FALSE)

readLines(filepath) %>% 
    str_replace_all(pattern = "R2", replacement = "R<sup>2</sup>") %>% 
    str_replace_all(pattern = fixed(" ***"), replacement = "<sup>***</sup>") %>% 
    writeLines(filepath)

# open file in browser
browseURL(filepath)
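
The second call wraps its pattern in fixed() because * is a regex quantifier, so " ***" is not read as a space followed by three literal asterisks. Here is a minimal sketch on a single made-up cell string (cell, with_fixed and with_regex are illustrative names, not part of the huxtable output), showing fixed() next to an escaped regex that should do the same job:

```r
library(stringr)

# Hypothetical cell text, as it appears in the printed table
cell <- "29.600 ***"

# fixed() treats the pattern as literal characters
with_fixed <- str_replace_all(cell, fixed(" ***"), "<sup>***</sup>")

# The same match written as a regex, with the asterisk escaped
with_regex <- str_replace_all(cell, " \\*{3}", "<sup>***</sup>")

# Both approaches give the same string
identical(with_fixed, with_regex)
```

Either form works here; fixed() is easier to read when the pattern contains no regex features at all.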

Or as @27ϕ9 mentions in the comment below, you can use huxtable::to_html() to avoid the reading back in:

huxreg(fit1) %>% 
    to_html() %>%
    str_replace_all(pattern = "R2", replacement = "R<sup>2</sup>") %>% 
    str_replace_all(pattern = fixed(" ***"), replacement = "<sup>***</sup>") %>% 
    writeLines(filepath)
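
Since the original symptom was output that "seems unchanged", it may help to keep the HTML in a variable and check it before writing the file. This is a rough sketch that assumes the same fit1 model as above; html is just an illustrative variable name:

```r
library(huxtable)
library(stringr)

fit1 <- lm(mpg ~ disp, data = mtcars)

# to_html() returns the table's HTML as a character string
html <- huxreg(fit1) %>%
    to_html() %>%
    str_replace_all(pattern = "R2", replacement = "R<sup>2</sup>") %>%
    str_replace_all(pattern = fixed(" ***"), replacement = "<sup>***</sup>")

# Each call is TRUE only if the corresponding replacement was made
str_detect(html, fixed("R<sup>2</sup>"))
str_detect(html, fixed("<sup>***</sup>"))
```

If either check comes back FALSE, the pattern did not match the generated markup and is worth inspecting with cat(html).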

Maybe better not to parse HTML with regex, though. Check out rvest and xml2 for more robust tooling designed for the purpose.
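
For what it's worth, a node-based version might look roughly like the sketch below. It runs on a small hand-written table rather than real huxtable output, so the markup is an assumption: the actual HTML carries more attributes and may wrap cell text in extra elements, in which case the XPath and the text handling would need adjusting.

```r
library(xml2)

# Hypothetical stand-in for the table markup, not actual huxtable output
html <- paste0(
    "<table>",
    "<tr><td>(Intercept)</td><td>29.600 ***</td></tr>",
    "<tr><td>R2</td><td>0.718</td></tr>",
    "</table>"
)
doc <- read_html(html)
cells <- xml_find_all(doc, "//td")

for (i in seq_along(cells)) {
    cell <- cells[[i]]
    txt <- xml_text(cell)
    if (txt == "R2") {
        # Keep "R" as the cell text and append <sup>2</sup> as a child node
        xml_text(cell) <- "R"
        xml_add_child(cell, "sup", "2")
    } else if (grepl(" \\*+$", txt)) {
        # Drop the trailing " ***" and re-add the stars inside <sup>
        stars <- sub("^.* (\\*+)$", "\\1", txt)
        xml_text(cell) <- sub(" \\*+$", "", txt)
        xml_add_child(cell, "sup", stars)
    }
}
```

The modified document can then be saved with write_html(doc, filepath). Because the edits target individual cells, text elsewhere in the page that happens to contain "R2" is left alone.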

The current solution you posted is incredibly helpful. I will see if I can come up with an `xml2` solution, and if I can, I'll post it here for anyone who is looking at this question in the future. — Jeremy K., Jun 09 '20 at 06:46

score 1 · Answer 2 · answered Jun 11 '20 at 07:14

Let's keep it simple:

h <- huxreg(fit1)
h[7, 1] <- "R<sup>2</sup>"
escape_contents(h)[7, 1] <- FALSE

h <- map_contents(h, by_regex("***" = "<sup>***</sup>", 
      .grepl_args = list(fixed = TRUE)))
h <- map_escape_contents(h, by_regex("***" = FALSE, 
       .grepl_args=list(fixed = TRUE)))

quick_html(h)
