I want to convert the Shiny for Python documentation into a PDF. Jumping to each section and then printing it to PDF is possible, but I am wondering if there is a more compact way to print all sections in one go.
- Do you want to print one time? Do you want to do something like add a button and be able to print on demand? If one time, do you want to use the console in developer tools? – Kat Dec 30 '22 at 01:42
- I'm not sure I understand what you're going for. In any case, you could run `document.querySelectorAll("a[href*=https]").forEach(x => console.log(x.href))` to get all URLs (not recursively, obviously), then go to every URL with Selenium, wait for the element to load (however you want), and [screenshot](https://www.geeksforgeeks.org/screenshot-element-method-selenium-python/) or [save as pdf](https://stackoverflow.com/questions/56897041/how-to-save-opened-page-as-pdf-in-selenium-python). Perhaps remove unwanted elements before doing that. – Yarin_007 Dec 30 '22 at 17:49
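For reference, a minimal sketch of the Selenium route described in that comment, assuming Selenium 4 with headless Chrome, and borrowing the `a.sidebar-link` selector from the answer below (both assumptions; adjust the selector and the waiting strategy to taste):

import base64
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.print_page_options import PrintOptions

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

# collect the section URLs from the docs sidebar
driver.get("https://shiny.rstudio.com/py/docs/get-started.html")
urls = [a.get_attribute("href")
        for a in driver.find_elements(By.CSS_SELECTOR, "a.sidebar-link")]

# visit each page and print it to PDF via the DevTools print endpoint
for i, url in enumerate(urls):
    driver.get(url)
    pdf_base64 = driver.print_page(PrintOptions())  # base64-encoded PDF
    with open(f"section_{i}.pdf", "wb") as f:
        f.write(base64.b64decode(pdf_base64))

driver.quit()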
1 Answer
I can propose a solution based on wkhtmltopdf and Python: scrape the links to the HTML files for the different sections of the docs and pass them to pdfkit, a Python library that wraps the wkhtmltopdf utility for converting HTML to PDF.
First, download wkhtmltopdf and install it on your system (you may read this to get help with the installation process; if you are a Windows user, remember to add wkhtmltopdf to PATH).
Then check its availability from cmd/shell:
$ wkhtmltopdf --version
# wkhtmltopdf 0.12.6 (with patched qt)
Next, install these Python libraries (assuming you have Python installed):
pip install requests beautifulsoup4 pdfkit
and then run the following Python script:
$ python html2pdf.py
html2pdf.py
import re

import pdfkit
import requests
from bs4 import BeautifulSoup

# Making a GET request to one of the docs pages
r = requests.get('https://shiny.rstudio.com/py/docs/get-started.html')
# print(r.status_code)
# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
a = soup.find_all('a', class_='sidebar-link')
# get the links from the sidebar
links = [link.get('href') for link in a if link.get('href') is not None]
# the hrefs are relative (they start with '..'), so rebuild absolute URLs
site_link = 'https://shiny.rstudio.com/py'
full_links = [site_link + link[2:] for link in links]
# derive a file name from the last path component of each URL
names = [re.findall(r".+/(.+)\.html", link)[0] for link in full_links]
# convert each HTML page to a PDF
for i, link in enumerate(full_links):
    pdfkit.from_url(link, f"{names[i]}.pdf")
This converts all the HTML files (the links in the sidebar of https://shiny.rstudio.com/py/docs/) into PDF files in one go:
$ ls
get-started.pdf reactive-programming.pdf ui-navigation.pdf
html2pdf.py reactive-values.pdf ui-page-layouts.pdf
overview.pdf running-debugging.pdf ui-static.pdf
putting-it-together.pdf server.pdf user-interface.pdf
reactive-calculations.pdf ui-dynamic.pdf workflow-modules.pdf
reactive-events.pdf ui-feedback.pdf workflow-server.pdf
reactive-mutable.pdf ui-html.pdf
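If a single combined document is preferred, the per-section PDFs can then be stitched together. A minimal sketch, assuming the pypdf package is installed (pip install pypdf); pypdf is not part of the answer above, just one way to merge:

import glob

from pypdf import PdfWriter

# stitch the per-section PDFs produced above into one document;
# sorted() gives alphabetical order here, so reorder the list
# if the sidebar's section order matters to you
writer = PdfWriter()
for pdf in sorted(glob.glob("*.pdf")):
    writer.append(pdf)
writer.write("shiny-for-python-docs.pdf")
writer.close()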

– shafee
- (+1) Thanks @shafee for a very useful answer. The given script prints all sections; however, the print quality is not good. – MYaseen208 Jan 02 '23 at 01:24
- Actually, that is the problem with automated printing, and I think even when you try to print those pages manually you don't have full control over the quality. It depends on how the website author arranged things. – shafee Jan 02 '23 at 01:28
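That said, output quality can sometimes be improved by passing wkhtmltopdf flags through pdfkit's options argument. A sketch; the particular flag values below are assumptions to experiment with, not known-good settings for this site:

import pdfkit

# pdfkit forwards these keys as wkhtmltopdf command-line flags;
# flag-only options (no value) take an empty string
options = {
    'page-size': 'A4',
    'dpi': '300',
    'print-media-type': '',  # use the site's print stylesheet, if any
    'zoom': '1.2',
}
pdfkit.from_url('https://shiny.rstudio.com/py/docs/overview.html',
                'overview.pdf', options=options)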
- On second thought, assuming the HTML files are rendered from qmd files (my guess), a script could be written to grab those qmd files and render them to PDF using Quarto. But unfortunately, I am having trouble finding the GitHub repo for these qmd source files. – shafee Jan 02 '23 at 04:17
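If those qmd sources were ever tracked down, the rendering step itself would be straightforward. A hypothetical sketch, assuming the .qmd files sit in a local docs/ directory and that the quarto CLI plus a LaTeX distribution are on PATH:

import glob
import subprocess

# render each Quarto source file to PDF; quarto writes the
# output next to the source by default
for qmd in glob.glob("docs/*.qmd"):
    subprocess.run(["quarto", "render", qmd, "--to", "pdf"], check=True)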