
I have HTML files that are created automatically, and I also want to save them as PDF files. One way to do it is to open them in Firefox and print them to a PDF file. But I can't do this by hand each time; I need it to be scripted.

Is there a way to script Firefox on Linux to:

  • Open an HTML file.

  • Remove all the "extra data" from the print page layout (time, URL, page number, and any other data that is added to a printed page by default).

  • Print the file to a specified name (the same name as the HTML file, only with a .pdf extension).

  • Close Firefox.

And all this in batch mode, or by redirecting DISPLAY to NULL.

Other programs that convert HTML to PDF won't work for me unless they are native to Linux (that is, they come with most distributions).

SIMEL
  • possible duplicate of [How can I automate HTML-to-PDF conversions?](http://stackoverflow.com/questions/176476/how-can-i-automate-html-to-pdf-conversions) –  Jun 02 '13 at 12:52
  • That answer uses programs that need an installation/download and are not supplied with Linux distributions. So they are unavailable to me. – SIMEL Jun 02 '13 at 12:57

1 Answer


You might want to write a quick-and-dirty Python script with pycurl that accepts a URL as a parameter, downloads the needed file and the assets it uses (CSS etc.), and then uses pandoc to convert the HTML to PDF.

That way, you can say goodbye to the "rubbish" Firefox adds, and what's more, pandoc should be in the repositories of your distro. pandoc is a CLI tool as well as a Haskell library, as it is written in Haskell.
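For local files that need no downloading, the pandoc step can be sketched roughly as follows. This is only a minimal sketch, not a tested recipe: it assumes `pandoc` is on the PATH together with a PDF engine (pandoc's PDF output goes through LaTeX by default), and the function names `pdf_path_for`, `build_pandoc_command` and `convert_html_to_pdf` are made up for illustration. It uses `subprocess.run` with an argument list so that no shell is involved.

```python
import subprocess
from pathlib import Path


def pdf_path_for(html_path):
    """Return the output path: same name as the HTML file, with a .pdf suffix."""
    return Path(html_path).with_suffix(".pdf")


def build_pandoc_command(html_path):
    """Build the pandoc argument list (hypothetical helper, for illustration).

    Passing a list instead of a shell string avoids quoting problems
    with unusual file names.
    """
    html_path = Path(html_path)
    return [
        "pandoc",
        "--from", "html",
        str(html_path),
        "--output", str(pdf_path_for(html_path)),
    ]


def convert_html_to_pdf(html_path):
    """Run pandoc on one file; raises CalledProcessError if pandoc fails.

    Assumes pandoc and a PDF engine are installed on the machine.
    """
    subprocess.run(build_pandoc_command(html_path), check=True)
    return pdf_path_for(html_path)
```

Calling `convert_html_to_pdf("report.html")` should then leave a `report.pdf` next to the input, provided pandoc can render the page; how close the result looks to Firefox's output is a separate question.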

Mathuin
  • The HTML files are files that I create; I don't need to download them. Also, there are no extra assets except for the HTML file itself. It's simply a table. – SIMEL Jun 02 '13 at 15:56
  • By pandoc, do you mean [this](https://pypi.python.org/pypi/pyandoc/) package, or something else? Please provide a link in your description. – SIMEL Jun 02 '13 at 15:58
  • pandoc is a CLI tool and a Haskell library; the package you point to is a Python wrapper for pandoc. Yes, it'll work, as long as `pandoc` is installed too. However, the Python wrapper seems to be incomplete, so I'd use the `pandoc` CLI and call it via `system()`, or via something better than that function, which is known to be insecure. – Mathuin Jun 05 '13 at 11:14
  • pandoc uses different LaTeX engines to render, and none of them is as good as Firefox. The best tool I found was `prince`, but it is shareware and the free version adds its logo to the PDF. The other tools are html2pdf, weasyprint, wkhtmltopdf, and others. But personally I am looking for a way to do the rendering via Firefox. It gives the best pages. – kyb Feb 15 '22 at 21:23