148

I'm looking for recommendations for HTML pretty printers which fulfill the following requirements:

  • Takes HTML as input and outputs a nicely formatted, correctly indented, but "graphically equivalent" version of the given input HTML.
  • Must support command-line operation.
  • Must be open-source and run under Linux.
knorv
  • 11
    Other options are `pup` (without arguments), `xmllint --format --html -`, and `xml fo --html`. – nisetama Sep 08 '16 at 19:25
  • 2
    curl https://httpbin.org/ | tidy -im – bajocode Aug 25 '17 at 07:18
  • 2
    Also: hxnormalize from html-xml-utils (Debian) – elig Jan 16 '19 at 05:20
  • Related: https://stackoverflow.com/questions/16090869/how-to-pretty-print-xml-from-the-command-line. You can also look into XML Tools. – Alexander Oh Aug 12 '19 at 10:33
  • 5
    Honestly, I have trouble understanding why this is considered off-topic... – Victor Schröder Mar 07 '20 at 14:33
  • 1
    Hey @VictorSchröder I think this is one of the few closed questions that are closed because of genuine non-compliance with the topic rules. I don't agree with the topic rules. I think recommendations of tools should be a core component, as none of us want to "reinvent the wheel", at least not every day, for everything, so recommendations of good tooling are core to business, but given they have that clause, this post doesn't comply. There are plenty of questions that are closed for far more ridiculous interpretations, unfortunately. It does seem to be a disease amongst SO admins. :-( – NeilG May 22 '22 at 06:31

5 Answers

123

Have a look at the HTML Tidy Project: http://www.html-tidy.org/

The granddaddy of HTML tools, with support for modern standards.

There used to be a fork called tidy-html5, which has since become the official version. Here is its GitHub repository: https://github.com/htacg/tidy-html5

Tidy is a console application for Mac OS X, Linux, Windows, UNIX, and more. It corrects and cleans up HTML and XML documents by fixing markup errors and upgrading legacy code to modern standards.

For your needs, here is the command line to call Tidy:

tidy inputfile.html
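
As a minimal sketch of how this might be wrapped for everyday use, the helper below prints an indented copy of a file to standard output and leaves the input untouched. The function name `pretty_html` is hypothetical, and the flags (`-q`, `-i`, `-w 160`, `-utf8`) are the ones mentioned in the comments on this page; their exact behavior may differ between Tidy versions, so treat this as an assumption-laden example rather than the recommended invocation.

```shell
#!/bin/sh
# pretty_html is a hypothetical helper name; it is not part of Tidy.
# Usage: pretty_html FILE  -> writes the indented HTML to stdout.
pretty_html() {
    # Fail early with a conventional "command not found" status if tidy is missing.
    if ! command -v tidy >/dev/null 2>&1; then
        echo "pretty_html: tidy is not installed" >&2
        return 127
    fi
    # -q       quiet mode (suppress the summary banner)
    # -i       indent element content
    # -w 160   wrap lines at 160 columns
    # -utf8    treat input and output as UTF-8
    # The -m flag is deliberately omitted, so the input file is not modified.
    tidy -q -i -w 160 -utf8 "$1"
}
```

Tidy normally exits with status 1 when it only emitted warnings and 2 when it found errors, so a caller that wants to keep the output despite warnings could check for a status of 2 or higher instead of any non-zero value.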
flying sheep
jonjbar
  • 18
    Thanks! "tidy -i -m -w 160 -ashtml -utf8 index.html" did the trick! Turns out tidy is installed by default in MacOS X - excellent! – knorv Feb 03 '10 at 20:07
  • 1
    Tidy was struggling to get the indentation right until I ran it with this option (rather than letting it default to "auto" with -i): tidy --indent yes – Edward Anderson Apr 05 '13 at 12:18
  • 3
    Tidy is great as a validator/lint tool, but it's not so great as a code beautifier. Two issues: (1) it can only operate on files, not standard input (so you cannot, for example, send selected text from Notepad++ to tidy.exe, and have it output the formatted code back to Notepad++); (2) It has trouble formatting a lot of code, e.g.: `
    `.
    – thdoan Jan 19 '16 at 09:27
  • 1
    Also it modifies the file when it cannot understand text. – Paweł Szczur Oct 23 '16 at 17:01
  • One note about tidy-html5: if you're using inline JavaScript, you need to include `type="text/javascript"`; otherwise tidy will add `<![CDATA[` – jcubic Jan 21 '17 at 09:49
  • `tidy index.html -qi -utf8 --output index.html` is a single command that does everything. – Tejas Tank Apr 05 '18 at 06:33
  • Tidy does more than just format the HTML. It will **remove empty tags** and **reorder technically invalid HTML** that is accepted by browsers (read: is used on the internet). `

    ` gets reordered as `

    ` and something like `

    ` just gets deleted. See [this GitHub issue](https://github.com/htacg/tidy-html5/issues/479). If you use tidy, you should run it in quiet mode `tidy -q` and don't ignore any warnings like `trimming empty

    `. Don't use it on HTML you didn't write.

    – Boris Verkhovskiy Nov 16 '19 at 01:27
  • @thdoan my version of `tidy` on Linux uses stdin, stdout and stderr if these are not specified in the options. I presume you are limited by your OS. – NeilG May 22 '22 at 06:40
13

Update 2018: The homebrew/dupes tap is now deprecated; tidy-html5 can be installed directly.

brew install tidy-html5

Original reply:

The Tidy that ships with OS X doesn't support HTML5, but there is an experimental branch on GitHub which does.

To get it:

 brew tap homebrew/dupes
 brew install tidy --HEAD
 brew untap homebrew/dupes

That's it! Have fun!

Paul J
Paul Brit
  • 1
    `Error: No available formula with the name "tidy"`. `brew install tidy-html5` works. – Pysis Apr 04 '17 at 13:34
  • Indeed, `brew install tidy-html5` works and you don't need the homebrew/dupes tap either. – Ogier Schelvis Nov 15 '17 at 10:33
5

I think HTML Tidy is one of the household names in that field.

Pekka
5

To have an updated, OS-agnostic answer to this question:

While the original HTMLTidy project has been dormant for over 6 years, a "W3C Community & Business group" that goes by the name "HTML Tidy Advocacy Community Group (HTACG)" has now begun to continue its development, with the goal of making it fully HTML5-compatible. The group was formed in January 2015 and although they describe the current state as "work in progress", binaries are already available for download.

zb226
1

Just a late followup on an OT question.

Homebrew has a tidy-html5 formula, which installs as you'd expect.

It's linked up as tidy5.

Dave Newton
  • Tidy is still mostly an HTML formatter and validator, not an HTML parser. Which tool can be used for HTML parsing **based on rules**: search the code for target elements (tags) with a specified 'class' or 'id', and delete them along with their content (child tags)? Plus delete specified tags. – Lexx Luxx Sep 03 '21 at 18:14
  • @triwo If you have a new question, particularly when not related to the original question, post a new question :) The caveat is that requests for tools/libraries/etc. are generally considered off-topic. In general, any HTML parser w/ XPath or CSS selector queries should be able to manipulate a DOM in arbitrary ways. – Dave Newton Sep 03 '21 at 18:25