Wanted

A command-line HTML5 beautifier running under Linux.

Input

Garbled, ugly HTML5 code. Possibly the result of multiple templates. You don't love it, it doesn't love you.

Output

Pure beauty. The code is nicely indented, has enough line breaks, and cares for its whitespace. Rather than viewing it in a web browser, you would like to display the code on your website directly.

Suspects

  • tidy does too much (heck, it alters my doctype!), and it doesn't work well with HTML5. Maybe there is a way to make it cooperate and not alter anything?
  • vim does too little. It only indents. I want the program to add and remove line breaks, and to play with the whitespace inside of tags.

DEAD OR ALIVE!

blinry
  • Shouldn't this be a superuser question? – Jonno_FTW Apr 17 '10 at 08:23
  • I'd say you have the right site for this. Not sure how many people on SU actually use HTML, much less HTML5. – Tim Post Apr 30 '10 at 14:17
  • I had the same problem and ended up writing a new Ruby library that doesn't require compiling any third-party utilities (I had problems getting Tidy to work with Rails) and focuses just on HTML5, not XML, XHTML or HTML 4. It's not perfect yet, but it has worked well in all the projects I have used it in. Please take a look at http://jarijokinen.com/html5-beautifier – Jari Jokinen Sep 02 '12 at 12:38
  • Use XHTML5 and you can do `xmllint --format` – Janus Troelsen Jun 25 '13 at 12:16
  • you can also monkeypatch HTML5 polyglot documents: `echo ' '; (echo ""; tail -n +2 < index.html) | xmllint --format - | sed -re 's/( – Janus Troelsen Jun 25 '13 at 13:13

4 Answers

HTML Tidy has been forked by the W3C and now has support for HTML5 validation.

https://github.com/w3c/tidy-html5

mhansen

I suspect tidy can be made to work with the right command-line parameters.

http://tidy.sourceforge.net/docs/quickref.html

You can specify an arbitrary doctype and add new block, inline, and empty tags, and turn on and off lots of tidy's cleaning options.

Depending on what you want it to "beautify", you can probably get decent results. It probably won't manage the more advanced cleanups, such as rewriting the HTML content to eliminate or combine spurious elements, if it doesn't recognize them.
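
For what it's worth, a beautify-only run can be wrapped in a small shell sketch like the one below. The option list is lifted from the comments on this answer; the function names `beautify_opts` and `beautify` are made up for illustration, and I'm assuming a `tidy` binary is on your PATH. It leaves the doctype and new-tag options out, so add those as needed.

```shell
#!/bin/sh
# Hypothetical helper: prints the formatting options, one token per line.
# Every flag here comes from the comments in this thread.
beautify_opts() {
    printf '%s\n' \
        --tidy-mark no \
        -indent \
        --indent-spaces 4 \
        -wrap 0 \
        --vertical-space yes
}

# Hypothetical wrapper: pretty-print one file to stdout.
# Assumes a `tidy` binary is installed and on PATH.
beautify() {
    # Word splitting of the option list is intentional here.
    # shellcheck disable=SC2046
    tidy -quiet $(beautify_opts) "$1"
}
```

Call it as `beautify index.html > pretty.html`; the input file is left untouched, since tidy writes to stdout unless told otherwise.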

Mr. Shiny and New 安宇
  • At a rough guess, how about `tidy -as-xhtml --input-xml --tidy-mark no -indent --indent-spaces 4 -wrap 0 --new-blocklevel-tags article,header,footer --new-inline-tags video,audio,canvas,ruby,rt,rp --doctype " " --break-before-br yes --sort-attributes alpha --vertical-space yes` (disclaimer - I've not used HTML5, and I've only copied a few new tags from http://www.w3schools.com/html5/html5_reference.asp into the list by guessing which were block/inline, so please adjust as appropriate.) – Stobor May 06 '10 at 00:56
  • This seems to be the best option. Kudos to Stobor, too! – blinry May 06 '10 at 20:45
  • This is a good start, but it needs so much more. E.g. new input element attributes / values (type="date"). – dave1010 Jan 20 '11 at 11:04
  • I had trouble with 2 of the options here: `--doctype " "` and `--sort-attributes alpha` would not work for some reason. – Ankur Aug 03 '11 at 02:08
  • I also struggled to get tidy working. My resulting options on Ubuntu 14.10 were: `tidy --tidy-mark no -indent --indent-spaces 4 -wrap 0 --new-blocklevel-tags 'article,header,footer' --new-inline-tags 'video,audio,canvas,ruby,rt,rp' --break-before-br yes --sort-attributes alpha --vertical-space yes` – aaaaaa Jan 27 '15 at 11:58

Copied from a live website I built using HTML5; every page validates as proper HTML5 thanks to this snippet (PHP in this case, but the options and logic are the same for any language):

    $options = array(
        'hide-comments' => true,
        'tidy-mark' => false,
        'indent' => true,
        'indent-spaces' => 4,
        'new-blocklevel-tags' => 'article,header,footer,section,nav',
        'new-inline-tags' => 'video,audio,canvas,ruby,rt,rp',
        'new-empty-tags' => 'source',
        'doctype' => '<!DOCTYPE HTML>',
        'sort-attributes' => 'alpha',
        'vertical-space' => false,
        'output-xhtml' => true,
        'wrap' => 180,
        'wrap-attributes' => false,
        'break-before-br' => false,
    );

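    // Parse the page into a tidy object, then clean and repair it
    $tidy = tidy_parse_string($buffer, $options, 'utf8');
    tidy_clean_repair($tidy);
    // Fix a tidy doctype bug (work on the serialized output, not the tidy object)
    $buffer = str_replace('<html lang="en" xmlns="http://www.w3.org/1999/xhtml">', '<!DOCTYPE HTML>', tidy_get_output($tidy));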

Philipp

If you use Haml as your nanoc filter, your HTML will automatically be pretty-printed. You can set HTML5 output as an option.