42

Is it possible to use HTML Tidy to just indent HTML code?

Sample Code

<form action="?" method="get" accept-charset="utf-8">

<ul>
<li>
<label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q" />
</li>
<li><input class="submit" type="submit" value="Search" /></li>
</ul>


</form>

Desired Result

<form action="?" method="get" accept-charset="utf-8">
    <ul>
        <li>
        <label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q"/>
        </li>
        <li><input class="submit" type="submit" value="Search"/></li>
    </ul>
</form>

If I run it with the standard command, tidy -f errs.txt -m index.html, then I get this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 15.3.6), see www.w3.org">
<title></title>
</head>
<body>
<form action="?" method="get" accept-charset="utf-8">
<ul>
<li><label class="screenReader" for=
"q">Keywords</label><input type="text" name="q" value="" id=
"q"></li>
<li><input class="submit" type="submit" value="Search"></li>
</ul>
</form>
</body>
</html>

How can I omit all the extra stuff and actually get it to indent the code?

Forgive me if that's not a feature it's supposed to support. If it isn't, what library or tool am I looking for?

JasonMArcher
cwd
  • 1
    Other folks have suggested that http://prettydiff.com/?m=beautify&html might be a better option if you *just* want to indent. – Amanda Mar 03 '14 at 15:55

7 Answers

31

Use the indent, tidy-mark, and quiet options:

tidy \
  -indent \
  --indent-spaces 2 \
  -quiet \
  --tidy-mark no \
  index.html

Or, using a config file rather than command-line options:

indent: auto
indent-spaces: 2
quiet: yes
tidy-mark: no

Name it tidy_config.txt and save it in the same directory as the .html file. Run it like this:

tidy -config tidy_config.txt index.html

For more customization, use the tidy man page to find other relevant options such as markup: no or force-output: yes.
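
As a minimal sketch of the config-file route, assuming a POSIX shell with tidy on the PATH, the file can be written with a here-document and then passed to tidy. The show-body-only line is an addition that is not part of the config above; it is included on the assumption that the input is a fragment and the DOCTYPE, html, and head wrapper is unwanted.

```shell
#!/bin/sh
# Write the options from above to tidy_config.txt in the current directory.
# show-body-only is an added assumption: it asks tidy to print only the
# body content, which suits fragments like the one in the question.
cat > tidy_config.txt <<'EOF'
indent: auto
indent-spaces: 2
quiet: yes
tidy-mark: no
show-body-only: yes
EOF

# Run tidy only when it is installed and index.html exists.
# Without -m, the result goes to standard output and the file is left alone.
if command -v tidy >/dev/null 2>&1 && [ -f index.html ]; then
    tidy -config tidy_config.txt index.html
fi
```

Redirect the output to a new file, or add -m once the result looks right, to keep the changes.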

Rob Bednark
Paul Sweatte
  • 9
    This does not answer the question. It is still adding a meta generator tag. Is there a way to turn off *all* changes except indentation? – Jason Clark Aug 08 '12 at 19:33
  • 1
    Use the [tidy man page](http://tidy.sourceforge.net/docs/tidy_man.html) to reference and test the flags. Try turning off defaults by adding `markup: no` or `input-xml: yes` and `force-output: yes` to the config file. – Paul Sweatte Aug 09 '12 at 05:59
  • 8
    If you are a fan of oneliners without intermediate files, you can write the same as `tidy -xml --indent auto --indent-spaces 2 --quiet yes index.html`. – tlwhitec May 30 '14 at 12:34
  • 3
    `tidy-mark: no` should turn off the meta generator tag, – Dave Burton Jan 07 '16 at 16:29
  • @DaveBurton Good catch. Thanks! – Paul Sweatte Jan 07 '16 at 21:42
  • 1
    Leave out `input-xml: yes` (`-xml`) since it will flag `` as an error. – thdoan Jan 19 '16 at 08:33
  • If anybody's having issues passing options from command-line, make sure you're not separating the value from the option with an equals-sign (`--option=value`). It requires a space to be used (that was where I was going wrong - thanks @tlwhitec!) –  Mar 21 '16 at 17:18
  • 1
    This is still adding missing tags - such as `` to my (php) document. I can't find anything in the man page for strict indent only without adding tags. I am hoping to format included documents - title tags are included from another file. – moo Apr 12 '20 at 10:40
  • Looks like the magic option needed to omit the `DOCTYPE`, `html`, `head`, and `title` tags is `--show-body-only yes`. It's a shame this is hard to find in the docs. The words "title" and "snippet" do not appear, however the docs do mention "fragment" when describing the `--show-body-only` option. Hope this helps someone. – Rotsiser Mho May 10 '23 at 13:43
25

I did not find a way to "only re-indent, without any changes". The config file below "repairs" as little as possible and (mostly) only re-indents the HTML. Tidy still corrects some error conditions, such as duplicated (repeated) attributes.

#based on http://tidy.sourceforge.net/docs/quickref.html
#HTML, XHTML, XML Options Reference
anchor-as-name: no  #?
doctype: omit
drop-empty-paras: no
fix-backslash: no
fix-bad-comments: no
fix-uri: no
hide-endtags: yes   #?
#input-xml: yes     #?
join-styles: no
literal-attributes: yes
lower-literals: no
merge-divs: no
merge-spans: no
output-html: yes
preserve-entities: yes
quote-ampersand: no
quote-nbsp: no
show-body-only: auto

#Diagnostics Options Reference
show-errors: 0
show-warnings: 0

#Pretty Print Options Reference
break-before-br: yes
indent: yes
indent-attributes: no   #default
indent-spaces: 4
tab-size: 4
wrap: 132
wrap-asp: no
wrap-jste: no
wrap-php: no
wrap-sections: no

#Character Encoding Options Reference
char-encoding: utf8

#Miscellaneous Options Reference
force-output: yes
quiet: yes
tidy-mark: no

For example, the following HTML fragment

<div>
<div>
<p>
not closed para
<h1>
h1 head
</h1>
<ul>
<li>not closed li
<li>closed li</li>
</ul>
some text
</div>
</div>

will be changed to

<div>
    <div>
        <p>
            not closed para
        <h1>
            h1 head
        </h1>
        <ul>
            <li>not closed li
            <li>closed li
            </ul>some text
    </div>
</div>

Note that hide-endtags: yes drops the closing </li> from the second bullet in the input. Setting hide-endtags: no gives the following instead:

<div>
    <div>
        <p>
            not closed para
        </p>
        <h1>
            h1 head
        </h1>
        <ul>
            <li>not closed li
            </li>
            <li>closed li
            </li>
        </ul>some text
    </div>
</div>

So tidy adds the missing closing </p> and the closing </li> for the first bullet.

I did not find a way to preserve the input exactly and only re-indent the file.
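
One way to see whether tidy changed anything besides indentation is to compare the input and output with all whitespace stripped. The following is a rough sketch, assuming a POSIX shell; same_ignoring_whitespace, tidy.conf, input.html, and output.html are hypothetical names chosen for illustration. The check is crude, since it also ignores whitespace inside text.

```shell
#!/bin/sh
# same_ignoring_whitespace A B
# Exit status 0 when files A and B are identical after every whitespace
# character is removed, 1 otherwise. This is a crude check: whitespace
# inside text is ignored too.
same_ignoring_whitespace() {
    a=$(tr -d '[:space:]' < "$1")
    b=$(tr -d '[:space:]' < "$2")
    [ "$a" = "$b" ]
}

# Hypothetical usage: tidy.conf and input.html are placeholder names.
if command -v tidy >/dev/null 2>&1 && [ -f tidy.conf ] && [ -f input.html ]; then
    tidy -config tidy.conf input.html > output.html
    if same_ignoring_whitespace input.html output.html; then
        echo "only whitespace changed"
    else
        echo "tidy changed more than whitespace"
    fi
fi
```

With the example fragment above, the check would report a difference for either hide-endtags setting, because end tags are removed or added.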

kobame
  • Thanks for this. Used this in my PHP script and the html output looks great. Took me a while to copy the attributes so here's a pastebin for anyone looking to config tidy like this: https://pastebin.com/JP8ucTzc – bbruman May 19 '17 at 19:20
  • hide-endtags: NO – Jaxx0rr Jul 17 '18 at 16:39
  • Thx for this explanation. But how to turn "some text" into "\nsometext"? And how to keep the empty lines in the source code? – Junwei WANG Apr 10 '21 at 05:56
17

You need the following option:

tidy --show-body-only yes -i --indent-spaces 4 -w 80 -m file.html

http://tidy.sourceforge.net/docs/quickref.html#show-body-only

-i - turns indentation on; it takes no argument, so the width is set separately with --indent-spaces 4 (EDIT: tidy never uses tabs)
or
--indent-with-tabs yes - instead (--tab-size may affect wrapping)

-w 80 - wrap at column 80 (default on my system: 68, very narrow)

-m - modify the file in place

(you may want to leave out the last option and examine the output first)

Showing only the body naturally leaves out the tidy-mark (generator meta).

Another useful option is --quiet yes, which suppresses the W3C advertisements and other unnecessary output (errors are still reported).
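
To examine the output before committing to -m, one option is to write tidy's result to a temporary file and diff it against the original. This is only a sketch, assuming a POSIX shell with mktemp and diff available; preview_tidy is a hypothetical helper name, and the indent width is set with --indent-spaces, the option used in the first answer.

```shell
#!/bin/sh
# preview_tidy FILE
# Print a diff between FILE and tidy's version of it, leaving FILE untouched.
# Returns diff's exit status: 0 if nothing would change, 1 if it would.
preview_tidy() {
    tmp=$(mktemp) || return 1
    # tidy exits non-zero on warnings as well as errors, so its status
    # is deliberately not checked here.
    tidy --show-body-only yes -i --indent-spaces 4 -w 80 --quiet yes "$1" > "$tmp" 2>/dev/null
    diff "$1" "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Running preview_tidy file.html prints the lines that would change; if the diff looks right, the same options can be rerun with -m.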

Tomasz Gandor
7

To answer the poster's original question, using Tidy to just indent HTML code, here's what I use:

tidy --indent auto --quiet yes --show-body-only auto --show-errors 0 --wrap 0 input.html

input.html

<form action="?" method="get" accept-charset="utf-8">

<ul>
<li>
<label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q" />
</li>
<li><input class="submit" type="submit" value="Search" /></li>
</ul>


</form>

Output:

<form action="?" method="get" accept-charset="utf-8">
  <ul>
    <li><label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q"></li>
    <li><input class="submit" type="submit" value="Search"></li>
  </ul>
</form>

No extra HTML code added. Errors are suppressed. To find out what each option does, it's best to refer to the official reference.

thdoan
  • In vim: `%!tidy --show-errors 0 --show-body-only auto -qi -w 0` – unagi Jan 03 '17 at 03:08
  • Unfortunately these options are not enough. tidy inserts a form tag around this code: `` But this one is fine: `` `tidy --version` outputs `HTML Tidy for Linux released on 25 March 2009` – unagi Jan 03 '17 at 03:27
2

I am very late to the party :)

In your tidy config file, set

tidy-mark: no

By default this is set to yes.

Once that is done, tidy will not add the meta generator tag to your HTML.

  • 3
    My version of `tidy` (and probably any other) will accept configuration options also as command line options (which is sometimes more desirable than dragging around a config file), like: `tidy --tidy-mark no -utf8 -w 80 -i file.html`. – Tomasz Gandor Feb 19 '14 at 09:33
  • This does not prevent the generation of `DOCTYPE`, `html`, and `head` tags. – Rotsiser Mho May 10 '23 at 13:45
2

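If you'd like to simply format whatever HTML you receive, ignore errors, and indent the code nicely, this is a good one-liner using tidy. With no file name given, it reads the HTML from standard input. Note that -i takes no argument, so the indent width is set with --indent-spaces:

tidy --show-body-only yes -i --indent-spaces 4 -w 80 -m -quiet --force-output y -wrap 0 2>/dev/null

You can use it with curl too:

curl -s someUrl | tidy --show-body-only yes -i --indent-spaces 4 -w 80 -m -quiet --force-output y -wrap 0 2>/dev/null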
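
Since the one-liner takes its input from standard input when no file is named, it can be wrapped in a small shell function that accepts either form. This is a sketch under assumptions, not a definitive implementation: indent_html is a hypothetical name, a POSIX shell is assumed, and the wrap and in-place flags are reduced to --wrap 0 because the output goes to standard output.

```shell
#!/bin/sh
# indent_html [FILE]
# Indent FILE, or standard input when no FILE is given, and print the
# result on standard output. Diagnostics are discarded.
indent_html() {
    # "$@" expands to the file name if one was passed, or to nothing,
    # in which case tidy reads standard input.
    tidy --show-body-only yes -i --indent-spaces 4 --wrap 0 \
         --quiet yes --force-output yes "$@" 2>/dev/null
}

# Examples (someUrl is the placeholder from the answer above):
#   curl -s someUrl | indent_html
#   indent_html index.html
#   indent_html < index.html
```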
jasonleonhard
  • Where does the input file go in the first command? – étale-cohomology Dec 28 '21 at 11:03
  • The second command starts with `curl -s someUrl |` and that `|` is a redirect to the rest of the command. So, you could curl some website and redirect it, or you could say `cat index.html | tidy --show-body-only yes -i 4 -w 80 -m -quiet --force-output y -wrap 0 2>/dev/null` for instance – jasonleonhard Dec 30 '21 at 20:02
0

None of the HTML Tidy based solutions worked for me, since all of them modified the content to some extent, so I created a CLI tool and Go package, https://github.com/a-h/htmlformat, based on https://github.com/ericchiang/pup

It uses the Go net/html package to parse the HTML, and a custom writer to write out the content with indentation.

a-h