
We're generating HTML files with Apache Velocity, a generic template engine. The generated HTML is rather ugly and not correctly indented.

In my case the HTML is stored in a String, and I want to transform it so that it looks pretty-printed.

I've already given JTidy a try, but it changes the HTML source when I pipe the raw HTML through it. Sometimes it adds or removes HTML tags.

My question:

Is there a Java library or something else out there which (only!) pretty-prints my HTML code without adding or removing tags from my HTML document? It should only do the indentation, so that it looks pretty-printed! Nothing more, nothing less. Any ideas? :-)

Code suggestions, hints or tips are also welcome.

Best regards

Martin
  • @see http://stackoverflow.com/questions/996646/stand-alone-java-code-formatter-beautifier-pretty-printer – chepseskaf Jul 29 '11 at 09:48
  • *hm* That question is about standalone source code formatters for Java source code. I'm looking for a Java library which does HTML pretty printing, so that I have pretty-printed HTML in my string variable ;-) – Martin Jul 29 '11 at 10:04
  • It seems there is no other alternative to JTidy... Have you played with its configuration? – chepseskaf Jul 29 '11 at 13:10
  • Possible duplicate of [Pretty HTML snippet output](http://stackoverflow.com/questions/29196699/pretty-html-snippet-output) – bluenote10 Apr 14 '17 at 16:17

3 Answers

Maybe a little too late, but I found a solution to this with Jsoup.

You can get the "pretty" version of the HTML by using only the parser and, if needed, avoid the generation of extra HTML elements by using a "custom parser".

I got the answer from this Jsoup question.

Here it is:

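```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public static String formatHTML(String html) {
    // The XML parser keeps the markup as given instead of wrapping it in <html>, <head> and <body>.
    Document doc = Jsoup.parse(html, "", Parser.xmlParser());
    return doc.toString();
}
```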

I hope this helps.

Regards

cesaregb

Find any SAX parser example in Java. Do indent++ for opening tags and indent-- for closing tags, and write the content with the current indentation.
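A minimal sketch of that idea, using only the SAX parser that ships with the JDK. The class name `SaxIndenter` and its helpers are made up for illustration. It assumes the input is well-formed XHTML (a SAX parser rejects ordinary tag-soup HTML), and it silently drops comments and the DOCTYPE. Note that each tag is rebuilt from the element name and attributes rather than copied, so the output is not guaranteed to be character-identical to the input; for example `<br/>` would come back as `<br>` followed by `</br>`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper, not part of any library: re-indents well-formed XHTML via SAX. */
final class SaxIndenter {

    static String indent(String xhtml) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // current nesting level

            private void line(String text) {
                for (int i = 0; i < depth; i++) out.append("  ");
                out.append(text).append('\n');
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // SAX only hands over the parsed parts, so the tag has to be reconstructed.
                StringBuilder tag = new StringBuilder("<").append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    tag.append(' ').append(atts.getQName(i))
                       .append("=\"").append(escape(atts.getValue(i))).append('"');
                }
                line(tag.append('>').toString());
                depth++; // indent++ for an opening tag
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--; // indent-- for a closing tag
                line("</" + qName + ">");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text node in several chunks; each chunk gets its own line here.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) line(escape(text));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return out.toString();
    }

    // Re-escapes the characters that the parser has already decoded.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Calling `SaxIndenter.indent("<div class=\"a\"><p>Hi</p></div>")` returns the same markup with one tag or text run per line, indented two spaces per nesting level.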

Michał Šrajer
  • the trouble with a SAX parser is that it's very hard to make an otherwise identical copy of the source file. You need to reconstruct the elements and attributes - the characters are not simply echoed to the output stream. – Richard H Jan 03 '14 at 11:00

Why don't you write a simple Java parser to pretty print HTML yourself? Here is a sketch:

  1. Track opening and closing tags, and
  2. keep a counter to figure out the current indentation level.
  3. Perhaps use a stack to push and pop the indentation level.
  4. Just iterate through the HTML string and push the current indentation level onto the stack when you see a tag.
  5. If you see a nested tag, increment the indentation level and keep going.
  6. When you see an end tag, pop the stack to go back to the previous indentation level.
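The steps above could be sketched roughly as follows. This is an illustration under simplifying assumptions, not a complete HTML tokenizer: the class name `SimpleHtmlIndenter` is made up, a `>` inside an attribute value or comment will confuse it, and it does not special-case `<pre>`, `<script>` or whitespace-sensitive inline text. Every tag is copied verbatim, so no tags are added or removed; only the whitespace between tags and text changes. The stack holds the names of the open tags, and its size serves as the indentation counter.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: re-indents HTML while copying every tag verbatim. */
final class SimpleHtmlIndenter {

    // Elements that never have a closing tag, so they must not increase the depth.
    private static final Set<String> VOID_TAGS = new HashSet<>(Arrays.asList(
            "area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "param", "source", "track", "wbr"));

    static String indent(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // names of the currently open tags
        int pos = 0;
        while (pos < html.length()) {
            int lt = html.indexOf('<', pos);
            if (lt < 0) lt = html.length();
            // Text between two tags: trimmed and written at the current depth.
            emit(out, open.size(), html.substring(pos, lt).trim());
            if (lt == html.length()) break;
            int gt = html.indexOf('>', lt);
            if (gt < 0) gt = html.length() - 1; // unterminated tag: copy the rest as is
            String tag = html.substring(lt, gt + 1);
            pos = gt + 1;
            String name = tagName(tag);
            if (tag.startsWith("</")) {
                // Closing tag: pop back to the matching opener, if there is one.
                if (open.contains(name)) {
                    while (!open.pop().equals(name)) { /* discard unclosed children */ }
                }
                emit(out, open.size(), tag);
            } else {
                emit(out, open.size(), tag);
                // Comments, doctypes, self-closing and void tags do not open a new level.
                if (!name.isEmpty() && !tag.endsWith("/>") && !VOID_TAGS.contains(name)) {
                    open.push(name);
                }
            }
        }
        return out.toString();
    }

    private static void emit(StringBuilder out, int depth, String text) {
        if (text.isEmpty()) return;
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append(text).append('\n');
    }

    // Lower-cased element name; empty for comments, doctypes and processing instructions.
    private static String tagName(String tag) {
        int start = tag.startsWith("</") ? 2 : 1;
        int i = start;
        while (i < tag.length()
                && (Character.isLetterOrDigit(tag.charAt(i)) || tag.charAt(i) == '-')) {
            i++;
        }
        return tag.substring(start, i).toLowerCase(Locale.ROOT);
    }
}
```

For example, `SimpleHtmlIndenter.indent("<ul><li>One</li><li>Two<br></li></ul>")` puts each tag and text run on its own line, indented two spaces per open element. Stripping all whitespace from the result gives back the original string.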

I wanted to give you a rough idea here; you can use this as a starting point. I have written many Perl-based pretty printers. You could use Perl to script a parser fairly quickly.