Questions tagged [jtidy]

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, which lets you use JTidy as a DOM parser for real-world HTML.

JTidy was written by Andy Quick, who later stepped down from the maintainer position. Now JTidy is maintained by a group of volunteers.

Official Website: http://jtidy.sourceforge.net/

97 questions
13
votes
2 answers

how to fetch base url from the given url using java

I am trying to fetch the base URL using Java. I have used the JTidy parser in my code to fetch the title. I am getting the title properly using JTidy, but I am not getting the base URL from the given URL. I have some URL as input: String s1 =…
DJ31
  • 1,219
  • 3
  • 14
  • 19
10
votes
2 answers

jTidy pretty print custom HTML tag

I'm trying to use JTidy to pretty print well-formed HTML generated by the user:
nanndoj
  • 6,580
  • 7
  • 30
  • 42
7
votes
4 answers

How do I make JTidy make HTML documents well-formed?

I'm using JTidy v. r938. I'm using this code to attempt to clean up a page … final Tidy tidy = new Tidy(); tidy.setQuiet(false); tidy.setShowWarnings(true); tidy.setShowErrors(0); tidy.setMakeClean(true); Document document =…
Dave
  • 15,639
  • 133
  • 442
  • 830
6
votes
1 answer

jTidy and TagSoup documentation

I'm looking for documentation (official documentation, if possible) for the TagSoup and jTidy libraries. I want to use these libraries to manipulate HTML "tagsoup" files that include XML tags with different namespaces mixed in with the HTML (html, xhtml…
angelcervera
  • 3,699
  • 1
  • 40
  • 68
5
votes
2 answers

jTidy returns nothing after tidying HTML

I have come across a very annoying problem when using jTidy (on Android). I have found jTidy works on every HTML Document I have tested it against, except the following:
Henry Thompson
  • 2,441
  • 3
  • 23
  • 31
5
votes
3 answers

Proper usage of JTidy to purify HTML

I am trying to use JTidy (jtidy-r938.jar) to sanitize an input HTML string, but I seem to have problems getting the default settings right. Often strings such as "hello world" end up as "helloworld" after tidying. I wanted to show what I'm doing…
ragebiswas
  • 3,818
  • 9
  • 38
  • 39
4
votes
2 answers

How to add new tags to JTidy?

I am trying to use jTidy to extract data from (real-world) HTML, but jTidy doesn't parse custom tags. some text more text I can't get texts between…
MuhammetK
  • 115
  • 1
  • 7
4
votes
3 answers

Pretty print ("indentation-only") HTML documents in Java (no JTidy)

We're generating HTML files with Apache's Velocity template engine. The generated HTML is kind of ugly and not correctly indented. In my case I've got the HTML stored in a String, which I want to manipulate so that it looks…
Martin
  • 41
  • 1
  • 2
4
votes
5 answers

How to remove the warnings in JTidy in Java

I am using the JTidy parser in Java. URL url = new URL("www.yahoo.com"); HttpURLConnection conn = (HttpURLConnection) url.openConnection(); InputStream in = conn.getInputStream(); doc = new Tidy().parseDOM(in, null); when I run this, "doc = new…
DJ31
  • 1,219
  • 3
  • 14
  • 19
4
votes
2 answers

JTidy Node.findBody() — How to use?

I'm trying to do XHTML DOM parsing with JTidy, and it seems to be a rather counterintuitive task. In particular, there's a method to parse HTML: Node Tidy.parse(Reader, Writer) And to get the body of that Node, I assume, I should use Node…
ansgri
  • 2,126
  • 5
  • 25
  • 37
4
votes
1 answer

Installing a feature in servicemix

I am running Apache ServiceMix 4.5.2. I want to install a feature, i.e. a jar file. The feature I want is jtidy. The POM dependency is: jtidy jtidy
Luixv
  • 8,590
  • 21
  • 84
  • 121
4
votes
2 answers

How to best use JTidy with a Spring servlet container?

I have a Java servlet container using the Spring Framework. Pages are generated from JSPs using Spring to wire everything up. The resulting HTML sent to the user isn't as, well, tidy as I'd like. I'd like to send the HTML to Tidy right before…
Dean J
  • 39,360
  • 16
  • 67
  • 93
3
votes
2 answers

Parsing DOM returned from JTidy to find a particular HTML element

I have been playing with this code for a while, and I am not certain what I am doing wrong. I get a url, clean it up with JTidy, as it isn't well-formed, then I need to find a particular hidden input field (input type="hidden" name="mytarget"…
James Black
  • 41,583
  • 10
  • 86
  • 166
3
votes
1 answer

Parsing HTML on Android, major performance issues

I need to parse about 100 kB of HTML data and this simply causes huge performance issues on Android. I've tried both the built-in XML parser and JTidy. The built-in XML parser gives me a parsing time of about half a second, which I can easily live…
Overv
  • 8,433
  • 2
  • 40
  • 70
3
votes
1 answer

How to clean up an XML file for Java parsing by putting quotes around attributes

I have a series of XML files that look something like this: Some text here More text ... I'm trying to parse the XML using the standard DOM way, but because the attribute values for P are…
neptune
  • 1,380
  • 2
  • 17
  • 25