3

Could anybody please recommend libraries that do the opposite of what these libraries do?

HtmlCleaner, TagSoup, HtmlParser, HtmlUnit, jSoup, jTidy, nekoHtml, WebHarvest or Jericho.

I need to build HTML pages, i.e. build up the DOM model from String content.

EDIT: I need it for testing purposes. I have various types of input strings that might appear in various places in the HTML page, so I need to build the page up dynamically. I then process the HTML page based on various criteria that must or must not be fulfilled.

To show why I asked this question, consider HtmlCleaner for this job:

List<String> paragraphs = getParagraphs(entity.getFile());
List<TagNode> pNodes = new ArrayList<TagNode>();

TagNode html = cleaner.clean("<html/>");
for(String paragraph : paragraphs) {                
    TagNode p = new TagNode("p");
    pNodes.add(p);
    // CANNOT setText() ?
}
html.addChildren(pNodes);

The problem is that TagNode has a getText() method but no setText() method.

Please add more comments about how vague this question is ... The best thing you can do

lisak
  • 1
    The opposite of those libraries? Isn't that kind of a vague question? – Stephane Gosselin May 31 '11 at 20:03
  • No, it isn't. The primary goal of these libraries is parsing HTML pages and creating a DOM representation. I need exactly the opposite: I have to build up the DOM model and create a file from it... – lisak May 31 '11 at 20:25

4 Answers

9

Jsoup, Jsoup, Jsoup! I've used all of those, and it's my favorite by a long shot. You can use it to build documents, plus it brings a lot of the magic of jQuery-style traversal alongside the best HTML document parsing I've seen to date in a Java library. I'm so happy with it that I don't mind shamelessly promoting it. ;)
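
As a rough illustration of building a document with Jsoup rather than parsing one, here is a minimal sketch. It assumes Jsoup is on the classpath; the class name `JsoupBuildSketch` and the helper `buildPage` are made up for this example and are not part of the library.

```java
import java.util.Arrays;
import java.util.List;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupBuildSketch {

    /** Hypothetical helper: builds a page with one <p> per input string and returns the HTML text. */
    public static String buildPage(String title, List<String> paragraphs) {
        // createShell gives an empty document that already has html, head and body elements.
        Document doc = Document.createShell("");
        doc.title(title);

        Element body = doc.body();
        for (String paragraph : paragraphs) {
            // text(...) sets the element's text; special characters are escaped on output.
            body.appendElement("p").text(paragraph);
        }
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        System.out.println(buildPage("Test page", Arrays.asList("First paragraph", "Second paragraph")));
    }
}
```

Starting from `Document.createShell("")` avoids parsing an empty string just to obtain a document, which seems to be the step objected to in the comments below.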

stevevls
  • 2
    Well, it is hard to figure out how to do that with these libraries, because they are meant for the opposite thing. For instance, I can't find any class in HtmlCleaner or TagSoup that creates an HTML page that I can build up – lisak May 31 '11 at 20:01
  • Can you explain a little more about what you're trying to do, then? Jsoup can build up documents, too. I'm currently using it to parse and do extensive modification of HTML docs. – stevevls May 31 '11 at 20:03
  • He said he wanted the opposite of JSoup, still wondering what that would be. – Stephane Gosselin May 31 '11 at 20:03
  • JSoup is for parsing html documents - getting their dom representation ... I want to create html documents, build them up ... – lisak May 31 '11 at 20:05
  • Ha. It sounds like he's building up docs from scratch, which most of those libs will do... – stevevls May 31 '11 at 20:05
  • 2
    Start with this: `Element elem = Jsoup.parse("");` And then you have a very rich API for building up your document. – stevevls May 31 '11 at 20:06
  • That's exactly what I'm talking about. In none of them can you create an empty document and build it up... You have to do things like Element elem = Jsoup.parse(""); – lisak May 31 '11 at 20:08
  • Um...if you're creating an HTML doc, won't you have to start by adding an HTML element? I think you're confused, dude. – stevevls May 31 '11 at 20:11
  • 1
    Really, is Element elem = Jsoup.parse(""); that much more awful than DocThing thing = DocThingMaker.newEmptyDocumentThing(); ? – Will Hartung May 31 '11 at 20:39
  • It isn't, but you might run into a situation where it doesn't have a setText() method after all, because the library is not meant for that – lisak May 31 '11 at 20:45
  • I picked the answer because it was a helpful, nice solution and because I like HtmlCleaner more than Jsoup (more perspective API). I just wasn't aware of the ContentNode it uses to represent node content. Thanks for your answer anyway, voting up – lisak May 31 '11 at 23:24
2

There are a lot of template libraries for Java, from JSP to FreeMarker, and from framework-specific implementations (Spring?) to generic libraries like StringTemplate.

The most difficult task is... to make a choice.

In general, these libraries let you write a skeleton of a web page with "holes" to fill in with variables. It is the simplest approach, and it often works well with tools.
If you really want to build from a DOM, you can just use an XML library and generate XHTML.
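
To make the DOM route concrete, here is a minimal sketch that uses only the XML APIs shipped with the JDK (`javax.xml.parsers`, `javax.xml.transform`, `org.w3c.dom`). The class name `XhtmlDomSketch` and the helper `buildPage` are assumptions for illustration, and the page structure (just `html`, `body` and `p`) is deliberately minimal.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XhtmlDomSketch {

    private static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Hypothetical helper: builds an XHTML page with one <p> per input string. */
    public static String buildPage(List<String> paragraphs) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // Create every element in the XHTML namespace so the output declares it once.
            Element html = doc.createElementNS(XHTML_NS, "html");
            doc.appendChild(html);
            Element body = doc.createElementNS(XHTML_NS, "body");
            html.appendChild(body);

            for (String paragraph : paragraphs) {
                Element p = doc.createElementNS(XHTML_NS, "p");
                // Text content is escaped by the serializer, not here.
                p.setTextContent(paragraph);
                body.appendChild(p);
            }

            // An identity transform serializes the DOM tree to text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the XHTML page", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildPage(Arrays.asList("First paragraph", "a < b & c")));
    }
}
```

Because the tree is serialized as XML, the result is well-formed by construction, which can be convenient when the generated pages are only used as test input.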

PhiLho
2

If you are interested in HtmlCleaner in particular, it is actually a very convenient choice for building HTML documents.

But you need to know that to set the content of a TagNode, you append a child ContentNode :-)

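import java.util.ArrayList;
import java.util.List;
import org.htmlcleaner.ContentNode;
import org.htmlcleaner.TagNode;

List<String> paragraphs = getParagraphs(entity.getFile());
List<TagNode> pNodes = new ArrayList<TagNode>();

// Create the root element directly instead of parsing "<html/>".
TagNode html = new TagNode("html");
for (String paragraph : paragraphs) {
    TagNode p = new TagNode("p");
    // TagNode has no setText(); the text is added as a ContentNode child.
    p.addChild(new ContentNode(paragraph));
    pNodes.add(p);
}
html.addChildren(pNodes);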
lisak
0

jwebutils -- A library for creating HTML 5 markup using Java. It also contains support for creating JSON and CSS 3 markup.

Jakarta Element Construction Set (ECS) -- A Java API for generating elements for various markup languages; it directly supports HTML 4.0 and XML. Now retired, but some folks really like it.

Basil Bourque