
I'm building a mini web browser in Java that parses XHTML and JavaScript using JavaCC, and I need to build the DOM. Is there a tool that can give me the DOM and let me manipulate its nodes, so I don't have to build it by hand as my browser parses the document?

user
  • For XML (XHTML), start [here](http://download.oracle.com/javase/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html) – khachik Dec 11 '10 at 15:57

2 Answers


Try using JDOM or Dom4J, or read this question about XML parsers for Java.

If you want to handle HTML as it is found in the wild, try using JTidy, which will attempt to repair badly formed HTML for you before turning it into a DOM.
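To see why a cleanup step like JTidy's matters, here is a small sketch that uses only the JDK's built-in JAXP parser (no JTidy) to check whether markup is well-formed XML. `WellFormedCheck` and `isWellFormedXml` are hypothetical names chosen for illustration. Typical "in the wild" HTML, such as an unclosed `<br>`, is rejected outright by a strict XML parser, which is the kind of input JTidy is meant to repair first.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    public static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors, and rethrows fatal ones,
            // so malformed input still fails but nothing is echoed to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed: unclosed tag, bad nesting, etc.
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXml("<p>line<br/>break</p>")); // XHTML style: true
        System.out.println(isWellFormedXml("<p>line<br>break</p>"));  // HTML style: false
    }
}
```

A tidying step would sit in front of the parser: repair the markup first, then hand the result to an XML DOM builder.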

Joel

I'm not sure why you think you need JavaCC to parse an XHTML document. If it's truly valid XHTML, then it's also well-formed XML, which means any XML DOM parser can deliver a DOM that you can manipulate. Why not use the DOM parser that's built into Java (JAXP), Apache Xerces, JDOM, or DOM4J? Writing your own parser with JavaCC can be a valuable learning exercise, but I doubt it will be better than what's already available to you.
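As a minimal sketch of the built-in route, the class below parses an XHTML string with the JAXP classes that ship with the JDK (`DocumentBuilderFactory`, `DocumentBuilder`) and walks the resulting `org.w3c.dom` tree. `XhtmlDomLoader` and `parse` are hypothetical names, and the input is assumed to be well-formed XHTML. The `load-external-dtd` feature is specific to the Xerces-based parser bundled with the JDK; it is set here so the DOCTYPE line does not trigger a download of the W3C DTD. As a trade-off, entities defined only in that DTD (such as `&nbsp;`) may not be resolved.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XhtmlDomLoader {

    /** Parses an XHTML string into a W3C DOM Document using the JDK's built-in parser. */
    public static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XHTML elements live in the XHTML namespace
            // Xerces-specific feature (honoured by the JDK's bundled parser):
            // do not fetch the external DTD named in the DOCTYPE.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Demo</title></head>"
            + "<body><p id=\"greeting\">Hello</p><p>World</p></body></html>";

        Document doc = parse(page);
        // Tag-name lookup matches the element's qualified name, which is "p" here.
        NodeList paragraphs = doc.getElementsByTagName("p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            System.out.println(p.getTextContent());
        }
    }
}
```

Running `main` prints `Hello` and then `World`. Every object in the returned tree is an `org.w3c.dom.Node`, so one option is for the browser to expose these nodes to its script interpreter rather than maintaining a tree of its own.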

duffymo
  • It's an assignment in one of my classes (Compilers). – user Dec 11 '10 at 16:20
  • What's the assignment, to write an XML DOM parser using JavaCC? If yes, what's your question? Also, add a homework tag so people will know. – duffymo Dec 11 '10 at 16:52
  • The assignment is to write a mini web browser that parses and displays XHTML documents that include JavaScript. I need to build the DOM so that the JavaScript code can manipulate it. I wasn't sure what the best way to tackle this was, but I'll try using a library to get the DOM. – user Dec 12 '10 at 21:36
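For the assignment described in that last comment, where script code must manipulate the tree, one approach is to map script operations onto standard W3C DOM calls on a library-built `Document`. The sketch below is a hypothetical illustration, not part of any library: `DomBridgeSketch`, `findById`, and `appendListItem` are invented names, and the page is assumed to be well-formed XHTML in the XHTML namespace. Only standard `org.w3c.dom` methods are used (`createElementNS`, `setTextContent`, `appendChild`, `getElementsByTagName`).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomBridgeSketch {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Parses well-formed XHTML into a namespace-aware DOM. */
    public static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    /** Hypothetical helper: depth-first search for the first element whose id attribute matches. */
    public static Element findById(Node node, String id) {
        if (node instanceof Element && id.equals(((Element) node).getAttribute("id"))) {
            return (Element) node;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Element found = findById(children.item(i), id);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    /** Hypothetical operation a script statement could map to: append a new li element to a list. */
    public static Element appendListItem(Document doc, String listId, String text) {
        Element list = findById(doc, listId);
        if (list == null) {
            throw new IllegalArgumentException("No element with id " + listId);
        }
        Element item = doc.createElementNS(XHTML_NS, "li"); // keep new nodes in the XHTML namespace
        item.setTextContent(text);
        list.appendChild(item);
        return item;
    }

    public static void main(String[] args) {
        Document doc = parse("<html xmlns=\"" + XHTML_NS + "\"><body>"
            + "<ul id=\"todo\"><li>parse</li></ul></body></html>");
        appendListItem(doc, "todo", "render");

        NodeList items = findById(doc, "todo").getElementsByTagName("li");
        System.out.println(items.getLength());              // prints 2
        System.out.println(items.item(1).getTextContent()); // prints render
    }
}
```

`findById` walks the tree by hand because `Document.getElementById` only finds attributes known to be of type ID (declared in a DTD or schema, or marked with `Element.setIdAttribute`), and a plain `id` attribute parsed without the DTD is not treated as one.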