0

I have an XML file with several <text> nodes. Each <text> node has attributes named "top" and "left" and a child node named <textValue>. The file represents the coordinate positions of text in a PDF file that was converted to XML with a PDF2HTML converter.

I want to query the XML file using conditions such as:
1. Give me all the consecutive nodes in the XML file that have the same "top" attribute. Here, I am trying to get all nodes that share the same "top" value but may have different "left" values.

Which XML parser supports these kinds of queries? I am familiar with a basic DOM parser, which only lets me iterate through the elements and read their attribute values. Is there an XML parser that allows conditional queries to be written on top of it?

Thanks

Dave M
London guy

2 Answers

2

You'll want to investigate XPath, which can do exactly this. Java provides robust, built-in XPath support that can operate on top of a DOM tree. See "How to read XML using XPath in Java" for one example of how to get started with this.
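As a minimal sketch of what such a query could look like with the JDK's `javax.xml.xpath` package: the expression below selects every <text> element whose "top" matches that of the <text> sibling immediately before or after it. The `<page>` root element, the sample coordinates, and the class and method names are assumptions made for illustration; the real converter output may be laid out differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class SameTopXPath {
    // A <text> element matches when its "top" equals the "top" of the
    // nearest preceding or nearest following <text> sibling.
    // Assumption: the <text> elements are siblings under one parent.
    static final String QUERY =
        "//text[@top = preceding-sibling::text[1]/@top"
        + " or @top = following-sibling::text[1]/@top]";

    // Returns the "left" attribute of every matching <text> element.
    static List<String> leftValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits =
                (NodeList) xpath.evaluate(QUERY, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(((Element) hits.item(i)).getAttribute("left"));
            }
            return result;
        } catch (Exception e) {
            // Parsing and XPath errors are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample input; only the element and attribute names
        // come from the question.
        String xml = "<page>"
            + "<text top=\"10\" left=\"5\"><textValue>A</textValue></text>"
            + "<text top=\"10\" left=\"50\"><textValue>B</textValue></text>"
            + "<text top=\"20\" left=\"7\"><textValue>C</textValue></text>"
            + "<text top=\"30\" left=\"9\"><textValue>D</textValue></text>"
            + "<text top=\"30\" left=\"40\"><textValue>E</textValue></text>"
            + "</page>";
        // The lone top="20" element is skipped.
        System.out.println(leftValues(xml));
    }
}
```

Note that this returns one flat list of matching nodes; splitting them into separate runs per "top" value still takes a pass over the result.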

Community
ziesemer
1

You are not looking for a parser; you need a query processor. Any XQuery-compatible processor can do that. Just use a pair of nested loops in your XQuery.
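The JDK has no built-in XQuery engine, so the following is not XQuery. It is a rough illustration of the same nested-loop idea written in plain Java over the DOM: the outer loop picks the start of a run, and the inner loop extends the run while "top" stays the same. The class name, the method name, and the `<page>` root element are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class ConsecutiveTopRuns {
    // Groups consecutive <text> elements (in document order) that share
    // the same "top" value. Runs of length 1 are dropped.
    static List<List<String>> runs(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Parsing errors are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
        NodeList nodes = doc.getElementsByTagName("text");
        List<List<String>> out = new ArrayList<>();
        int i = 0;
        while (i < nodes.getLength()) {              // outer loop: start of a run
            String top = ((Element) nodes.item(i)).getAttribute("top");
            List<String> run = new ArrayList<>();
            int j = i;
            while (j < nodes.getLength()             // inner loop: extend the run
                    && top.equals(((Element) nodes.item(j)).getAttribute("top"))) {
                // Text content of <text> includes its <textValue> child.
                run.add(nodes.item(j).getTextContent().trim());
                j++;
            }
            if (run.size() > 1) {
                out.add(run);
            }
            i = j;                                   // continue after the run
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample input for illustration.
        String xml = "<page>"
            + "<text top=\"10\" left=\"5\"><textValue>A</textValue></text>"
            + "<text top=\"10\" left=\"50\"><textValue>B</textValue></text>"
            + "<text top=\"20\" left=\"7\"><textValue>C</textValue></text>"
            + "<text top=\"30\" left=\"9\"><textValue>D</textValue></text>"
            + "<text top=\"30\" left=\"40\"><textValue>E</textValue></text>"
            + "</page>";
        System.out.println(runs(xml));
    }
}
```

Unlike a single XPath node set, this keeps each run as its own list, which may be closer to what "consecutive nodes with the same top" is meant to produce.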

J-16 SDiZ