
Let's say I have two XML files. Both contain a specific element (let's say "name"), but the element is at a different position in each file.

ex:

first xml-file:

<root>
 <element1>text1</element1>
 <element2>
  <name value="firstname">John</name>
 </element2>
</root>

second xml-file:

<root>
 <element1>text1</element1>
 <name value="firstname">Michael</name>
 <element2>text2</element2>
</root>

What is the most runtime-efficient way to get these elements without knowing their position in advance?

(Sorry if this has already been answered on Stack Overflow; I didn't find an answer.)

Willem Van Onsem
ortnm12
  • Try to get it at the first position and, if that fails, try the second one? –  Mar 02 '15 at 12:35
  • @ortnm12: is runtime efficiency that important? For such small files, the differences will be extremely small... – Willem Van Onsem Mar 02 '15 at 12:36
  • @CommuSoft Runtime efficiency would not be a problem if I only had to do this for a few XML files, but my application converts about 10 GB of XML files to PDF files. It also stores some parts of the XML files and the PDFs online before wrapping the converted XML files in new root elements, which takes far more time. – ortnm12 Mar 02 '15 at 12:50

2 Answers


You might want to investigate XPath; see "How to read XML using XPath in Java". For your specific case the XPath expression is "//name". The double slash means the element is matched anywhere in the current document, at any depth below the root.

BMac

One way that is not necessarily the most efficient, but is more convenient, is to use XPath queries:

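import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Parse the file into a DOM tree, then select every <name> element at any depth.
File f = new File("path/to/file.xml");
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(f);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("//name");
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);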

The query:

"//name"

means:

Search for all <name> tags, regardless of their depth. You can then process the NodeList.
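As a rough illustration of processing the NodeList, here is a minimal sketch that collects the "value" attribute and the text of every match. The class name NameLookup and the helper findNames are hypothetical names made up for this example, and the XML is read from a string rather than a file so that the sketch is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NameLookup {
    // Hypothetical helper: returns "value=text" for every <name> element, at any depth.
    static List<String> findNames(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nl = (NodeList) xpath.evaluate("//name", doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nl.getLength(); i++) {
            // "//name" only matches element nodes, so the cast is safe here.
            Element e = (Element) nl.item(i);
            result.add(e.getAttribute("value") + "=" + e.getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String first = "<root><element1>text1</element1>"
                + "<element2><name value=\"firstname\">John</name></element2></root>";
        String second = "<root><element1>text1</element1>"
                + "<name value=\"firstname\">Michael</name>"
                + "<element2>text2</element2></root>";
        System.out.println(findNames(first));  // prints [firstname=John]
        System.out.println(findNames(second)); // prints [firstname=Michael]
    }
}
```

The same query finds the element in both layouts from the question, which is the point of using "//name" instead of a fixed path.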

Although XPath queries involve some overhead (the whole document is first parsed into a DOM tree), this approach is fast enough in many cases. It is also easy to adapt: if the requirements change slightly, only the query string has to be modified.
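If the overhead does turn out to matter for very large inputs, a streaming parser is one alternative worth measuring. The following is only a sketch using the JDK's StAX API (javax.xml.stream), which is a different technique from the XPath approach above: it scans the document once and never builds a DOM tree. The class name StreamingNameLookup and the helper findNames are hypothetical, and whether this is actually faster for your files is an assumption you would have to benchmark.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StreamingNameLookup {
    // Hypothetical helper: returns "value=text" for every <name> element, at any depth,
    // by streaming over the document instead of loading it into memory as a tree.
    static List<String> findNames(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // Read the attribute first: getElementText() advances the reader
                    // to the matching end tag. It assumes <name> contains only text.
                    String value = reader.getAttributeValue(null, "value");
                    result.add(value + "=" + reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element1>text1</element1>"
                + "<element2><name value=\"firstname\">John</name></element2></root>";
        System.out.println(findNames(xml)); // prints [firstname=John]
    }
}
```

The trade-off is flexibility: a different selection rule means changing the loop rather than a query string.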

Willem Van Onsem