
I have an input document in which I would like to find an attribute as quickly as possible. Here is a deliberately silly example:

<PossibleSuspects>
  <PossibleSuspect name="A" id="423" character="shady"/>
  <PossibleSuspect name="B" id="423" character="normal"/>
  <PossibleSuspect name="C" id="423" character="normal"/>
  <PossibleSuspect name="A" id="423" character="shady"/>
</PossibleSuspects>

Basically, I want to get the name attribute of the rows where character is shady. I am okay with the first such match (others can be ignored; they will have the same name).

I considered two options: looping over the whole document and taking the first match, or converting the document to a string and running a regex search.

Which would be faster?

sudshekhar
  • I don't recommend using regex, but if speed matters, you can choose a pull parser (without validation) that doesn't need to build the DOM tree: http://developer.android.com/reference/org/xmlpull/v1/XmlPullParser.html – Casimir et Hippolyte Apr 05 '16 at 13:39
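
The pull-parser idea from this comment can be sketched with StAX (`javax.xml.stream`), which ships with the JDK and works in a similar pull style to the Android `XmlPullParser` linked above. This is only a minimal sketch under assumptions: the class name `StaxSuspectFinder` and the method `findFirstShadyName` are made up for illustration, and the input is assumed to be available as a `String`.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
class StaxSuspectFinder {

    // Returns the name of the first PossibleSuspect whose character is "shady",
    // or null if there is no such element.
    static String findFirstShadyName(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // No DTD is needed for this lookup, so DTD support is switched off.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // Only start tags are inspected; no tree is built.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "PossibleSuspect".equals(reader.getLocalName())
                            && "shady".equals(reader.getAttributeValue(null, "character"))) {
                        // Returning here stops reading at the first match.
                        return reader.getAttributeValue(null, "name");
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Input is not well-formed XML", e);
        }
    }
}
```

Because the reader is abandoned at the first match, the rest of the document is never parsed, which is likely where most of the saving over a DOM-based approach would come from.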

1 Answer


If speed really matters, you should go for the second approach and search the document string with a regex (luckily, XML is not HTML). Keep in mind, though, that the order of attributes in XML is not guaranteed, so the pattern must not depend on it. A Scanner may also help.
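
One way to keep a regex search independent of attribute order is to match the start tag first and then look for each attribute inside it separately. The following is a rough sketch, not a general XML parser: the class name `RegexSuspectFinder` is made up, and it assumes double-quoted attribute values, no `>` inside attribute values, and no comments or CDATA sections containing look-alike tags. The returned value is the raw attribute text, so entities such as `&amp;` are not decoded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class RegexSuspectFinder {

    // One PossibleSuspect start tag; group 1 is its raw attribute text.
    // The \s after the element name keeps <PossibleSuspects> from matching.
    private static final Pattern TAG =
            Pattern.compile("<PossibleSuspect\\s([^>]*)>");

    // Attribute patterns are anchored on a preceding space (or the start of the
    // attribute text) so that e.g. nickname="..." is not mistaken for name="...".
    private static final Pattern SHADY =
            Pattern.compile("(?:^|\\s)character\\s*=\\s*\"shady\"");
    private static final Pattern NAME =
            Pattern.compile("(?:^|\\s)name\\s*=\\s*\"([^\"]*)\"");

    // Returns the name of the first shady suspect, or null if none is found.
    static String findFirstShadyName(String xml) {
        Matcher tag = TAG.matcher(xml);
        while (tag.find()) {
            String attributes = tag.group(1);
            if (SHADY.matcher(attributes).find()) {
                Matcher name = NAME.matcher(attributes);
                if (name.find()) {
                    return name.group(1);
                }
            }
        }
        return null;
    }
}
```

Checking the two attributes in separate passes costs a little extra work per tag, but it avoids a single pattern that has to enumerate every possible attribute order.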

If simplicity of your code is more important, I suggest using XPath:

// document is assumed to be an already parsed org.w3c.dom.Document
XPath xpath = XPathFactory.newInstance().newXPath();
String name = xpath.evaluate("//PossibleSuspect[@character='shady']/@name", document);

And it's not that slow either.
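
The snippet above assumes that `document` already exists. For completeness, here is a minimal sketch that goes from an XML string to the result; the class name `XPathSuspectFinder` and the choice of a `String` input are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class XPathSuspectFinder {

    // Returns the name of the first shady suspect, or "" if there is none.
    static String findFirstShadyName(String xml) {
        try {
            // Build the full DOM tree first; this is the costly step.
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Without an explicit return type, evaluate() gives the string value
            // of the first matching node in document order, or "" if none matches.
            return xpath.evaluate("//PossibleSuspect[@character='shady']/@name", document);
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed or queried", e);
        }
    }
}
```

Note that "no match" shows up as an empty string rather than `null`, so a caller that needs to tell the two apart would have to check for that.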

Gerald Mücke
  • While XML is certainly more easily parsed, [it's not as simple as it appears at first glance](http://stackoverflow.com/a/702222/1831987). – VGR Apr 05 '16 at 14:35
  • Agreed, it's a tradeoff between performance and flexibility/compatibility. I'd prefer the XPath way, as I've rarely seen occasions where XML processing is really time-critical. Nevertheless, an event-driven approach such as SAX or XMLStreams is probably a good choice as well. – Gerald Mücke Apr 05 '16 at 15:10
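
The event-driven approach mentioned in this comment could look roughly like the SAX sketch below. The names `SaxSuspectFinder` and `Found` are made up for illustration; throwing a `SAXException` subclass from the handler is a common idiom for stopping the parse early, not something the SAX API names explicitly.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class SaxSuspectFinder {

    // Thrown only to abort parsing once the first match has been seen.
    private static final class Found extends SAXException {
        final String name;

        Found(String name) {
            super("match found");
            this.name = name;
        }
    }

    // Returns the name of the first shady suspect, or null if there is none.
    static String findFirstShadyName(String xml) {
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) throws SAXException {
                // The default factory is not namespace-aware, so qName is compared.
                if ("PossibleSuspect".equals(qName)
                        && "shady".equals(attributes.getValue("character"))) {
                    throw new Found(attributes.getValue("name"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return null; // whole document read without a match
        } catch (Found found) {
            return found.name;
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }
}
```

As with any streaming approach, nothing after the first match is read, and no DOM tree is kept in memory.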