
I have a sample XML string:

<root><a>textA</a><b>textB</b></root>

I want to split/traverse this XML using Java's split function to get an array in which each element name and its value are placed one after the other, so that I can manipulate them easily.

I tried

     split("(?=<)|(?<=>)")

but this returns the root element and the closing tags in the array as well, along with one additional element (an empty string). Thanks in advance. (Using an XML parser is not feasible right now.)

KBR
  • Don't use this sort of thing to deal with XML - use a dedicated XML parser instead (such as the built-in DOM parser); it's much less prone to errors. – Michael Berry Jul 04 '14 at 00:12
  • You are right, but as of now we don't have the option to use that parser due to some constraints (most of the code is already in prod). We need to do string manipulation somehow to take this further. I know this is ugly, but I can't help it for now. – KBR Jul 04 '14 at 00:20
  • I'm not really sure I understand your constraints - you can modify the code, but you're not allowed to modify the code to use the standard java XML parsers? Would that incur too much test overhead? I don't get it. – antiduh Jul 04 '14 at 00:43
  • I agree with the others: string manipulation will only get you so far, and will likely only work for really simple XML. The only way around not using an XML parser (and why on Earth not, considering one comes embedded as part of J2SE?) would be to write your own recursive parser, which is to say: reinvent the wheel. split() is not an option, as the whole idea of XML is nested elements, which means a linear array is not suited to representing XML data. – NotSoOldNick Jul 04 '14 at 01:01
  • How do you have String.split() in production but not DocumentBuilderFactory? – David Ehrmann Jul 04 '14 at 03:02
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – dimo414 Jul 04 '14 at 04:20
  • Guys, let me explain why I was saying a parser is not an immediate option for me. The current code in production processes a huge XML document (changing data, manipulating tags), and it already builds a DocumentBuilderFactory and parses the XML multiple times, which is making it run slowly. As a workaround, I don't want to build the factory again, considering that it's an overhead for huge XML. So I am taking a small chunk out of that big XML and trying to do string manipulation on it. The long-term solution is definitely to optimize it by refactoring and using the parser efficiently. – KBR Jul 04 '14 at 19:37
  • @KBR- here is a paper for your amusement...it is entitled "processing XML with java - a performance benchmark" http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf – vtd-xml-author May 07 '16 at 22:05

1 Answer

String[] split = s.split("<.*?>");

Then filter out the empty strings.
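For illustration, here is a minimal sketch of that split-and-filter step applied to the sample string from the question. The class name `XmlSplitSketch` and the helpers `textValues` and `namesAndValues` are hypothetical names made up for this example. The second helper is not part of the answer above: it is an assumed regex variation that also keeps the element names, and it only handles flat leaf elements with no attributes, no nested children, and no self-closing tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlSplitSketch {

    // Hypothetical helper: split on every tag, then drop the empty pieces
    // that split() leaves between adjacent tags.
    static List<String> textValues(String xml) {
        List<String> values = new ArrayList<>();
        for (String part : xml.split("<.*?>")) {
            if (!part.trim().isEmpty()) {
                values.add(part);
            }
        }
        return values;
    }

    // Assumption: only simple leaf elements such as <a>textA</a> are matched.
    // The backreference \1 requires the closing tag to repeat the opening name.
    private static final Pattern LEAF = Pattern.compile("<(\\w+)>([^<]*)</\\1>");

    // Hypothetical helper: returns element name, value, name, value, ...
    static List<String> namesAndValues(String xml) {
        List<String> flat = new ArrayList<>();
        Matcher m = LEAF.matcher(xml);
        while (m.find()) {
            flat.add(m.group(1)); // element name
            flat.add(m.group(2)); // element text
        }
        return flat;
    }

    public static void main(String[] args) {
        String xml = "<root><a>textA</a><b>textB</b></root>";
        System.out.println(textValues(xml));     // [textA, textB]
        System.out.println(namesAndValues(xml)); // [a, textA, b, textB]
    }
}
```

As the comments on the question point out, this kind of string handling is likely to break on anything beyond very simple XML, so treat it as a stopgap rather than a replacement for a parser.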

Nile