I have an XML file as a List<String>, meaning each line of the XML is a String in the List. I am looking for different (and ideally the best) ways to parse/read this XML data from the List.

Previously I have parsed XML files directly with the help of DocumentBuilder, e.g. documentBuilder.parse(new File("abc.xml")). That makes it very easy to get elements by tag name.
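
For reference, a minimal sketch of that DOM approach. It parses from an in-memory string instead of a File so that it is self-contained; the class name `DomLookupDemo` and the helper `textOf` are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomLookupDemo {
    // Hypothetical helper: returns the trimmed text of every element named tagName.
    static List<String> textOf(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The whole document is parsed into an in-memory tree at this point.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<MainTag><InnerTag><userid>abc</userid></InnerTag>"
                   + "<InnerTag><userid>xyz</userid></InnerTag></MainTag>";
        System.out.println(textOf(xml, "userid")); // prints [abc, xyz]
    }
}
```

The lookup by tag name is convenient, but the complete Document tree is held in memory, which is the concern with an input of roughly 100 MB.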

Sample XML in List<String> :

<MainTag>
  <InnerTag>
    <userid>abc</userid>
    <phone>
      <number>123456</number>
    </phone>
    <phone>
      <number>345678</number>
    </phone>
  </InnerTag>
  <InnerTag>
    <userid>xyz</userid>
    <phone>
      <number>0987876</number>
    </phone>
  </InnerTag>
</MainTag>

Sample Output:

abc
   123456
   345678
xyz
   0987876

Question: Which is better, reading elements by looping through the List<String>, or using an XML parser to convert the List<String> to a Document and NodeList? If a parser is the better option, which one? (I have used javax.xml.parsers.)

Notes: The data will be large; the XML file could be approximately 100 MB. Speed and low memory usage are the key requirements.

(There is also the option of creating POJOs, since the XML structure is not going to change much, and holding the values as a list of POJOs, but I would rather not, as these values are read only once.)

user2367130
  • Isn't there any way for you to have the entire file as one big string, or even better, just have the path to the file? Jackson might be able to help you with this – user May 26 '20 at 19:19
  • What do you want to do with the data? A Document would flood memory if your XML is 100 MB. Depending on the actions you need to perform, you could use a SAXParser. – jsc57x May 26 '20 at 19:20
  • @JensScharmann I will read the information user by user, create a mapping for each user's info, and pass it on to other services. It is a one-time read. So at any time I will hold information for only a single user in local memory/cache, taken from the actual input (List). – user2367130 May 26 '20 at 19:25
  • @user The answer to your question is no, as of now. If I don't find an efficient way, I can push back, because the input is coming from different services. – user2367130 May 26 '20 at 19:27
  • Then I suggest using a SAXParser and reading the entire file with it (no need to read the content into a List of Strings): build your information and perform your action in endElement() for InnerTag. You only have the current InnerTag's information in memory, so it is memory efficient. – jsc57x May 26 '20 at 19:31
  • @JensScharmann Yes, thank you. I am currently looking at some examples of SAXParser. It looks very memory efficient. The only catch is that my input is a List and I don't have direct access to the XML file (I am trying to get access to the file path, but there is no guarantee that I will get it). – user2367130 May 26 '20 at 19:40
  • I would consider it a hack, but you could implement your own InputStream which takes the List, and pass it to the SAXParser. – jsc57x May 27 '20 at 05:51
  • @JensScharmann One last question. So the hack would be to pass the List as input and have the SAX parser parse it? SAX helps by reading a file in parts instead of loading it all into local memory, but here I already have the full file data in local memory. So if I pass this List to the SAX parser, I will end up with almost double the size of the actual data in local memory, is that correct? I just want to check whether my understanding is right. – user2367130 May 27 '20 at 20:38
  • As long as you keep the List in memory and do not read from a file or another streaming input source, the List is the biggest chunk you will have. With the SAXParser you will only have the objects in memory that you decide to keep. If you send your data off at every end tag, you don't need it any more and it will be collected by the GC on its next run. – jsc57x May 28 '20 at 14:30
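
The approach discussed in the comments above could be sketched roughly as follows. This is only an illustration under some assumptions: it adapts the List to a Reader rather than an InputStream (which avoids a character-encoding step), it assumes each list entry is one line of the document, and the names `ListSaxDemo`, `LineListReader` and `UserPhoneHandler` are hypothetical. The handler collects output lines in a list for the demo; real code would hand each user to the downstream service at that point instead.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ListSaxDemo {
    // Hypothetical adapter: serves the List one line at a time,
    // so no second full copy of the document is built.
    static class LineListReader extends Reader {
        private final Iterator<String> lines;
        private String current = "";
        private int pos = 0;

        LineListReader(List<String> lines) { this.lines = lines.iterator(); }

        @Override
        public int read(char[] buf, int off, int len) {
            if (len == 0) return 0;
            while (pos >= current.length()) {        // current line used up
                if (!lines.hasNext()) return -1;     // end of input
                current = lines.next() + "\n";       // restore the line break
                pos = 0;
            }
            int n = Math.min(len, current.length() - pos);
            current.getChars(pos, pos + n, buf, off);
            pos += n;
            return n;
        }

        @Override
        public void close() { }
    }

    // Keeps only the current InnerTag's data; emits it at </InnerTag>.
    static class UserPhoneHandler extends DefaultHandler {
        private final List<String> output = new ArrayList<>();
        private final List<String> numbers = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String userId;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("userid")) {
                userId = text.toString().trim();
            } else if (qName.equals("number")) {
                numbers.add(text.toString().trim());
            } else if (qName.equals("InnerTag")) {
                // One user is complete: pass it on, then forget it.
                output.add(userId);
                for (String number : numbers) output.add("   " + number);
                userId = null;
                numbers.clear();
            }
            text.setLength(0);
        }
    }

    static List<String> parse(List<String> xmlLines) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            UserPhoneHandler handler = new UserPhoneHandler();
            parser.parse(new InputSource(new LineListReader(xmlLines)), handler);
            return handler.output;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> xml = Arrays.asList(
            "<MainTag>", "<InnerTag>", "<userid>abc</userid>",
            "<phone>", "<number>123456</number>", "</phone>",
            "</InnerTag>", "</MainTag>");
        parse(xml).forEach(System.out::println);
    }
}
```

With this shape the List stays the largest object in memory; on top of it the parser only needs its own buffers plus the data for the user currently being read.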
