Parsing XML content line by line and extracting some values from it

Question

How can I elegantly extract these values from the following text content? I have a long file that contains thousands of entries. I tried the XML Parser and Slurper approach, but I ran out of memory; I have only 1 GB. So now I read the file line by line and extract the values. But I think there should be a better way to do this in Java/Groovy, maybe a cleaner and more reusable one. (I read the content from standard input.)

One line of content:

<sample t="336" lt="0" ts="1406036100481" s="true" lb="txt1016.pb" rc="" rm="" tn="Thread Group 1-9" dt="" by="0"/>

My Groovy Solution:

Map<String, List<Integer>> requestSet = new HashMap<String, List<Integer>>();
String reqName;
String[] tmpData;
Integer reqTime;

System.in.eachLine() { line ->

    if (line.find("sample")){
        tmpData = line.split(" ");
        reqTime = Integer.parseInt(tmpData[1].replaceAll('"', '').replaceAll("t=", ""));
        reqName = tmpData[5].replaceAll('"', '').replaceAll("lb=", "");

        if (requestSet.containsKey(reqName)){
            List<Integer> myList = requestSet.get(reqName);
            myList.add(reqTime);
            requestSet.put(reqName, myList);
        }else{
            List<Integer> myList = new ArrayList<Integer>();
            myList.add(reqTime);
            requestSet.put(reqName, myList);
        }
    }
}

Any suggestions or code snippets that improve this?

Use the XML Streaming API. You get "events" (callbacks) for each tag and store in memory only what you need. Never use regex to process XML; the two are fundamentally incompatible. See http://stackoverflow.com/a/1732454/18157. For a tutorial see http://docs.oracle.com/javase/tutorial/jaxp/stax/using.html — Jim Garrison, Aug 08 '14 at 20:03
Why not use an XML parser? JDOM and XOM are good choices; it is much easier to work with them and you don't have to worry about such things. — Lars, Aug 08 '14 at 20:05
Sorry, I was wrong about JDOM and XOM. I found another post that recommends using a StAX parser; maybe that will help you: http://stackoverflow.com/a/3969920/3579095 — Lars, Aug 08 '14 at 20:13
I decided not to go the XML route since I don't care about XML validation, and I also don't want to keep any of the DOM elements in memory. I'd rather grep through things, but in a more elegant way. — AlexCon, Aug 08 '14 at 20:29
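
For reference, the streaming (StAX) approach suggested in the comments could look roughly like the sketch below. It reads one element at a time and keeps only the collected values, so it builds no DOM tree, and it reads attributes by name rather than by position. This is a sketch under assumptions, not a definitive implementation: the class name `SampleTimes` and the method `collect` are made up for illustration, and the input is assumed to be well-formed XML with all `sample` elements inside a single root element (a bare sequence of `<sample .../>` lines with no root would be rejected by the parser).

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; the name is not from the original post.
public class SampleTimes {

    /** Streams the XML and groups each sample's "t" value under its "lb" label. */
    static Map<String, List<Integer>> collect(InputStream in) throws XMLStreamException {
        Map<String, List<Integer>> timesByLabel = new LinkedHashMap<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // Only start tags named "sample" are of interest; everything else is skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "sample".equals(reader.getLocalName())) {
                    String label = reader.getAttributeValue(null, "lb");
                    String time = reader.getAttributeValue(null, "t");
                    // Assumption: samples missing either attribute are ignored.
                    if (label != null && time != null) {
                        timesByLabel.computeIfAbsent(label, k -> new ArrayList<>())
                                    .add(Integer.valueOf(time));
                    }
                }
            }
        } finally {
            reader.close();
        }
        return timesByLabel;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Reads from standard input, as in the question.
        collect(System.in).forEach((label, times) -> System.out.println(label + " " + times));
    }
}
```

Because Groovy runs on the JVM, the same `javax.xml.stream` classes can be called from a Groovy script as well. `computeIfAbsent` replaces the `containsKey`/`put` branching from the question; memory use then grows only with the number of collected values, not with the size of the file.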
