I have several XML files (GBs in size) that need to be converted to JSON. I can easily convert small files (a few kilobytes) using the JSON library (org.json - https://mvnrepository.com/artifact/org.json/json/20180813).
Here's the code that I am using:
static String line = "", str = "";

BufferedReader br = new BufferedReader(new FileReader(link));
FileWriter fw = new FileWriter(outputlink);
JSONObject jsondata = null;
while ((line = br.readLine()) != null) {
    str += line;
}
jsondata = XML.toJSONObject(str);
But larger files (even those under 100 MB) take too long to process, and the bigger ones throw java.lang.OutOfMemoryError: Java heap space, presumably because the whole file plus the parsed JSONObject has to fit in memory at once. How can I optimize this code to handle large files, or is there another approach/library that would work better?
UPDATE
I have updated the code so that it converts the XML to JSON segment by segment.
My XML:
<PubmedArticleSet>
  <PubmedArticle>
  </PubmedArticle>
  <PubmedArticle>
  </PubmedArticle>
  ...
</PubmedArticleSet>
So I am ignoring the root node <PubmedArticleSet> (I will add it back later), converting each <PubmedArticle>...</PubmedArticle> block to JSON, and writing one block at a time:
br = new BufferedReader(new FileReader(link));
fw = new FileWriter(outputlink, true);
StringBuilder str = new StringBuilder();

br.readLine(); // to skip the first three lines and the root
br.readLine();
br.readLine();

while ((line = br.readLine()) != null) {
    JSONObject jsondata = null;
    str.append(line);
    System.out.println(str);
    if (line.trim().equals("</PubmedArticle>")) { // split here
        jsondata = XML.toJSONObject(str.toString());
        String jsonPrettyPrintString = jsondata.toString(PRETTY_PRINT_INDENT_FACTOR);
        fw.append(jsonPrettyPrintString);
        System.out.println("One done"); // one section done
        str = new StringBuilder();
    }
}
fw.close();
I am no longer getting the heap space error, but processing still takes hours for files in the ~300 MB range. Kindly provide any suggestions to speed this up.
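One direction I am considering (but have not tried yet) is to drop the line-by-line splitting entirely and use a StAX parser to pull out one <PubmedArticle> element at a time, handing each element to XML.toJSONObject. A rough, untested sketch of what I have in mind (the file paths and class name are just placeholders):

import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import org.json.JSONObject;
import org.json.XML;

public class PubmedStreamConverter { // placeholder class name

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        Transformer copier = TransformerFactory.newInstance().newTransformer();
        copier.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        try (FileInputStream in = new FileInputStream("pubmed.xml");                   // placeholder input path
             BufferedWriter out = new BufferedWriter(new FileWriter("pubmed.json"))) { // placeholder output path
            XMLStreamReader xr = xif.createXMLStreamReader(in);
            while (xr.hasNext()) {
                // Advance the cursor and react only to <PubmedArticle> start tags
                if (xr.next() == XMLStreamConstants.START_ELEMENT
                        && "PubmedArticle".equals(xr.getLocalName())) {
                    // Copy just this element (and its children) into a small buffer
                    StringWriter oneArticle = new StringWriter();
                    copier.transform(new StAXSource(xr), new StreamResult(oneArticle));
                    // Convert that single article to JSON and append it to the output
                    JSONObject json = XML.toJSONObject(oneArticle.toString());
                    out.write(json.toString(2));
                }
            }
            xr.close();
        }
    }
}

The idea is that only one article is ever held in memory at a time and the output is buffered, instead of scanning and printing the growing buffer on every input line. Would something along these lines be expected to be noticeably faster, or will the per-article XML.toJSONObject call still dominate?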