I have the following Java class that reads a file containing many lines of tab-delimited strings. An example line looks like this:
GO:0085044 GO:0085044 GO:0085044
The code reads each line, uses the split function to put the three substrings into an array, and then puts them into a two-level hash map.
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    public class LCAReader {
        public static void main(String[] args) {
            // Two-level map: first term -> (second term -> third term)
            Map<String, Map<String, String>> termPairLCA = new HashMap<String, Map<String, String>>();
            File ifile = new File("LCA1.txt");
            try {
                BufferedReader reader = new BufferedReader(new FileReader(ifile));
                String line = null;
                while ((line = reader.readLine()) != null) {
                    // Each line holds three tab-separated terms
                    String[] arr = line.split("\t");
                    if (termPairLCA.containsKey(arr[0])) {
                        if (termPairLCA.get(arr[0]).containsKey(arr[1])) {
                            System.out.println("Error: Duplicate term in LCACache");
                        } else {
                            // Outer key already exists: add the pair to its inner map
                            termPairLCA.get(arr[0]).put(new String(arr[1]), new String(arr[2]));
                        }
                    } else {
                        // First time this term is seen: create its inner map
                        Map<String, String> tempMap = new HashMap<String, String>();
                        tempMap.put(new String(arr[1]), new String(arr[2]));
                        termPairLCA.put(new String(arr[0]), tempMap);
                    }
                }
                reader.close();
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        }
    }
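For comparison, the containsKey / get / put sequence in the loop can be written more compactly with `Map.computeIfAbsent` and `Map.putIfAbsent`. This is a minimal sketch assuming Java 8 or later; `NestedMapSketch` and `addPair` are hypothetical names. It is only a tidier form of the same logic and does not change how much memory the map holds.

```java
import java.util.HashMap;
import java.util.Map;

public class NestedMapSketch {
    // Adds (first -> second -> lca) to the two-level map.
    // Returns false if the (first, second) pair already exists, in which case
    // the existing value is kept, like the duplicate check in the loop.
    static boolean addPair(Map<String, Map<String, String>> termPairLCA,
                           String first, String second, String lca) {
        // Create the inner map only when the outer key is seen for the first time
        Map<String, String> inner = termPairLCA.computeIfAbsent(first, k -> new HashMap<>());
        // putIfAbsent returns null only if no value was stored for this key yet
        return inner.putIfAbsent(second, lca) == null;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> termPairLCA = new HashMap<>();
        System.out.println(addPair(termPairLCA, "GO:0085044", "GO:0085044", "GO:0085044")); // prints true
        System.out.println(addPair(termPairLCA, "GO:0085044", "GO:0085044", "GO:0085044")); // prints false
        System.out.println(termPairLCA.size()); // prints 1
    }
}
```

In the reading loop, the whole if/else block would become a call such as `if (!addPair(termPairLCA, arr[0], arr[1], arr[2])) { ... report duplicate ... }`.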
When I ran the program, I got the following runtime error after it had been running for a while. I noticed that memory usage kept increasing.
    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.regex.Pattern.compile(Pattern.java:1469)
        at java.util.regex.Pattern.<init>(Pattern.java:1150)
        at java.util.regex.Pattern.compile(Pattern.java:840)
        at java.lang.String.split(String.java:2304)
        at java.lang.String.split(String.java:2346)
        at LCAReader.main(LCAReader.java:17)
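The trace shows `String.split` compiling a new regex `Pattern` on every call. A minimal sketch, assuming you want to rule that allocation out, compiles the tab pattern once and reuses it; `TabSplitSketch` and `splitLine` are hypothetical names. Note that the call that finally trips the GC overhead limit is not necessarily the cause: the ever-growing map is what stays reachable, so this change alone may not prevent the error. On newer JDKs (7 and later, as far as I know) `String.split` already skips regex compilation for a single-character separator such as a tab, which suggests this trace comes from an older JDK.

```java
import java.util.regex.Pattern;

public class TabSplitSketch {
    // Compiled once and reused for every line, instead of building
    // a new Pattern inside each String.split call
    private static final Pattern TAB = Pattern.compile("\t");

    static String[] splitLine(String line) {
        // Like String.split(regex), Pattern.split drops trailing empty fields
        return TAB.split(line);
    }

    public static void main(String[] args) {
        String[] arr = splitLine("GO:0085044\tGO:0085044\tGO:0085044");
        System.out.println(arr.length); // prints 3
    }
}
```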
The input file is almost 2 GB, and the machine I ran the program on has 8 GB of memory. I also tried running the program with the -Xmx4096m parameter, but that did not help. So I suspect there is a memory leak in my code, but I cannot find it.
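If term IDs repeat across lines (the example line contains the same ID three times), every line still produces new `String` objects, and the `new String(...)` calls explicitly create separate objects for each stored key and value. A minimal sketch of sharing one instance per distinct ID uses a hypothetical canonicalizing pool (`TermCanonicalizer` and `canonical` are illustrative names, not an existing API). Whether this helps, and by how much, depends on how many distinct IDs the file contains relative to the number of lines, which is not known here; the pool itself also costs one map entry per distinct ID. `String.intern()` is a built-in alternative, though on older JDKs interned strings were stored in PermGen, which is sized separately from `-Xmx`.

```java
import java.util.HashMap;
import java.util.Map;

public class TermCanonicalizer {
    // Hypothetical pool: maps each distinct term ID to one shared String instance
    private final Map<String, String> pool = new HashMap<String, String>();

    String canonical(String s) {
        String existing = pool.get(s);
        if (existing == null) {
            // Store a copy the first time an ID is seen; on older JDKs this also
            // avoids keeping a substring's larger backing array alive
            existing = new String(s);
            pool.put(existing, existing);
        }
        return existing;
    }

    public static void main(String[] args) {
        TermCanonicalizer c = new TermCanonicalizer();
        String a = c.canonical(new String("GO:0085044"));
        String b = c.canonical(new String("GO:0085044"));
        System.out.println(a == b); // prints true: both calls return the same instance
    }
}
```

In the reading loop, this would mean replacing calls such as `new String(arr[1])` with `pool.canonical(arr[1])`, where `pool` is a single `TermCanonicalizer` created before the loop.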
Can anyone help me with this? Thanks in advance!