
I have a large file (roughly 3 GB) that I read into an ArrayList. When I run the code below, after several minutes it slows down badly and CPU usage goes high; eventually the Eclipse console shows: java.lang.OutOfMemoryError: GC overhead limit exceeded.

  • OS: Windows Server 2008 R2
  • 4 CPUs
  • 32 GB memory
  • java version "1.7.0_60"

eclipse.ini

    -startup
    plugins/org.eclipse.equinox.launcher_1.3.0.v20130327-1440.jar
    --launcher.library
    plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.200.v20140116-2212
    -product
    org.eclipse.epp.package.standard.product
    --launcher.defaultAction
    openFile
    #--launcher.XXMaxPermSize
    #256M
    -showsplash
    org.eclipse.platform
    #--launcher.XXMaxPermSize
    #256m
    --launcher.defaultAction
    openFile
    --launcher.appendVmargs
    -vmargs
    -Dosgi.requiredJavaVersion=1.6
    -Xms10G
    -Xmx10G
    -XX:+UseParallelGC
    -XX:ParallelGCThreads=24
    -XX:MaxGCPauseMillis=1000
    -XX:+UseAdaptiveSizePolicy

Java code:

    // al (the ArrayList being filled) and matcher (a java.util.regex.Matcher)
    // are declared earlier in the original program.
    BufferedInputStream bis = new BufferedInputStream(
            new FileInputStream(new File("/words/wordlist.dat")));
    InputStreamReader isr = new InputStreamReader(bis, "utf-8");
    // 512M chars of buffer = 1 GB of heap just for the BufferedReader
    BufferedReader in = new BufferedReader(isr, 1024 * 1024 * 512);

    String strTemp;
    long ind = 0;

    while ((strTemp = in.readLine()) != null) {
        matcher.reset(strTemp);

        if (strTemp.contains("$")) {
            al.add(strTemp);
        }
        ind = ind + 1;
        if (ind % 100000 == 0) {
            System.out.println(ind + "    100,000 +");
        }
    }
    in.close();

My use case (sample lines from the search word log):

    neural network
    java
    oracle
    solaris
    quick sort
    apple
    green fluorescent protein
    acm
    trs
pangjiale
  • Can you please elaborate your use case? Why do you need a 3 GB file in memory? – Mahendra Feb 27 '16 at 10:52
  • Is it necessary to load the whole file into memory? – Devavrata Feb 27 '16 at 10:52
  • You can temporarily prevent this problem by setting `-XX:-UseGCOverheadLimit` in the Eclipse configuration: [disable-the-usegcoverheadlimit-in-centos](http://stackoverflow.com/questions/18934146/disable-the-usegcoverheadlimit-in-centos) – Mahendra Feb 27 '16 at 11:01
  • Why not increase the JVM heap size (not Eclipse's!) to something like 6 GB? You clearly have enough RAM ;) [see the VM-arguments sketch after these comments] – Thomas Jungblut Feb 27 '16 at 11:02
  • @haraldK but that's for Eclipse, not the JVM it launches – Thomas Jungblut Feb 27 '16 at 11:03
  • I'm writing a program in Java to get statistics on how many times the keywords were found in the search word log list – pangjiale Feb 27 '16 at 11:08
  • Maybe Thomas Jungblut is right, I will have a try; the same code, run from the command line on Solaris, works. – pangjiale Feb 27 '16 at 11:24
  • Could you say what you wish to achieve exactly? – Krzysztof Cichocki Feb 27 '16 at 11:29
  • Why do you need to read the entire file into memory? Nearly all computing problems, including compilation, can be solved with a single pass over the file, reading it line by line or record by record, or char by char in the case of compilation. – user207421 Feb 27 '16 at 11:59
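
For reference on the comments above: the -Xms/-Xmx values in eclipse.ini only size Eclipse's own JVM, while a program launched from Eclipse runs in a separate JVM whose heap is set under Run > Run Configurations... > Arguments > VM arguments. A sketch of such arguments (the values are illustrative, not a recommendation):

    -Xmx12g
    -XX:-UseGCOverheadLimit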

1 Answer


writing a program in Java to get statistics on how many times the keywords were found in the search word log list

I suggest you do just that. Create a map which counts the number of occurrences of keywords, or for that matter of all words. Note that holding every line in an ArrayList is expensive: Java strings are UTF-16, so if most lines are kept, 3 GB of UTF-8 text becomes roughly 6 GB of character data on the heap, plus per-String and ArrayList overhead, and the 512 MB BufferedReader buffer is a char[] costing another 1 GB.

Using Java 8 streams you can do this in a few lines without having to load the entire file into memory at once:

    // One streaming pass: no line is retained after its words are counted.
    try (Stream<String> s = Files.lines(Paths.get("filename"))) {
        Map<String, Long> count = s.flatMap(line -> Stream.of(line.trim().split(" +")))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }
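
If only certain keywords matter, you can filter before counting. A minimal self-contained sketch, assuming a hypothetical keywords set and reusing the file path from the question:

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.*;
    import java.util.stream.*;

    public class KeywordCount {
        public static void main(String[] args) throws IOException {
            // Hypothetical keyword set; substitute the real keywords.
            Set<String> keywords = new HashSet<>(Arrays.asList("java", "oracle", "solaris"));

            try (Stream<String> s = Files.lines(Paths.get("/words/wordlist.dat"))) {
                Map<String, Long> count = s
                        .flatMap(line -> Stream.of(line.trim().split(" +")))
                        .filter(keywords::contains) // keep only the tracked words
                        .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
                count.forEach((word, n) -> System.out.println(word + "\t" + n));
            }
        }
    }

Because Files.lines is lazy, only one line is in memory at a time, so the heap footprint stays flat no matter how large the file is.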
Peter Lawrey