0

I basically trying to load 6 large file (500MB per file on average) using buffered reader. In one run, I loaded the files one by one; In the other run, I loaded the files in parallel. There appeared to be a significant difference in term of time consumed by the two runs:

  • Sequential Run: 14.634s
  • Parallel Run: 120.317s
  • NOTE that this difference shows up clearly on the 1st time reading these files into the JVM. In subsequent reads, duration shortens significantly, which I assume is due to caching. Therefore, I recreate these files (e.g. delete them under the current dir and copy them from somewhere to the current dir again) before I ran two independent Eclipse instances (I also tried ran them as two runnable jars from cmd line, same result).
  • Sub-question: Is there a way to eliminate this caching effect to stop it affecting my experiment before I do the run? e.g. In windows, I deleted all the Java temp files through Control Panel -> Java -> Settings -> Delete Files ... as well as doing something like this to clear windows cache

My Main Question is: What caused such vast difference? I have already tried the following:

  • I notice that as in the GC details printed out, the Sequential run of the program has a high usage of Eden space but low usage of OldGen (and had more Full GCs), while the opposite is true to the Parallel run.
  • Therefore I tried to adjust the -XX:NewRatio to a lower number but it didn't help.
  • Then I increased both the min and max heap size via -Xms and -Xms to 2g, which didn't help either.
  • If this is due to contention for I/O, is there some tool that I can use to confirm this? I currently use VirtualVM which I can do Sampling and Profiling of the CPU time for classes. But I wonder if there is some other tool to check this in a clearer way.

Following are the simplified code



    public class BufferedReaderConcurentPerfTest {
        // Parallel or Serial
        public static boolean isParallel = true;
        public static boolean isToPrintInfo = true;

        public static String ORIGINAL_LOG_DIR = ".\\largeFiles";

        public static void main(String args[]) throws Exception {
            long startTime = System.currentTimeMillis();

            processFiles();

            double totalProcessDuration = (System.currentTimeMillis() - startTime) / ((double) 1000);
            printInfo("TOTAL PROCESS TIME:" + totalProcessDuration + "s");
        }

        public static void processFiles() 
                throws Exception {
            File[] files = new File(ORIGINAL_LOG_DIR).listFiles();
            if (isParallel){
                ExecutorService fixThreadPoolES = Executors.newFixedThreadPool(files.length);
                List> handleLogfutures = new ArrayList>();

                for(File file : files){
                    if(file.isFile()){

                        // Print Start of Processing the file 
                        printInfo("--START REPLAYING {File: " + file.getName() + "}");

                        // Submit a Callable task to Replay the file
                        HandleLogFileTaskCallable handleLogFileTaskCallable = new HandleLogFileTaskCallable(file);                      
                        handleLogfutures.add(fixThreadPoolES.submit(handleLogFileTaskCallable));
                    }
                }
                gatherFileReplayResult(handleLogfutures);

                // wait for termination        
                fixThreadPoolES.shutdown();
                printInfo("Right After shutdown()");
                fixThreadPoolES.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
                printInfo("Right After awaitTermination()");

            } else {
                for(File file : files){
                    if(file.isFile()){  
                        processLargeFile(file);
                    }
                }           
            }
        }

        public static synchronized void gatherFileReplayResult(List> handleLogfutures) throws InterruptedException, ExecutionException{
            for(Future result: handleLogfutures){
                printInfo("Gathered Result: " + result.get());
            }
        }

        private static class HandleLogFileTaskCallable implements Callable {
            File _file = null;
            public HandleLogFileTaskCallable (final File file){
                _file = file;
            }
            @Override
            public String call() throws Exception {
                // Handle the Replay of a log file
                String handleLogResult = "--Process {" + _file.getName() + "} took " + String.valueOf(replayFile()) + "s";
                return handleLogResult;
            }
            public double replayFile() throws Exception {
                long startTime = System.currentTimeMillis();

                processLargeFile(_file);            

                double processDuration = (System.currentTimeMillis() - startTime) / ((double) 1000);
                printInfo("----processLargeFile(): {" + _file.getName() + "}: " 
                        + processDuration + "s");

                return processDuration;
            }       
        }

        public static void processLargeFile(File largeFile) throws IOException {
            long currStart = System.currentTimeMillis();
            long currTime;
            FileReader OriginalLogFileReader = null; 
            try {
                OriginalLogFileReader = new FileReader(largeFile.getAbsolutePath());
            }
            catch (FileNotFoundException fne) {
                throw new RuntimeException(fne);
            }       
            BufferedReader originalLogBuffReader = new BufferedReader(OriginalLogFileReader, (int) (largeFile.length() / 10) );     

            // - Record Perf Data
            currTime = System.currentTimeMillis();
            long initBufferReaderDuration = currTime - currStart;
            currStart = currTime;
            printInfo("----Time Init BufferReader for: {" + largeFile.getName() + "}: " 
                    + initBufferReaderDuration / (double) 1000 + "s");

            // Start reading the original log file
            String logLine = originalLogBuffReader.readLine();      

            // - Record Perf Data
            currTime = System.currentTimeMillis();
            long readFirstLineDuration = currTime - currStart;
            currStart = currTime;
            printInfo("----Time Reading 1st line of: {" + largeFile.getName() + "}: " 
                    + readFirstLineDuration / (double) 1000 + "s");

            int numLines = 0;
            long bufferReadLineDurationTotal = 0;
            while (logLine != null){
                currStart = System.currentTimeMillis();

                // Read the next line   
                logLine = originalLogBuffReader.readLine();
                numLines++;

                // - Accumulate Perf Data
                currTime = System.currentTimeMillis();
                bufferReadLineDurationTotal += currTime - currStart; 
                currStart = currTime;
            }

            // - Record Perf Data
            printInfo("----Total Time Reading: {" + largeFile.getName() + "}: " 
                    + bufferReadLineDurationTotal / (double) 1000 + "s"
                    + "; Read Lines: " + numLines);

            // Completed reading the original log file
            originalLogBuffReader.close();      

            // - Record Perf Data
            currTime = System.currentTimeMillis();
            long closeBuffReaderDuration = currTime - currStart;
            currStart = currTime;
            printInfo("----Time to close BufferedReader for: {" + largeFile.getName() + "}: " 
                    + closeBuffReaderDuration / (double) 1000 + "s");
        }

        public static void printInfo(String infoStr){
            if (isToPrintInfo){
                String outStr = "######## Program Output: {" + Thread.currentThread().getName() + "} " + infoStr;       
                System.out.println(outStr);
            }
        }
    }


The GC and Program Outputs for Sequential Run:


    ######## Program Output: {main} ----Time Init BufferReader for: {File-2014-09-02.log}: 0.02s
    ######## Program Output: {main} ----Time Reading 1st line of: {File-2014-09-02.log}: 1.388s
    1.791: [GC [PSYoungGen: 32256K->400K(37632K)] 96663K->64807K(123712K), 0.0083754 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] 
    1.814: [GC [PSYoungGen: 32656K->224K(37632K)] 97063K->64631K(123712K), 0.0005777 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    2.211: [GC [PSYoungGen: 32480K->192K(37632K)] 96887K->64599K(123712K), 0.0013634 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    2.226: [GC [PSYoungGen: 32448K->208K(69888K)] 96855K->64615K(155968K), 0.0004087 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    2.541: [GC [PSYoungGen: 64720K->176K(69888K)] 129127K->64583K(155968K), 0.0006677 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    2.924: [GC [PSYoungGen: 64688K->192K(129472K)] 129095K->64599K(215552K), 0.0109592 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
    3.748: [GC [PSYoungGen: 129216K->64K(129472K)] 193623K->64631K(215552K), 0.0007187 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    4.513: [GC [PSYoungGen: 129088K->96K(124032K)] 193655K->64663K(210112K), 0.0005527 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    5.234: [GC [PSYoungGen: 124000K->32K(119104K)] 188567K->64599K(205184K), 0.0005023 secs] [Times: user=0.06 sys=0.00, real=0.00 secs] 
    ######## Program Output: {main} ----Total Time Reading: {File-2014-09-02.log}: 3.517s; Read Lines: 825157
    ######## Program Output: {main} ----Time to close BufferedReader for: {File-2014-09-02.log}: 0.0010s
    ######## Program Output: {main} ----Time Init BufferReader for: {File-2014-09-03.log}: 0.0030s
    ######## Program Output: {main} ----Time Reading 1st line of: {File-2014-09-03.log}: 0.317s
    5.611: [GC [PSYoungGen: 119072K->160K(190656K)] 183639K->114291K(326336K), 0.0254198 secs] [Times: user=0.06 sys=0.00, real=0.03 secs] 
    5.637: [Full GC [PSYoungGen: 160K->0K(190656K)] [PSOldGen: 114131K->49705K(117440K)] 114291K->49705K(308096K) [PSPermGen: 2724K->2724K(21248K)], 0.0454966 secs] [Times: user=0.00 sys=0.00, real=0.05 secs] 
    6.783: [GC [PSYoungGen: 190464K->96K(257856K)] 240169K->49801K(375296K), 0.0005805 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    7.935: [GC [PSYoungGen: 257504K->64K(266688K)] 307209K->49777K(384128K), 0.0005826 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    ######## Program Output: {main} ----Total Time Reading: {File-2014-09-03.log}: 2.587s; Read Lines: 665049
    ######## Program Output: {main} ----Time to close BufferedReader for: {File-2014-09-03.log}: 0.0s
    ######## Program Output: {main} ----Time Init BufferReader for: {File-2014-09-04.log}: 0.0030s
    ######## Program Output: {main} ----Time Reading 1st line of: {File-2014-09-04.log}: 0.315s
    9.139: [GC [PSYoungGen: 266304K->128K(266624K)] 316017K->100888K(384064K), 0.0078761 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] 
    9.147: [Full GC [PSYoungGen: 128K->0K(266624K)] [PSOldGen: 100760K->51188K(148160K)] 100888K->51188K(414784K) [PSPermGen: 2725K->2725K(21248K)], 0.0104032 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
    10.564: [GC [PSYoungGen: 266240K->64K(357184K)] 317428K->51252K(505344K), 0.0005645 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    ######## Program Output: {main} ----Total Time Reading: {File-2014-09-04.log}: 2.519s; Read Lines: 690290
    ######## Program Output: {main} ----Time to close BufferedReader for: {File-2014-09-04.log}: 0.0s
    ######## Program Output: {main} ----Time Init BufferReader for: {File-2014-09-08.log}: 0.0030s
    ######## Program Output: {main} ----Time Reading 1st line of: {File-2014-09-08.log}: 0.305s
    12.190: [GC [PSYoungGen: 356864K->160K(357184K)] 408052K->101096K(505344K), 0.0080207 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 
    12.198: [Full GC [PSYoungGen: 160K->0K(357184K)] [PSOldGen: 100936K->49874K(172352K)] 101096K->49874K(529536K) [PSPermGen: 2728K->2728K(21248K)], 0.0108381 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
    ######## Program Output: {main} ----Total Time Reading: {File-2014-09-08.log}: 1.581s; Read Lines: 665277
    ######## Program Output: {main} ----Time to close BufferedReader for: {File-2014-09-08.log}: 0.0s
    12.979: [GC [PSYoungGen: 346295K->128K(470656K)] 396170K->50002K(643008K), 0.0003402 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    ######## Program Output: {main} ----Time Init BufferReader for: {File-2014-09-10.log}: 0.0040s
    ######## Program Output: {main} ----Time Reading 1st line of: {File-2014-09-10.log}: 0.069s
    13.794: [GC [PSYoungGen: 470656K->96K(478144K)] 520530K->96079K(650496K), 0.0074520 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 
    ######## Program Output: {main} ----Total Time Reading: {File-2014-09-10.log}: 0.829s; Read Lines: 628732
    ######## Program Output: {main} ----Time to close BufferedReader for: {File-2014-09-10.log}: 0.0010s
    ######## Program Output: {main} ----Time Init BufferReader for: {File-2014-09-11.log}: 0.0030s
    ######## Program Output: {main} ----Time Reading 1st line of: {File-2014-09-11.log}: 0.079s
    14.597: [GC [PSYoungGen: 477856K->96K(577216K)] 573839K->149576K(749568K), 0.0190215 secs] [Times: user=0.03 sys=0.03, real=0.02 secs] 
    14.616: [Full GC [PSYoungGen: 96K->0K(577216K)] [PSOldGen: 149480K->53640K(208064K)] 149576K->53640K(785280K) [PSPermGen: 2728K->2728K(21248K)], 0.0114346 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
    ######## Program Output: {main} ----Total Time Reading: {File-2014-09-11.log}: 0.955s; Read Lines: 900463
    ######## Program Output: {main} ----Time to close BufferedReader for: {File-2014-09-11.log}: 0.0s
    ######## Program Output: {main} TOTAL PROCESS TIME:14.634s
    Heap
     PSYoungGen      total 577216K, used 234148K [0x00000007d5f60000, 0x00000007fb250000, 0x0000000800000000)
      eden space 576896K, 40% used [0x00000007d5f60000,0x00000007e4409098,0x00000007f92c0000)
      from space 320K, 0% used [0x00000007f9310000,0x00000007f9310000,0x00000007f9360000)
      to   space 320K, 0% used [0x00000007f92c0000,0x00000007f92c0000,0x00000007f9310000)
     PSOldGen        total 208064K, used 53640K [0x0000000781e00000, 0x000000078e930000, 0x00000007d5f60000)
      object space 208064K, 25% used [0x0000000781e00000,0x00000007852622d8,0x000000078e930000)
     PSPermGen       total 21248K, used 2736K [0x000000077cc00000, 0x000000077e0c0000, 0x0000000781e00000)
      object space 21248K, 12% used [0x000000077cc00000,0x000000077ceac088,0x000000077e0c0000)


GC and Program Outputs for the Parallel Run:


    ######## Program Output: {main} --START REPLAYING {File: File-2014-09-02.log}
    ######## Program Output: {main} --START REPLAYING {File: File-2014-09-03.log}
    ######## Program Output: {main} --START REPLAYING {File: File-2014-09-04.log}
    ######## Program Output: {main} --START REPLAYING {File: File-2014-09-08.log}
    ######## Program Output: {main} --START REPLAYING {File: File-2014-09-10.log}
    ######## Program Output: {main} --START REPLAYING {File: File-2014-09-11.log}
    ######## Program Output: {pool-1-thread-4} ----Time Init BufferReader for: {File-2014-09-08.log}: 0.041s
    ######## Program Output: {pool-1-thread-2} ----Time Init BufferReader for: {File-2014-09-03.log}: 0.055s
    ######## Program Output: {pool-1-thread-3} ----Time Init BufferReader for: {File-2014-09-04.log}: 0.054s
    ######## Program Output: {pool-1-thread-5} ----Time Init BufferReader for: {File-2014-09-10.log}: 0.043s
    ######## Program Output: {pool-1-thread-6} ----Time Init BufferReader for: {File-2014-09-11.log}: 0.045s
    ######## Program Output: {pool-1-thread-1} ----Time Init BufferReader for: {File-2014-09-02.log}: 0.068s
    ######## Program Output: {pool-1-thread-3} ----Time Reading 1st line of: {File-2014-09-04.log}: 7.173s
    7.478: [GC [PSYoungGen: 32256K->336K(37632K)] 346588K->314668K(373824K), 0.0011522 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    ######## Program Output: {pool-1-thread-6} ----Time Reading 1st line of: {File-2014-09-11.log}: 9.678s
    9.947: [GC [PSYoungGen: 32592K->288K(37632K)] 346924K->314620K(373824K), 0.0014428 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    9.980: [GC [PSYoungGen: 32544K->224K(37632K)] 346876K->314556K(373824K), 0.0020556 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    ######## Program Output: {pool-1-thread-5} ----Time Reading 1st line of: {File-2014-09-10.log}: 11.273s
    11.538: [GC [PSYoungGen: 32480K->240K(69888K)] 346812K->314572K(406080K), 0.0014329 secs] [Times: user=0.05 sys=0.00, real=0.00 secs] 
    ######## Program Output: {pool-1-thread-1} ----Time Reading 1st line of: {File-2014-09-02.log}: 11.821s
    12.119: [GC [PSYoungGen: 64752K->240K(69888K)] 379084K->314572K(406080K), 0.0016242 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    18.125: [GC [PSYoungGen: 64752K->224K(62656K)] 379084K->314556K(398848K), 0.0017085 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    18.235: [GC [PSYoungGen: 62624K->64K(60416K)] 376956K->314612K(396608K), 0.0015684 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    20.149: [GC [PSYoungGen: 60416K->32K(58496K)] 374964K->314580K(394688K), 0.0014622 secs] [Times: user=0.01 sys=0.02, real=0.00 secs] 
    ######## Program Output: {pool-1-thread-4} ----Time Reading 1st line of: {File-2014-09-08.log}: 25.276s
    25.555: [GC [PSYoungGen: 58464K->32K(57024K)] 373012K->314588K(393216K), 0.0013281 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    26.241: [GC [PSYoungGen: 56608K->32K(54848K)] 371164K->314608K(391040K), 0.0017878 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    ######## Program Output: {pool-1-thread-2} ----Time Reading 1st line of: {File-2014-09-03.log}: 27.289s
    27.577: [GC [PSYoungGen: 54816K->64K(53568K)] 369392K->314656K(389760K), 0.0015123 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    27.817: [GC [PSYoungGen: 53184K->32K(51584K)] 367776K->314632K(387776K), 0.0008322 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    28.787: [GC [PSYoungGen: 51552K->64K(50432K)] 366152K->314694K(386624K), 0.0013182 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    31.020: [GC [PSYoungGen: 50048K->32K(48576K)] 364678K->314678K(384768K), 0.0010936 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    31.041: [GC [PSYoungGen: 48544K->32K(78080K)] 363190K->314694K(414272K), 0.0034260 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    32.105: [GC [PSYoungGen: 77664K->64K(124288K)] 392326K->314734K(460480K), 0.0010284 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    32.841: [GC [PSYoungGen: 124288K->128K(127296K)] 438958K->314834K(463488K), 0.0010359 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    40.904: [GC [PSYoungGen: 127040K->32K(174336K)] 441746K->314746K(510528K), 0.0020523 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    50.823: [GC [PSYoungGen: 173984K->32K(174336K)] 488698K->314762K(510528K), 0.0015654 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    56.807: [GC [PSYoungGen: 173984K->64K(166720K)] 488714K->314826K(502912K), 0.0016695 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    60.568: [GC [PSYoungGen: 166720K->32K(160000K)] 481482K->314802K(496192K), 0.0014042 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    66.389: [GC [PSYoungGen: 159712K->32K(153152K)] 474482K->314810K(489344K), 0.0013505 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    69.967: [GC [PSYoungGen: 153120K->96K(147136K)] 467898K->314890K(483328K), 0.0010909 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    74.727: [GC [PSYoungGen: 146912K->64K(140864K)] 461706K->314874K(477056K), 0.0016454 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    78.912: [GC [PSYoungGen: 140864K->96K(135360K)] 455674K->314930K(471552K), 0.0013360 secs] [Times: user=0.05 sys=0.01, real=0.00 secs] 
    ######## Program Output: {pool-1-thread-6} ----Total Time Reading: {File-2014-09-11.log}: 69.007s; Read Lines: 900463
    ######## Program Output: {pool-1-thread-6} ----Time to close BufferedReader for: {File-2014-09-11.log}: 0.0010s
    ######## Program Output: {pool-1-thread-6} ----processLargeFile(): {File-2014-09-11.log}: 78.745s
    83.932: [GC [PSYoungGen: 135200K->64K(129728K)] 450034K->314906K(465920K), 0.0018301 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    84.674: [GC [PSYoungGen: 129728K->32K(124736K)] 444570K->314898K(460928K), 0.0013459 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    92.935: [GC [PSYoungGen: 124512K->64K(119616K)] 439378K->314954K(455808K), 0.0009738 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    97.849: [GC [PSYoungGen: 119616K->32K(115136K)] 434506K->314930K(451328K), 0.0013103 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    106.123: [GC [PSYoungGen: 114912K->64K(110464K)] 429810K->314970K(446656K), 0.0007003 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    112.326: [GC [PSYoungGen: 110464K->32K(106432K)] 425370K->314954K(442624K), 0.0015029 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    112.444: [GC [PSYoungGen: 106208K->32K(102208K)] 421130K->314962K(438400K), 0.0009638 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    ######## Program Output: {pool-1-thread-4} ----Total Time Reading: {File-2014-09-08.log}: 88.921s; Read Lines: 665277
    ######## Program Output: {pool-1-thread-4} ----Time to close BufferedReader for: {File-2014-09-08.log}: 0.0010s
    ######## Program Output: {pool-1-thread-4} ----processLargeFile(): {File-2014-09-08.log}: 114.258s
    116.242: [GC [PSYoungGen: 102176K->96K(98496K)] 417106K->315042K(434688K), 0.0018944 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] 
    ######## Program Output: {pool-1-thread-5} ----Total Time Reading: {File-2014-09-10.log}: 104.703s; Read Lines: 628732
    ######## Program Output: {pool-1-thread-5} ----Time to close BufferedReader for: {File-2014-09-10.log}: 0.0010s
    ######## Program Output: {pool-1-thread-5} ----processLargeFile(): {File-2014-09-10.log}: 116.039s
    ######## Program Output: {pool-1-thread-2} ----Total Time Reading: {File-2014-09-03.log}: 91.134s; Read Lines: 665049
    ######## Program Output: {pool-1-thread-2} ----Time to close BufferedReader for: {File-2014-09-03.log}: 0.0010s
    ######## Program Output: {pool-1-thread-2} ----processLargeFile(): {File-2014-09-03.log}: 118.497s
    119.999: [GC [PSYoungGen: 98400K->32K(94720K)] 413346K->314986K(430912K), 0.0008738 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    ######## Program Output: {pool-1-thread-3} ----Total Time Reading: {File-2014-09-04.log}: 112.568s; Read Lines: 690290
    ######## Program Output: {pool-1-thread-3} ----Time to close BufferedReader for: {File-2014-09-04.log}: 0.0s
    ######## Program Output: {pool-1-thread-3} ----processLargeFile(): {File-2014-09-04.log}: 119.813s
    120.410: [GC [PSYoungGen: 94688K->96K(91328K)] 409642K->315066K(427520K), 0.0007063 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    ######## Program Output: {pool-1-thread-1} ----Total Time Reading: {File-2014-09-02.log}: 108.343s; Read Lines: 825157
    ######## Program Output: {pool-1-thread-1} ----Time to close BufferedReader for: {File-2014-09-02.log}: 0.0010s
    ######## Program Output: {pool-1-thread-1} ----processLargeFile(): {File-2014-09-02.log}: 120.243s
    ######## Program Output: {main} Gathered Result: --Process {File-2014-09-02.log} took 120.243s
    ######## Program Output: {main} Gathered Result: --Process {File-2014-09-03.log} took 118.497s
    ######## Program Output: {main} Gathered Result: --Process {File-2014-09-04.log} took 119.813s
    ######## Program Output: {main} Gathered Result: --Process {File-2014-09-08.log} took 114.258s
    ######## Program Output: {main} Gathered Result: --Process {File-2014-09-10.log} took 116.039s
    ######## Program Output: {main} Gathered Result: --Process {File-2014-09-11.log} took 78.745s
    ######## Program Output: {main} Right After shutdown()
    ######## Program Output: {main} Right After awaitTermination()
    ######## Program Output: {main} TOTAL PROCESS TIME:120.317s
    Heap
     PSYoungGen      total 91328K, used 58978K [0x00000007d5f60000, 0x00000007e4c60000, 0x0000000800000000)
      eden space 91136K, 64% used [0x00000007d5f60000,0x00000007d98e0a18,0x00000007db860000)
      from space 192K, 50% used [0x00000007e4c00000,0x00000007e4c18000,0x00000007e4c30000)
      to   space 192K, 0% used [0x00000007e4c30000,0x00000007e4c30000,0x00000007e4c60000)
     PSOldGen        total 336192K, used 314970K [0x0000000781e00000, 0x0000000796650000, 0x00000007d5f60000)
      object space 336192K, 93% used [0x0000000781e00000,0x0000000795196808,0x0000000796650000)
     PSPermGen       total 21248K, used 2934K [0x000000077cc00000, 0x000000077e0c0000, 0x0000000781e00000)
      object space 21248K, 13% used [0x000000077cc00000,0x000000077cedd928,0x000000077e0c0000)


Some default parameters setting for the JVM printed out


    uintx InitialHeapSize                          := 132154176       {product}  
    uintx MaxGCPauseMillis                          = 4294967295      {product}           
    uintx MaxHeapFreeRatio                          = 70              {product}           
    uintx MaxHeapSize                              := 2116026368      {product}    
     bool UseConcMarkSweepGC                        = false           {product}       
     bool UseParNewGC                               = false           {product}     
     bool UseParallelGC                            := true            {product}           
     bool UseParallelOldGC                          = false           {product}           
     bool UseParallelOldGCCompacting                = true            {product}   
     bool UseParNewGC                               = false           {product} 



Computer Details:

64-bit, i7-370 @3.4ghz, 8-core, 8g memo, NTFS with 137gb Free Space

Stochastika
  • 305
  • 1
  • 2
  • 14
  • @ScaryWombat I see ur point there. However I did think about this too and researched before I proceeded with the static methods. Please see [this post](http://stackoverflow.com/a/6202996/2041023) and [this post](http://stackoverflow.com/a/10647734/2041023). – Stochastika Oct 07 '14 at 04:16
  • So how is the down vote justified!!!??? – Stochastika Oct 07 '14 at 04:17
  • 1
    Using two different program instances hardly helps as it's the OS what does the caching. You'd need to get two computers or reboot. If you think that it's GC related, then you should probably go for a measurement after a warmup. – maaartinus Oct 07 '14 at 05:25
  • @maaartinus I see ur point, I ran two instances of the program with two differet JVM; Re-created these large files before I ran them each time. The result turned out to be the same. – Stochastika Oct 07 '14 at 21:31

1 Answers1

1

The most likely reason for the difference likely has to do with caching in the operating system, rather than in the Java virtual machine.

Specifically, sequential load of the files likely takes full advantage of OS disk controller caching and read ahead. Concurrent loading of multiple files may invalidate the disk cache each time there is a context switch between two threads reading different files, which could be very inefficient depending on the exact parameters involved such as disk cache size, buffer size in the VM, etc.

Note that this result is system dependent; on a different machine with a different disk controller and a different disk configuration or RAID configuration, you might find the opposite result, with concurrent file reading being more efficient.

Finally, I would note that it's probably not a good idea to do performance testing from within Eclipse, unless you are actually going to run your code from within Eclipse in production.

Warren Dew
  • 8,790
  • 3
  • 30
  • 44
  • that's very informative Warren, and upvoted ur answer. One thing though I wonder about is that if we want to use BufferedReader and Executors.newFixedThreadPool() to read and process files in parallel (which I assume is a quite reasonable and common thing to do?), I assume that it can not be much more simplified than my code sample? And if that is the case and my machine is a typical Windows 7 machine with NTFS drive, doesn't it mean that this can't be done in parallel more efficiently than in serial (when I used more files, up to 12, to run, Parallel took 1000s more than Serial)? – Stochastika Oct 07 '14 at 14:16
  • 1
    I agree that your test does seem to indicate that parallelization is not likely to make things go more efficiently, when running in Eclipse on your machine. This is not unexpected; keep in mind that parallelization helps mostly in cases where you have multiple processors and a processor intensive tasks that can, through parallelization, take advantage of more than one processor. It looks like your task is I/O intensive, not processor intensive, so adding threads does not help, and actually hurts a bit. – Warren Dew Oct 07 '14 at 17:26
  • 1
    I also did experiements with a Runnable jar from cmd line, the result is quite consistent with the one from eclipse. I'm more and more inclining to agree that the I/O is the culprit as you pointed out. But I wonder why I can't find much reference about this 'obvious' I/O inefficiency in conconcurent process, at least not anything from recent time? – Stochastika Oct 07 '14 at 21:17
  • The 3 papers that I managed to dig up regarding to JAVA I/O inefficiency are more than 10 years old, I believe: [ANL Paper 1 - 2000](http://www.mcs.anl.gov/~thakur/papers/javaio.pdf), [ANL Paper 2 - 2001](http://www.mcs.anl.gov/~thakur/papers/javaio-journal.pdf), [Berkley Paper 1 - 2000](http://www.cs.berkeley.edu/~bonachea/ti/bonachea-bulkIO.pdf) – Stochastika Oct 07 '14 at 21:19
  • OR did I actually fail to notice any implicitly shared varible in this simple program that may be the real culprit of thread contention? – Stochastika Oct 07 '14 at 21:27
  • @Stochastika FWIW I ran some benchmarks on Linux some time ago, and could get a speed up of something like factor two for parallel processing. The papers are too old, however, why don't you use something newer than `FileReader`? There's NIO and memory mapped files. – maaartinus Oct 08 '14 at 02:59
  • @maaartinus, u got a point there. I did consider this, but according to what I can find, for the purpose of 'reading one line at a time from a txt file', NIO is not a bit awkardly fitting. BufferedReader is still the recommended for this purpose. To generalize - I just wonder if we really need to get down to Byte based labourous file processing (including using NIO) to get a performance boost in JAVA when using concurrent I/O approach to read lines from a large file. – Stochastika Oct 08 '14 at 04:48
  • @Stochastika Maybe... I could imagine a `ByteBuffer` capable `BufferedReader` replacement in maybe 50 lines, if all you need is reading strings. Btw., I can't see why you're clearing any Java temporary files; Java does no disk caching or alike. Sometimes, the simplest way to clean a cache is to overwhelm it. If you HD is much bigger than your RAM, then do a huge read. This way you can get repeatable results without any manual interactions in between (no need to start a new process, I'd bet that measuring with interleaving both approaches in a loop would do). – maaartinus Oct 08 '14 at 05:23