
I have 10 files, each about 500 MB.

I use Long randomValue = Math.abs(random.nextLong()); to generate those files. Every line in a file is one randomValue, and each file has about 26,374,000 lines.

When I read those files one by one in another program and time each read, I find that the reads get faster from file to file. Why?

read 0 file 26373542 lines time : 27046ms
read 1 file 26373627 lines time : 24155ms
read 2 file 26373676 lines time : 19227ms
read 3 file 26373768 lines time : 22875ms
read 4 file 26373681 lines time : 20813ms
read 5 file 26373774 lines time : 18297ms
read 6 file 26373787 lines time : 10556ms
read 7 file 26373557 lines time : 11614ms
read 8 file 26373566 lines time : 9751ms
read 9 file 26373653 lines time : 13372ms

This is my program:

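import java.io.BufferedReader;
import java.io.FileReader;

long start = System.currentTimeMillis();
// try-with-resources closes the BufferedReader and the wrapped FileReader
try (BufferedReader br = new BufferedReader(new FileReader(inFile))) {
    String num;
    while ((num = br.readLine()) != null) {
        sorted[j++] = Long.parseLong(num);  // one long value per line
        count++;
    }
}
long end = System.currentTimeMillis();
// elapsed time is end - start (the original referenced an undefined variable, mid)
System.out.println("read " + i + " file " + j + " lines time : " + (end - start) + "ms");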
He Yuntao

1 Answer


A few things:

  1. As noted in the comments, the JVM often gets faster once it has warmed up (the JIT compiler optimizes code that runs frequently), although it's not clear from your code snippet whether you're launching a fresh JVM per file or not.
  2. You're reading files from disk. It's not clear whether it's a spinning disk or an SSD; the two have dramatically different performance characteristics. Either way, so many variables affect a read from disk that it is difficult to benchmark the underlying algorithm directly unless you load the data into memory first and only then start your stopwatch.
  3. Your files are not identical (different numbers of lines, probably different contents, etc.).
  4. Once you've considered the above, you probably need to think about whether you have run enough repeated tests, whether each test runs long enough, etc., before drawing firm conclusions about performance.
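
To make points 2 and 4 concrete, here is a rough sketch of one way to separate the disk read from the parsing and to repeat the measurement. The class name ParseBenchmark, the method parseAll, and the choice of 5 runs are all made up for illustration, and the sketch assumes a whole file fits in the JVM heap (you may need to raise -Xmx for a 500 MB file). Treat it as a starting point, not a rigorous benchmark harness.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class, not part of the question's program.
class ParseBenchmark {

    // Parses one long per line from bytes that are already in memory,
    // so no disk access happens while this method runs.
    static long[] parseAll(byte[] data) {
        // Count newlines to size the array; +1 covers a last line without '\n'.
        int newlines = 0;
        for (byte b : data) {
            if (b == '\n') {
                newlines++;
            }
        }
        long[] values = new long[newlines + 1];
        int n = 0;
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.US_ASCII))) {
            String line;
            while ((line = br.readLine()) != null) {
                values[n++] = Long.parseLong(line);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(values, n);
    }

    public static void main(String[] args) throws IOException {
        // Disk I/O happens here, outside the timed region.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));

        // Repeat the same work on the same input; early runs include JVM warm-up.
        for (int run = 0; run < 5; run++) {
            long start = System.nanoTime();
            long[] values = parseAll(data);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("run " + run + ": " + values.length + " lines, " + ms + " ms");
        }
    }
}
```

Run it with a file path as the only argument. If the later runs on the same in-memory data are clearly faster than the first, that points at JVM warm-up rather than at the disk or at differences between files.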
Community
Catchwa