
My Java program spends most of its time reading files, and I want to optimize it, e.g. by using concurrency, prefetching, memory-mapped files, or whatever.

Optimizing without benchmarking is nonsense, so I benchmark. However, during the benchmark the whole file content gets cached in RAM, unlike in a real run. The benchmark's run times are therefore much smaller and most probably bear no relation to reality.

I'd need to somehow tell the OS (Linux) not to cache the file content, or better, to wipe the cache before each benchmark run. Or maybe consume most of the available RAM (32 GB), so that only a tiny fraction of the file content fits in what's left. How do I do that?
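The "consume most of the RAM" idea can be sketched in plain Java by allocating and touching large chunks until the kernel is under enough memory pressure to evict the page cache (the class name and chunk size are made up; run with a large heap, e.g. `-Xmx30g`):

```java
import java.util.ArrayList;
import java.util.List;

public class RamHog {
    // Allocate memory in chunks until roughly `bytes` are held. Touching each
    // page forces it to become resident, pressuring the kernel to evict the
    // page cache. This only helps under real memory pressure.
    public static List<byte[]> allocate(long bytes) {
        final int chunk = 64 * 1024 * 1024;          // 64 MB per chunk
        List<byte[]> hog = new ArrayList<>();
        for (long held = 0; held < bytes; held += chunk) {
            byte[] b = new byte[chunk];
            for (int i = 0; i < b.length; i += 4096) {
                b[i] = 1;                            // touch each 4 KB page
            }
            hog.add(b);
        }
        return hog;
    }

    public static void main(String[] args) {
        // Hold ~256 MB as a demo; for the real benchmark this would be most of the 32 GB.
        List<byte[]> hog = allocate(256L << 20);
        System.out.println("holding " + hog.size() + " chunks");
    }
}
```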

I'm using Caliper for benchmarking, but in this case I don't think it's necessary (it's by no means a microbenchmark) and I'm not sure it's a good idea.

maaartinus

2 Answers


Clear the Linux file cache (requires root):

sync && echo 1 > /proc/sys/vm/drop_caches

Create a large file that uses all your RAM:

dd if=/dev/zero of=dummyfile bs=1024 count=LARGE_NUMBER

(don't forget to remove dummyfile when done).
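If the benchmark harness should drop the cache itself before each run, this can be driven from Java; a minimal sketch (the class name is hypothetical, and the command still needs root or a sudo/setuid wrapper):

```java
import java.io.IOException;

public class DropCaches {
    // Build the shell command that syncs dirty pages and drops the page cache.
    static String[] dropCachesCommand() {
        return new String[] {
            "sh", "-c", "sync && echo 1 > /proc/sys/vm/drop_caches"
        };
    }

    // Run the command and fail loudly if it did not succeed (e.g. not root).
    static void dropCaches() throws IOException, InterruptedException {
        Process p = new ProcessBuilder(dropCachesCommand())
                .inheritIO()
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("drop_caches failed (are you root?), exit=" + exit);
        }
    }

    public static void main(String[] args) throws Exception {
        dropCaches();   // start each benchmark iteration from a cold cache
        // ... run one benchmark iteration here ...
    }
}
```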

Bruno Grieder
  • The latter would probably take quite some time and I don't know how to exclude this time from the benchmark result. The former seems to work via a suid perl-script. – maaartinus Jul 23 '12 at 11:04

You can create a very large file and then delete it. This will clear the disk cache.

Another way to test the performance is to read a file (or files) larger than your main memory.

Either way, what you are testing is the performance of your hardware. To improve it you need better hardware; there is only so much you can do in software. For example, multiple threads won't make your disks spin faster. ;)


From Windows NT (http://research.microsoft.com/pubs/68479/seqio.doc):

When doing sequential scans, NT makes 64KB prefetch requests

From Linux (http://www.ece.eng.wayne.edu/~sjiang/Tsinghua-2010/linux-readahead.pdf):

Sequential prefetching, also known as readahead in Linux, is a widely deployed technique to bridge the huge gap between the characteristics of storage devices and their inefficient ways of usage by applications

Peter Lawrey
  • Yeah, this a about the time where you start putting SSDs everywhere. – Bruno Grieder Jul 23 '12 at 10:12
  • Even using a disk controller with multiple spindles can help, but with SSD your limit is capacity (or budget ;) rather than speed. – Peter Lawrey Jul 23 '12 at 10:16
  • @Peter Lawrey: Multiple threads won't make my disks spin faster, but one thread may prefetch the data and so the reading and the computation may overlap. With RAID it may be a good idea to prefetch multiple files at once. There may be an optimal block size, etc... – maaartinus Jul 23 '12 at 10:20
  • Often the OS is smart enough to prefetch your data if you read it sequentially so you might not need to write this yourself (which is the fastest way to load data) – Peter Lawrey Jul 23 '12 at 10:33
  • The OS will not prefetch data from different files, so if the application is of the 'read file, process it, read next file, process it' variety, you'll benefit from concurrency. – sam Jul 26 '12 at 12:38
  • @sam Microsoft and Linux appear to disagree. Do you have any references to support this? – Peter Lawrey Jul 26 '12 at 12:46
  • @Peter Lawrey: +1 for the links! The OS can't know what file I'm gonna open next, so at least such prefetching could make sense. On my Linux I can confirm that manually prefetching when reading single-threaded brings nothing at all. Tests with concurrency are yet to be done. – maaartinus Jul 28 '12 at 18:24
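The overlap discussed in the comments — reading the next file on a background thread while the current one is processed — can be sketched with a single I/O thread (the file names and the `process` step are placeholders):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PrefetchingReader {

    // Read the files in order, but always keep the next file's read in flight
    // while the current one is being processed. Returns total bytes processed.
    static long processAll(List<Path> files) throws Exception {
        if (files.isEmpty()) return 0;
        ExecutorService io = Executors.newSingleThreadExecutor();
        long total = 0;
        try {
            Future<byte[]> next = io.submit(() -> Files.readAllBytes(files.get(0)));
            for (int i = 0; i < files.size(); i++) {
                byte[] data = next.get();           // wait for the prefetched read
                if (i + 1 < files.size()) {
                    Path p = files.get(i + 1);
                    next = io.submit(() -> Files.readAllBytes(p)); // kick off next read
                }
                total += process(data);             // overlaps with the read above
            }
        } finally {
            io.shutdownNow();
        }
        return total;
    }

    // Placeholder for the real computation.
    static long process(byte[] data) {
        return data.length;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input files; substitute the real ones.
        System.out.println(processAll(List.of(Path.of("a.dat"), Path.of("b.dat"))));
    }
}
```

This only pays off for the "read file, process it, read next file" pattern; within a single file, the kernel's readahead already does the prefetching, which matches maaartinus's observation that manual single-threaded prefetching brought nothing.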