12

I ran a short benchmark on a long array in Java with rather strange results. It seems that sequential reads with random writes are faster, by about a factor of two, than random reads with sequential writes. Does anyone have a clue why?

Here are two methods that copy within an array of longs (run with -Xmx2G or so): one reads sequentially and writes to random positions, the other reads from random positions and writes sequentially:

import java.util.Random;


public class Scratch {
    static Random random = new Random();
    static long[] arr = new long[100000000];

    // read sequentially, write to a random index
    static void seqReadRandWrite() {
        for (int i = 0; i < arr.length; i++) {
            int at = random.nextInt(arr.length);
            arr[at] = arr[i];
        }
    }

    // read from a random index, write sequentially
    static void seqWriteRandRead() {
        for (int i = 0; i < arr.length; i++) {
            int at = random.nextInt(arr.length);
            arr[i] = arr[at];
        }
    }

    public static void main(String[] args) throws Exception {

        seqWriteRandRead(); // warm up

        long nanos = System.nanoTime();
        seqReadRandWrite();
        System.out.println("Time: " + (System.nanoTime() - nanos) + "ns");

        nanos = System.nanoTime();
        seqWriteRandRead();
        System.out.println("Time: " + (System.nanoTime() - nanos) + "ns");
    }
}

The results on my notebook are:

Time: 2774662168ns

Time: 6059499068ns

Which means it's twice as fast to write at random positions as to read from them ... or? Is my notebook broken?

PS: this does not claim to be a benchmark, although most of the points in the linked advice about benchmarking are covered. Even if I run the 200,000,000 operations multiple times, the results stay fairly constant. It seems (seems!) that moving memory from random positions into sequential blocks is slower than moving memory from sequential positions into random blocks, at least with memory of this size and the above way of doing it, and I wonder why.

Cœur
cybye
    Why didn't you warm up the `seqReadRandWrite()` method also? Are you sure that a single call is enough to invoke the JIT compiler? – maerics Jan 31 '13 at 22:09
  • thats just for the memory .. try it, no difference – cybye Jan 31 '13 at 22:15
  • I removed the warm up, ran both tests in loop, reversed the order. `seqReadRandWrite` is consistently about twice as fast as `seqWriteRandRead`. – Miserable Variable Jan 31 '13 at 22:18
  • Use random.nextInt(arr.length) instead of Math.abs since that can return a negative number for Integer.MIN_VALUE – jdb Jan 31 '13 at 22:22
  • I suspect this has something to do with the relative expense of cache read-misses vs write misses. It's more dramatic a difference than I would have expected though. +1 for a very interesting question. – Perception Jan 31 '13 at 22:24
  • @jdb, I got a array index exception, I changed it to `Math.abs(random.nextInt() % arr.length)` but `random.nextInt(arr.length)` is better – Miserable Variable Jan 31 '13 at 22:27
  • @jdb changed, looks nicer, but .. – cybye Jan 31 '13 at 22:27
  • do more samples. check the variance of your samples. if you want to get scientific about it, do many samples of each method (5, 10, 100). print out stats on each method (min/max/std-dev) – Tom Carchrae Jan 31 '13 at 22:27
  • 1
    My WAG would be that reading sequentially is optimized as this is done often in programming. That is why the first algorithm is faster. – JustinKSU Jan 31 '13 at 22:28
  • @TomCarchrae it's quite constant.. – cybye Jan 31 '13 at 22:28
  • Did you try separating the reads and writes? That is, compare random to sequential each of read and write separately – Miserable Variable Jan 31 '13 at 22:33
  • 2
    @MiserableVariable - I just did that very test out of curiosity - sequential reads/writes run at pretty much the same speed (as expected). But random reads seem to be ~ twice as slow as random writes. I do believe this is the effects of memory caches at play. – Perception Jan 31 '13 at 22:38
  • Yup, I got the same results. Are your results similar to mine in the answer below? – Miserable Variable Jan 31 '13 at 22:42
  • 2
    'Short benchmark' => invalid benchmark. – user207421 Jan 31 '13 at 23:42
  • Does your processor have a write-back cache? If so, then to service a random write, it may only need to record the value in the cache and schedule a write to main memory for later, whereas to service a read, it needs to pull the data in from main memory. Although in practice, i would expect that write to need to do a read to fill a cache line. – Tom Anderson Feb 01 '13 at 00:09
  • As has been said, your test isn't really testing Java code at all, but the cache behavior of your CPU... results on my computer are pretty close for reads and writes. – vanza Feb 01 '13 at 03:23

6 Answers

3

Your benchmark is producing numbers that fail the "Do they make sense?" test. In such a situation, you should always double / triple / quadruple check your methodology ... BEFORE treating the numbers as a true reflection of reality.

Writing reliable benchmarks is hard. And in the case of Java, it is particularly hard, because some aspects of the Java platform can introduce systematic distortions into your benchmark measurements ... unless you specifically allow / compensate for them.

But the "check your methodology" rule applies to ALL experiments ... especially ones that produce results that don't seem to make sense. (Like neutrinos travelling faster than light ...)


The other thing to note is that once you've rewritten the benchmark to account for the confounding factors, you may still see unexpected numbers. The issue here is that the performance of benchmarks like this is likely to be sensitive to things like the size of the L1 and L2 caches, the size of cache lines, the relative speeds of the different levels of memory ... and their interplay with the exact sequences of instructions that the benchmark produces in the tight loops.

These things are complicated, difficult to analyse, and can give counterintuitive behaviour. And it is not surprising (to me) that different machines give different measured performance.

So even if the figures are real, it is still unsafe to draw any general conclusions about the speed of reads versus writes from this benchmark. Not even if you restrict them to just your laptop.
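For what it's worth, one way to reduce some of those distortions is to warm up explicitly, take several samples, and report a summary rather than a single number. A minimal sketch along those lines (plain JDK, no benchmark framework; the array size and sample counts are arbitrary choices of mine, and a proper benchmark harness would still be preferable):

```java
import java.util.Arrays;
import java.util.Random;

public class BenchSketch {
    static Random random = new Random(42);     // fixed seed for repeatable index sequences
    static long[] arr = new long[1_000_000];   // deliberately smaller than the question's array

    static void seqReadRandWrite() {
        for (int i = 0; i < arr.length; i++) {
            arr[random.nextInt(arr.length)] = arr[i];
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            seqReadRandWrite();                // warm-up runs so the JIT can compile the loop
        }
        long[] samples = new long[5];
        for (int i = 0; i < samples.length; i++) {
            long start = System.nanoTime();
            seqReadRandWrite();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        // report a spread, not a single run
        System.out.println("min=" + samples[0] + "ns median=" + samples[2]
                + "ns max=" + samples[samples.length - 1] + "ns");
    }
}
```

Reporting min/median/max (or standard deviation) at least makes it visible when one run was perturbed by GC or another process.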

Stephen C
  • You're absolutely right, but note that I did not draw any conclusions and instead ended up with some questions. – cybye Feb 01 '13 at 05:54
  • 1
    Yea .. but @Miserable Variable does appear to be drawing conclusions, and dubious ones at that!! – Stephen C Feb 01 '13 at 06:14
1

In summary, the question title is slightly incorrect. The truth seems to be that in some environments (e.g. mine and the OP's) random array writes are faster than random array reads. But note that this is not true for everyone.

Based on @JustinKSU's comment I separated out reads and writes and found that random writes are faster than random reads. The results are as follows. This seems to be the reason, and the collective opinion here is that read misses in the cache are more expensive than write misses (if writes involve the cache at all).

In production, though, where there is other activity, HotSpot might play a role.

/cygdrive/c/Java/jdk1.7.0/bin/javac.exe Scratch.java && /cygdrive/c/Java/jdk1.7.0/bin/java Scratch
Starting
seqRead: 1273719725ns
seqRead: 1243055271ns
seqRead: 1245022497ns
seqRead: 1242868527ns
seqRead: 1241655611ns
randRead: 6900959912ns
randRead: 6965196004ns
randRead: 7379623094ns
randRead: 7020390995ns
randRead: 6938997617ns
seqWrite: 1266963940ns
seqWrite: 1250599487ns
seqWrite: 1246471685ns
seqWrite: 1230472648ns
seqWrite: 1246975416ns
randWrite: 3898382192ns
randWrite: 3897441137ns
randWrite: 3939947844ns
randWrite: 4207906037ns
randWrite: 4103594207ns

Compilation finished at Thu Jan 31 14:38:57

My modified code is as follows:

import java.util.Random;


public class Scratch {
    static Random random = new Random();
    static long[] arr = new long[100000000];

    static void seqReadRandWrite() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt() % arr.length);
            arr[at] = arr[i];
        }
    }

    static void seqWriteRandRead() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt() % arr.length);
            arr[i] = arr[at];
        }
    }

    // In the separated variants below, nextInt() is still called even where
    // its result goes unused, so every variant pays the same RNG cost.
    static void seqRead() {
        int x = 0;
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt() % arr.length);
            x += arr[i];
        }
    }

    static void randRead() {
        int x = 0;
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt() % arr.length);
            x += arr[at];
        }
    }

    static void seqWrite() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt() % arr.length);
            arr[i] = at;
        }
    }

    static void randWrite() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt() % arr.length);
            arr[at] = at;
        }
    }


public static void main(String[] args) throws Exception {

    // seqWriteRandRead(); // warm up
    System.out.println("Starting");

    long nanos =  -1;
    /*
    for (int i = 0; i < 5; i++) {       
        nanos = System.nanoTime();
        seqWriteRandRead();
        System.out.println("WriteRandRead Time: " + (System.nanoTime()-nanos) + "ns");

        nanos = System.nanoTime();
        seqReadRandWrite();
        System.out.println("ReadRandWrite Time: " + (System.nanoTime()-nanos) + "ns");
    }
    */

    for (int i = 0; i < 5; i++) {       
        nanos = System.nanoTime();
        seqRead();
        System.out.println("seqRead: " + (System.nanoTime()-nanos) + "ns");
    }

    for (int i = 0; i < 5; i++) {       
        nanos = System.nanoTime();
        randRead();
        System.out.println("randRead: " + (System.nanoTime()-nanos) + "ns");
    }


    for (int i = 0; i < 5; i++) {       
        nanos = System.nanoTime();
        seqWrite();
        System.out.println("seqWrite: " + (System.nanoTime()-nanos) + "ns");
    }

    for (int i = 0; i < 5; i++) {       
        nanos = System.nanoTime();
        randWrite();
        System.out.println("randWrite: " + (System.nanoTime()-nanos) + "ns");
    }

}
}

UPDATE

@tomcarchrae did the same test on Linux, with significantly different results. Below, the first column is numbers from my test and the second are from Tom's:

seqRead:   1273719725ns   2810487542ns  
seqRead:   1243055271ns   2780504580ns  
seqRead:   1245022497ns   2746663894ns  
seqRead:   1242868527ns   2746094469ns  
seqRead:   1241655611ns   2763107970ns  
randRead:  6900959912ns   23093543703ns 
randRead:  6965196004ns   22458781637ns 
randRead:  7379623094ns   24421031646ns 
randRead:  7020390995ns   25880250599ns 
randRead:  6938997617ns   26873823898ns 
seqWrite:  1266963940ns   4226886722ns  
seqWrite:  1250599487ns   4537680602ns  
seqWrite:  1246471685ns   3880372295ns  
seqWrite:  1230472648ns   4160499114ns  
seqWrite:  1246975416ns   4008607447ns  
randWrite: 3898382192ns   25985349107ns 
randWrite: 3897441137ns   22259835568ns 
randWrite: 3939947844ns   22556465742ns 
randWrite: 4207906037ns   22143959163ns 
randWrite: 4103594207ns   21737397817ns 
Miserable Variable
  • Chuckle, you beat me to posting this by about 15 seconds. I am pretty sure these results are due to read-misses on the main cache being more expensive than write misses. Add that to your answer and its +1 for you. – Perception Jan 31 '13 at 22:44
  • I edited my answer. Worth the internet points, I mean knowledge :) – Miserable Variable Jan 31 '13 at 22:51
  • Seems to be hardware-dependent. Or maybe JVM-dependent. I could reproduce too (with the newest Java 7), but some couldn't. – lbalazscs Jan 31 '13 at 23:13
  • Results from my machine (Linux) Starting seqRead: 2810487542ns seqRead: 2780504580ns seqRead: 2746663894ns seqRead: 2746094469ns seqRead: 2763107970ns randRead: 23093543703ns randRead: 22458781637ns randRead: 24421031646ns randRead: 25880250599ns randRead: 26873823898ns seqWrite: 4226886722ns seqWrite: 4537680602ns seqWrite: 3880372295ns seqWrite: 4160499114ns seqWrite: 4008607447ns randWrite: 25985349107ns randWrite: 22259835568ns randWrite: 22556465742ns randWrite: 22143959163ns randWrite: 21737397817ns – Tom Carchrae Jan 31 '13 at 23:30
  • I am going to add this to my answer for comparison. – Miserable Variable Jan 31 '13 at 23:33
  • @TomCarchrae wow seq writes are actually slower then random for you. – Miserable Variable Jan 31 '13 at 23:43
  • i'd be careful reading much into these numbers. you should really be analyzing for min/max/mean/std-dev etc. you also need some warm up code, else there is noise from JIT optimizations. – Tom Carchrae Jan 31 '13 at 23:52
  • you should change the start of this answer to "on my machine running OS, and with JVM VERSION, and this code, random array writes are faster then random array reads" – Tom Carchrae Feb 01 '13 at 16:35
  • @TomCarchrae Few of us (including OP) got similar results so perhaps "on some machines"? – Miserable Variable Feb 01 '13 at 17:03
  • some machines is not as useful as if you stated what machines/OS it occurs on. otherwise, it becomes so vague you can't really infer anything from it. – Tom Carchrae Feb 01 '13 at 19:09
1

I believe this benchmark is useless for your purposes. There are many measurement parameters to consider that you did not describe, and your approach to the problem is completely unspecified. To draw any conclusion at all about speed with respect to VMs, machines, RAM speed, software running at the same time, the kind of objects or plain values you copy, and so on, you need a methodical approach. As it stands, the question is not answerable. You must narrow down the specific circumstances whose speed you want to know about.

In particular, you cannot draw firm conclusions when using random numbers, which drag in the whole problem of best-, worst- and average-case complexity.

Please read up on algorithmic complexity, and then on how to make scientific runtime performance measurements. I hope I could help you a bit.

The first answer here is excellent and will help you understand: How do I write a correct micro-benchmark in Java?
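One concrete step in that direction is to take the randomness out of the measurement: fix the PRNG seed, or precompute the index sequence outside the timed loop, so every variant touches exactly the same addresses and the cost of the RNG drops out of the timing. A small sketch (the method name is mine, not from the question):

```java
import java.util.Arrays;
import java.util.Random;

public class SeededIndices {
    // Precompute a random-but-repeatable index sequence. With a fixed seed,
    // every run (and every benchmark variant) sees the same addresses, and
    // the cost of nextInt() is paid outside the timed loop.
    static int[] precomputeIndices(int n, int bound, long seed) {
        Random r = new Random(seed);
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = r.nextInt(bound);
        }
        return idx;
    }

    public static void main(String[] args) {
        int[] a = precomputeIndices(5, 100, 42L);
        int[] b = precomputeIndices(5, 100, 42L);
        System.out.println(Arrays.equals(a, b)); // same seed, same sequence: prints true
    }
}
```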

Best regards,

Semo
1

The answer is in the previous comments and boils down to the effects of memory access patterns. This blog post covers the effects of random reads; writes do not suffer in the same way.

This is not a Java issue (or indeed an issue with any particular language) but a reality of the hardware you run on (and a common reality at that). That doesn't mean you should ignore it! While your initial benchmark may have been flawed, it still hit a real issue for some software, so it's a valuable lesson.

The conclusion is not that reads are more expensive than writes, though. It's that random memory access is not well catered for by the hardware. This is essentially why LinkedList traversal performs so much worse than ArrayList for sequential access: the two have the same computational complexity, but array access plays to the hardware's strengths where a linked list does not.
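The LinkedList vs ArrayList point is easy to see for yourself. A rough illustration (not a rigorous benchmark; the element count is arbitrary): both loops below are O(n), but the ArrayList walks a contiguous backing array while the LinkedList chases pointers to nodes scattered across the heap.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListTraversal {
    // Sum the elements via the iterator: same complexity for both lists,
    // very different memory access patterns.
    static long sum(List<Integer> list) {
        long s = 0;
        for (int v : list) {
            s += v;
        }
        return s;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> arrayList = new ArrayList<Integer>();
        List<Integer> linkedList = new LinkedList<Integer>();
        for (int i = 0; i < n; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }
        for (int i = 0; i < 3; i++) {          // warm up both code paths
            sum(arrayList);
            sum(linkedList);
        }
        long t0 = System.nanoTime();
        long a = sum(arrayList);
        long t1 = System.nanoTime();
        long b = sum(linkedList);
        long t2 = System.nanoTime();
        System.out.println("ArrayList:  " + (t1 - t0) + "ns (sum=" + a + ")");
        System.out.println("LinkedList: " + (t2 - t1) + "ns (sum=" + b + ")");
    }
}
```

On most hardware the ArrayList traversal should come out clearly ahead, though the exact ratio will vary with cache sizes.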

Nitsan Wakart
0

Your experiment is broken, not your laptop. See here for a discussion and some tools to help measure performance: Java performance timing library

Below are some results that contradict yours. I also modified your code to be more rigorous and careful about how it takes measurements.


My environment is Linux (Mint 14 which is based on Ubuntu 12.10) using the Sun JDK 1.6.0_38

With 1.5G of heap for the big example, i.e. -Xmx1512m


Note: interesting. It could be that my result is different because the array size below is different. I will re-run and update.

No: the result is similar, in that there is not much difference in the mean. What is interesting is the difference against the short run, i.e. 21092.5 (/10 = 2109.2) vs 1645.2, which may be slower because of memory paging.

Result with static long[] arr = new long[100000000]; (the original array size in the question):

Write: DescriptiveStatistics: n: 10 min: 20893.0 max: 22190.0 mean: 21092.5 std dev: 390.90727800848117 median: 20953.5 skewness: 3.0092198852491543 kurtosis: 9.264808973899097

Read: DescriptiveStatistics: n: 10 min: 21668.0 max: 22736.0 mean: 21892.5 std dev: 318.31509546359877 median: 21766.5 skewness: 2.5034216544466124 kurtosis: 6.560838306717343


I'm not seeing a huge difference between reads and writes. I changed the experiment to measure 10 times on a slightly smaller array (so each run still performs the same number of reads/writes). Feel free to re-run with a larger array or sample size.

Write: DescriptiveStatistics: n: 10 min: 1584.0 max: 1799.0 mean: 1645.2 std dev: 59.51619760853156 median: 1634.5 skewness: 2.137918517160786 kurtosis: 5.764166551997385

Read: DescriptiveStatistics: n: 10 min: 1568.0 max: 2202.0 mean: 1689.0 std dev: 186.93908693000031 median: 1623.0 skewness: 2.770215113912315 kurtosis: 8.12245132320571

Here is a modified version of your code that does more samples:

import java.util.Random;

import org.apache.commons.lang.time.StopWatch;
import org.apache.commons.math.stat.descriptive.DescriptiveStatistics;

public class Test {
    static Random random = new Random();
//  static long[] arr = new long[100000000];
    static long[] arr = new long[10000000];

    static void seqReadRandWrite() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt()) % arr.length;
            arr[at] = arr[i];
        }
    }

    static void seqWriteRandRead() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt()) % arr.length;
            arr[i] = arr[at];
        }
    }

    public static void main(String[] args) throws Exception {

        StopWatch timer = new StopWatch();
        int count = 10;

        // warm up
        for (int i=0; i<3; i++){
            seqReadRandWrite();
        }
        DescriptiveStatistics write = new DescriptiveStatistics();
        for (int i=0; i<count; i++){
            timer.reset();
            timer.start();
            seqReadRandWrite();
            timer.stop();
            write.addValue(timer.getTime());
        }
        System.out.println("Write: " + write);

        // warm up
        for (int i=0; i<3; i++){
            seqWriteRandRead(); 
        }
        DescriptiveStatistics read = new DescriptiveStatistics();
        for (int i=0; i<count; i++){
            timer.reset();
            timer.start();
            seqWriteRandRead();
            timer.stop();
            read.addValue(timer.getTime());
        }

        System.out.println("Read: " + read);


    }
}
Tom Carchrae
0

Results on my PC (ns per read/write):

seq read :     1.4 
rnd read :   10x.x   
seq write:     3.3 
rnd write:   10x.x

And seqReadRandWrite and seqWriteRandRead are equally fast, at ~100ns per loop iteration.

So this may depend on the hardware, and also on VM settings. Try java -server and see if the speed improves.
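Per-operation figures like the ones above can be derived by dividing the elapsed time of a loop by its iteration count. A minimal sketch of that arithmetic (the array size here is mine, smaller than in the question):

```java
public class PerOp {
    static long[] arr = new long[1_000_000];

    public static void main(String[] args) {
        long x = 0;
        long start = System.nanoTime();
        for (int i = 0; i < arr.length; i++) {
            x += arr[i];                       // one sequential read per iteration
        }
        long elapsed = System.nanoTime() - start;
        // ns per operation = elapsed time / number of array accesses
        System.out.printf("seq read: %.2f ns/op (checksum=%d)%n",
                (double) elapsed / arr.length, x);
    }
}
```

Printing the checksum also stops the JIT from eliminating the loop as dead code, which would otherwise make the per-op figure meaningless.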

irreputable