2

I have implemented two methods, shuffleList and shuffleArray which use the exact same functionality. And I have an ArrayList of half millions integer, and an array of the same half million ints. In my benchmarking code, which performs each one of the methods a 100 times on the corresponding array or ArrayList and records the time, it looks like shuffleArray takes around 0.5 seconds while shuffleList takes around 3.5 seconds, even though the code does not use any ArrayList methods but get and set, which are supposed to work as fast as they work in an array.

Now I know that ArrayLists are a little bit slower because they internally use arrays but with some additional code, but does it make this big of a difference?

     void shuffleList(List<Integer> list){
        Random rnd = ThreadLocalRandom.current();
        for(int i=list.size()-1;i>0;i--){
            int index=rnd.nextInt(i+1);
            int a=list.get(index);
            list.set(index,list.get(i));
            list.set(i,a);
        }
    }

    void shuffleArray(int[] ar)
    {
        Random rnd = ThreadLocalRandom.current();
        for (int i = ar.length - 1; i > 0; i--)
        {
            int index = rnd.nextInt(i + 1);
            int a = ar[index];
            ar[index] = ar[i];
            ar[i] = a;
        }
    }

Benchmarking code:

import org.openjdk.jmh.Main;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
public class MyBenchmark {


    @Benchmark
    @Fork(value = 1)
    @Warmup(iterations = 3)
    @Measurement(iterations = 10)
    public void compete() {
        try {
            Sorting sorting = new Sorting();
            sorting.load();
            System.out.println(sorting.test());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        Main.main(args);
    }
}


    protected List<Integer> list = new ArrayList<Integer>();
    protected List<int[]> arrays= new ArrayList<>();

    protected void load(){
        try (Stream<String> stream = Files.lines(Paths.get("numbers.txt"))) {
            stream.forEach(x -> list.add(Integer.parseInt(x)));
        } catch (IOException e) {
            e.printStackTrace();
        }
        finally{
            int[] arr =new int[list.size()];
            for(int i=0;i<list.size();i++)
                arr[i]=list.get(i);
            arrays.add(arr);
        }
    }

    protected double test(){
        int arr[]=arrays.get(0);
        Stopwatch watch = new Stopwatch();
        for (int i=0; i<100; i++){
          shuffleArray(arr);
          shuffleList(list);
        }
        return watch.elapsedTime();
    }

I comment out one of the methods on the for loop and use the other.

Update:

I did what a lot of you suggested of changing Int a to Integer a in the shuffleList method, and it is making it a little bit faster, it is 3 seconds instead of 3.5 now, but I still think it is a big difference.

It is worth mentioning that changing int[] arr to Integer[] arr in the shuffleArray method with keeping int a as it is to simulate the boxing and unboxing time for the array does actually make it a lot slower, it makes it take around 3 seconds, so I can make the array as slow as the ArrayList but I can not do the opposite.

Update:

Using Collections.swap() in shuffleList did indeed make it as fast as the array, but I still do not understand why, is my benchmarking too sensetive or does it really matter?

Final shuffleList code, courtesy of Andy Turner and Joop Eggen:

    protected void shuffleList(List<Integer> list){
        Random rnd = ThreadLocalRandom.current();
        for(int i=list.size()-1;i>0;i--){
            int index=rnd.nextInt(i+1);
            Collections.swap(list, i, index);
        }
    }

  • 3
    (At least part of) the difference is because you're unboxing and re-boxing the values in the `List` case. Change `int a` to `Integer` a; or use `Integer[] ar` instead of `int[] ar`. – Andy Turner Nov 10 '20 at 09:48
  • 2
    @Michael here's the obligatory link: [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/q/504103/3788176). – Andy Turner Nov 10 '20 at 09:58
  • @AndyTurner I believe you are right, changing `Int[] ar` to `Integer[] ar` made the array a lot slower, it took around 3 seconds, and changing `int a` to `Integer a` in the same `shuffleArray` method fixed it back to 0.5. But weirdly changing `int a` to `Integer a` in the `shuffleList` method only made it a little bit faster, it is 3 seconds now. – Mohammed Abdalrazaq Nov 10 '20 at 10:05
  • 1
    @Michael I edited my question to show my benchmarking code. – Mohammed Abdalrazaq Nov 10 '20 at 10:18

2 Answers2

2

Use Integer a, which save one unboxing and one boxing operation.

    for (int i = list.size()-1; i>0; i--){
        int index=rnd.nextInt(i+1);
        Integer a=list.get(index);
        list.set(index,list.get(i));
        list.set(i,a);
    }

And the Integer objects use more memory.


@Andy Turner mentioned the exist Collections#swap.

    for (int i = list.size()-1; i > 0; i--) {
        int index = rnd.nextInt(i+1);
        Collections.swap(list, i, index);
    }

Without warm-up of JIT compiler this might slow-down the bench-mark, but will look better in production code. Though then you would probably use the Collections.shuffle anyway.


As commented the swap version is fast too. First the OP showed using the right microbenchmarking code.

swap uses the original Integer class too. It does l.set(i, l.set(j, l.get(i))); in order to swap - as set returns the previous element at that position. The JIT compiler can probably unwrap set and utilize that previous element immediately.

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
  • 4
    Or [`Collections.swap`](https://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#swap-java.util.List-int-int-). – Andy Turner Nov 10 '20 at 10:02
  • 1
    Collections.swap did make it a lot faster, as fast as the array, but I still do not understand why. – Mohammed Abdalrazaq Nov 10 '20 at 10:34
  • swap uses the original Integer class too. It does `l.set(i, l.set(j, l.get(i)));` as `set` returns the previous element at that position. The JIT compiler can probably unwrap `set` and utilize that previous element immediately. – Joop Eggen Nov 10 '20 at 10:56
0

There is a Java function to do the job:

Collections.shuffle( list );

This should be significantly faster than a for loop.

Kaplan
  • 2,572
  • 13
  • 14
  • It is not, it is slower, and also it does not work on primitives. – Mohammed Abdalrazaq Nov 10 '20 at 10:41
  • Either ways; I am not searching for a way to shuffle, I just wanted to analyze further and figure out why something behaved the way it did. – Mohammed Abdalrazaq Nov 10 '20 at 10:42
  • Slower really? Did You test it? You should do a peek into the sources of `Collections` how the Java guys handled that `shuffle` thing. – Kaplan Nov 10 '20 at 19:46
  • I'm no benchmark-expert, but I think the checked solution needs twice the time of calling `Collections.shuffle( list )` – the time for filling the `list` not taking into account. – Kaplan Nov 10 '20 at 20:38