2

I've been solving one algorithmic problem and found solution, as I thought. But unexpectedly I bumped into a weird problem.

Let's assume i have the following code on java 8/17(replicates on both), intel 11th gen processor:

import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

public class DistanceYandex{
    static class Elem implements Comparable<Elem>{
        int value;
        int index;
        long dist;
        public Elem(int value, int index){
            this.value = value;
            this.index = index;
        }

        @Override
        public int compareTo(Elem o){
            return Integer.compare(value, o.value);
        }
    }
    public static void main(String[] args){
        int n = 300_000;
        int k = 3_000;
        Elem[] elems = new Elem[n];
        for(int i = 0; i < n; i++){
            elems[i] = new Elem(ThreadLocalRandom.current().nextInt(), i);
        }
        solve(n, k, elems);
    }
    private static void solve(int n, int k, Elem[] elems){
        Arrays.sort(elems); // interesting line
        long time = System.nanoTime();

        for(int i = 0; i < n; i++){
            elems[i].dist = findDistForIth(elems, i, k);
        }
//        i omit output, because it's irrelevant
//        Arrays.sort(elems, Comparator.comparingInt(elem -> elem.index));
//        System.out.print(elems[0].dist);
//        for(int i = 1; i < n; i++){
//            System.out.print(" " + elems[i].dist);
//        }
        System.out.println((System.nanoTime() - time)/1_000_000_000.0);
    }

    private static long findDistForIth(Elem[] elems, int i, int k){
        int midElem = elems[i].value;
        int left = i - 1;
        int right = i + 1;
        long dist = 0;
        for(int j = 0; j < k; j++){
            if(left < 0){
                dist += elems[right++].value - midElem;
            }else if(right >= elems.length){
                dist += midElem - elems[left--].value;
            }else{
                int leftAdd = midElem - elems[left].value;
                int rightAdd = elems[right].value - midElem;
                if(leftAdd < rightAdd){
                    dist+=leftAdd;
                    left--;
                }else{
                    dist+=rightAdd;
                    right++;
                }
            }
        }
        return dist;
    }
}

Point your eyes at solve function.

Here we have simple solution, that calls function findDistForIth n times and measures time it takes(I don't use JMH, because testing system for my problem uses simple one-time time measures). And before it captures start time, it sorts the array by natural order using built-in Arrays.sort function.

As you could notice, measured time doesn't include the time the array gets sorted. Also function findDistForIth's behaviour does not depend on whether input array is sorted or not(it mostly goes to third else branch). But if I comment out line with Arrays.sort I get significantly faster execution: instead of roughly 7.3 seconds, it takes roughly 1.6 seconds. More that 4 times faster!

I don't understand what's going on.

I thought maybe it is gc that's messing up here, I tried to increase memory I give to jvm to 2gb(-Xmx2048M -Xms2048M). Didn't help.

I tried to pass explicit comparator to Arrays.sort as second argument(Comparator.comparingInt(e -> e.value)) and deimplementing Comparable interface on Elem class. Didn't help.

I launched the profiler(Intellij Profiler)

With Arrays.sort included: With Arrays.sort included With Arrays.sort excluded: With Arrays.sort excluded

But it didn't give me much information...

I tried building it directly to .jar and launching via java cmd(before i did it via intellij). It also didn't help.

Do anybody know what's goind on?

This problem also replicates in online compiler: https://onlinegdb.com/MPyNIknB8T

MediaNik Ok
  • 121
  • 5
  • I can reproduce locally. Without sort: 2.09 seconds. With sort: 10.28 seconds. – Ole V.V. Jun 06 '22 at 09:27
  • It's not a problem with sorting I think, sorting is pretty fast, check out the time taken for sorting in this link https://onlinegdb.com/dQzZTCUiP. – Charchit Kapoor Jun 06 '22 at 09:28
  • 1
    When sorting you are sorting the *references* to the objects. The objects themselves stay where they are in memory. For this reason when you are not sorting, you are accessing the objects in the order they were allocated, which is also the order in which they are located in memory. This is more efficient. After sorting you are accessing the objects out of memory address order, so always jumping back and forth in memory. This is less efficient. Particularly on a computer with virtual memory, where i't causes thrashing. – Ole V.V. Jun 06 '22 at 10:09
  • Thank you, @OleV.V. Is there a way to move objects in memory somehow? like calling gc to do this instead of us? – MediaNik Ok Jun 06 '22 at 10:23
  • 1
    @MediaNikOk Nope – Michael Jun 06 '22 at 10:24
  • 1
    If you really insist, you can sort the *contents* of the objects by transferring `value` and `index`from object to object during sorting. Or allocate all new objects after sorting. – Ole V.V. Jun 06 '22 at 10:30

1 Answers1

0

May be you need to sort your data using red black tree sorting algo which implemented in SortedSet, Arrays.sort use mergesort sorting algo which works well for small number of data

biiyamn
  • 476
  • 3
  • 9