
I'm working on an algorithm and I've been optimizing it successfully until I came across this strange problem:

Array C is passed as an argument to the method.

    int lim = C.length;
    double[] A = new double[lim];
    double[] B = new double[lim];
    for (int I = offset; I < lim; I++) {
        A[(int) B[C[I]]] = C[I];
    }

A and B are double arrays and C is an integer array. All three are of equal size.
I noticed that this code was running much slower than it should when I measured the elapsed time using System.nanoTime().

By troubleshooting, I found that if I replace C[I] with just I, a constant, or any variable initialized with a value embedded in the code, the loop runs many times faster than when its value comes from a source that is not predefined in the code.
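
For concreteness, the two variants differ only in the index expression. This is a simplified sketch of what I'm comparing (offset, lim and the arrays are set up as above):

    // Slow: the addresses touched depend on values loaded from C (and B)
    for (int I = offset; I < lim; I++) {
        A[(int) B[C[I]]] = C[I];
    }

    // Fast: the same loop with the loop variable substituted for C[I]
    for (int I = offset; I < lim; I++) {
        A[(int) B[I]] = I;
    }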

As a final test, I replaced C[I] with a random integer and got the same slow speed. What is happening? Does the JVM optimize code for predefined values? Can I reasonably solve this problem without defining my arrays' contents in code (which I'm sure would solve the issue)?

If I can solve this, the optimization will produce a new sorting algorithm that runs much faster than Java 7's QuickSort implementation. It is already faster for n < 8000 or thereabouts on many kinds of input.

Currently, I have recorded it (on my machine) sorting 8 million numbers (type double) in 3.8-3.9 seconds, while Java's QuickSort implementation (Arrays.sort()) does it in about 1.53 seconds on the same machine. I have traced the slowdown to two occurrences of this line of code in two for loops, and I need help.
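
For reference, my timings come from a plain System.nanoTime() harness along these lines (a minimal sketch; mySort is a placeholder for my own method):

    double[] data = new double[8_000_000];
    java.util.Random rnd = new java.util.Random();
    for (int i = 0; i < data.length; i++) {
        data[i] = rnd.nextDouble();
    }
    double[] copy = java.util.Arrays.copyOf(data, data.length);

    long t0 = System.nanoTime();
    mySort(data);                  // placeholder for my algorithm
    long t1 = System.nanoTime();
    java.util.Arrays.sort(copy);   // Java's built-in dual-pivot quicksort
    long t2 = System.nanoTime();

    System.out.printf("mine: %.2f s, Arrays.sort: %.2f s%n",
            (t1 - t0) / 1e9, (t2 - t1) / 1e9);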

gbenroscience
  • Why are you casting to int what is already an integer? – Typo Jul 23 '14 at 01:56
  • Should it be `A[(int) B[C[i]]] = C[i]`? – Sayakiss Jul 23 '14 at 01:57
  • How are you using a double value from array B to index array A? – Vivek V K Jul 23 '14 at 01:59
  • It seems strange that you'd have to cast `C` to an `int`. Can you show us the declarations of the three arrays? I'm assuming it's something like `double[] A, B; int[] C;` but it might help if we know for sure. – George Jul 23 '14 at 01:59
  • He either explained it wrongly or posted the wrong code. Sir, why are you trying to implement a better sorting algorithm? – Joe Coder Jul 23 '14 at 02:01
  • @Joe Coder, primarily as a matter of interest, secondarily for a research work I'm doing. – gbenroscience Jul 23 '14 at 02:05
  • @gbenroscience I am investing my time in learning lambdas in Java 8. I have used sort and sort with Comparable in small samples, and lambdas give you a chance to do what you want in parallel mode as well. I thought I'd mention this just to give you some ideas; hope it helps – Kick Buttowski Jul 23 '14 at 02:08
  • Referencing a value through an array implies looking that value up in the array, which will always be slower than getting the value directly – Typo Jul 23 '14 at 02:12
  • Perhaps so, @boolean, but that doesn't account for the speed discrepancy I've observed. I'm talking about a difference of up to 160 milliseconds... instead of 20 ms, it ran in 180 ms – gbenroscience Jul 23 '14 at 02:14
  • @gbenroscience how many values does C have stored? – Typo Jul 23 '14 at 02:17
  • Thanks @Kick! I'll consider your advice – gbenroscience Jul 23 '14 at 02:20
  • @boolean... it's user defined. My last test case initialized the arrays to a size of one million each – gbenroscience Jul 23 '14 at 03:05
  • If you think you can write a better sorting algorithm than the Java developers, you are probably mistaken. If your timing measurements suggest you have done so, your timing measurements are probably wrong. – Raedwald Jul 23 '14 at 06:58
  • See also http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – Raedwald Jul 23 '14 at 06:59
  • Okay @Raedwald, maybe researchers all over the world can down their tools and stop looking for better methods and ways to do things. Stuff can be improved; I've seen it happen. Anyway, I love what I do. It helps my analytical skills and proficiency with real problems. – gbenroscience Jul 25 '14 at 14:36
  • @Raedwald, thanks for the link. Coincidentally, I had discovered some of those suggestions/tips on my own a long time ago, and I've been using them in other tests that I've run. And things can always be improved; that's what I live by. – gbenroscience Jul 25 '14 at 14:43

1 Answer

I suspect that the slowdown is being caused by memory locality issues.

When you have really large arrays, they will span multiple (virtual memory) pages. So if you attempt to access array elements in a non-sequential fashion, you may end up accessing memory in a way that negates the effects of the "smarts" that the hardware uses to make memory access appear fast:

  • If you access lots of different locations, they don't fit into the caches anymore. That results in cache misses, and the CPU has to fetch from physical memory, which slows things down.

  • The non-sequential nature of the accesses means that wide memory fetches (e.g., whole cache lines) don't help performance.

  • When the locations are scattered over a large number of pages, you are likely to get misses in the TLB (translation look-aside buffer) hardware ... which caches the VM page-table entries. A TLB miss typically leads to extra memory accesses to read page-table entries from memory.


It is hard to say whether this is really what is causing your algorithm to slow down. (Though I think that the results of your experimentation support this proposition ...)

If this is the underlying problem, then there is not much you can do about it. It is inherent in the way modern computers work. If your algorithm requires "random" access over a large enough array, the performance will be limited by the speed of physical memory.
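
If you want to see the effect in isolation, something along these lines should show it (a rough sketch, not a rigorous benchmark; the array size, the seed and the single warm-up pass are arbitrary choices). The two methods do the same work; they differ only in whether the reads of b land sequentially or all over the array:

    import java.util.Random;

    public class LocalityDemo {

        // Index derived from the loop variable: b is read in order.
        static void sequential(double[] a, double[] b) {
            for (int i = 0; i < b.length; i++) {
                a[(int) b[i]] = i;
            }
        }

        // Index loaded from c: reads of b jump around, defeating the
        // caches and the TLB.
        static void scattered(double[] a, double[] b, int[] c) {
            for (int i = 0; i < c.length; i++) {
                a[(int) b[c[i]]] = c[i];
            }
        }

        public static void main(String[] args) {
            final int n = 8_000_000;
            double[] a = new double[n];
            double[] b = new double[n];
            int[] c = new int[n];

            Random rnd = new Random(42);
            for (int i = 0; i < n; i++) {
                c[i] = rnd.nextInt(n);   // scattered indices
            }

            // Warm-up so the JIT compiles both loops before timing.
            sequential(a, b);
            scattered(a, b, c);

            long t0 = System.nanoTime();
            sequential(a, b);
            long t1 = System.nanoTime();
            scattered(a, b, c);
            long t2 = System.nanoTime();

            System.out.printf("sequential: %d ms, scattered: %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }

On a typical machine I would expect the scattered version to be several times slower, with the gap growing with n once the arrays no longer fit in the caches.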

Stephen C
  • Can 3 arrays of size 1 million cause the slowdown you described? I guess it depends on my machine's hardware capabilities also? – gbenroscience Jul 23 '14 at 02:34
  • Anyway, that can't be the issue, because for large arrays (1 million and above), when I replace the array entry with the loop variable i, or some other constant variable, the speed increases vastly. I suspect some compiler issues – gbenroscience Jul 23 '14 at 02:56
  • Replacing the array entry with the loop variable **changes the pattern of memory accesses** to increase the locality. That's why I said that your experimentation **supports** the proposition. Increased locality in memory access patterns can make your code run faster! – Stephen C Jul 23 '14 at 03:11
  • Okay. But is that supposed to happen only for large arrays, or for all arrays? – gbenroscience Jul 23 '14 at 03:15
  • Thanks @Stephen C. I've accepted your answer. Even if I didn't succeed in optimizing that line, I've learned about that proposition now, and I've seen how I can test it, and have tested it, albeit unconsciously. That's a thing or two to gain! Thanks again. – gbenroscience Jul 23 '14 at 03:24
  • And, trust me, it's the real culprit that is slowing down my algorithm. It accounts for 80% of the total time taken to run the code. – gbenroscience Jul 23 '14 at 03:31
  • @gbenroscience - The larger the array, the more pronounced the effect. (For a small enough array, the locations will all fit into the memory cache and the TLB, and you won't get many "misses".) – Stephen C Jul 23 '14 at 03:44
  • It's been a few days since I asked the question, but this came to my mind again. Would this effect be observed if I were performing the operation on the same array instead of 2 different arrays? Something like `A[i] = A[j];` where i is very much greater than j. I can't test it right now – gbenroscience Jul 25 '14 at 14:25
  • It depends. If you are simply combining the 2 arrays into a single array that is twice the size, then it is unlikely to help. The issue is the locality characteristics of the sequence of memory operations. It doesn't matter if it is one array or two arrays. – Stephen C Jul 26 '14 at 01:21
  • So swapping is expensive even in the same array. Thanks. – gbenroscience Jul 27 '14 at 01:45