2

I have a piece of code that must run extremely fast in terms of clock speed. The algorithm is already in O(N). It takes 2seconds, it needs to take 1s. For most A.length inputs ~ 100,000 it takes .3s unless a particular line of code is invoked an extreme number of times. (For an esoteric programming challenge)

It uses a calculation of the arithmetic series that 1,2,..N -> 1,3,4,10,15.. that can be represented by n*(n+1)/2 I loop through this equation hundreds of thousands of times. I do not have access to the input, nor can I display it. The only information I am able to get returned is the time it took to run. particularly the equation is:

s+=(n+c)-((n*(n+1))/2);

s and c can have values range from 0 to 1Billion

n can range 0 to 100,000

What is the most efficient way to write this statement in terms of clock speed? I have heard division takes more time then multiplication, but beyond that I could not determine whether writing this in one line or multiple assignment lines was more efficient. Dividing and multiplying versus multiplying and then dividing? Also would creating custom integers types significantly help?

Edit as per request, full code with small input case (sorry if it's ugly, I've just kept stripping it down):

public static void main(String[] args) {

        int A[]={3,4,8,5,1,4,6,8,7,2,2,4};//output 44
        int K=6;
        //long start = System.currentTimeMillis();;
        //for(int i=0;i<100000;i++){
            System.out.println(mezmeriz4r(A,K));
        //}
        //long end = System.currentTimeMillis();;

//      System.out.println((end - start) + " ms");

    }
    public static int mezmeriz4r(int[]A,int K){
        int s=0;
        int ml=s;
        int mxl=s;
        int sz=1;
        int t=s;
        int c=sz;
        int lol=50000;
        int end=A.length;
        for(int i=sz;i<end;i++){
            if(A[i]>A[mxl]){
                mxl=i;
            }else if(A[i]<A[ml]){
                ml=i;
            }
            if(Math.abs(A[ml]-A[mxl])<=K){
                sz++;
                if(sz>=lol)return 1000000000;
                if(sz>1){
                    c+=sz;
                }
            }else{
                if(A[ml]!=A[i]){
                    t=i-ml;
                    s+=(t+c)-((t*(t+1))/(short)2);
                    i=ml;
                    ml++;
                    mxl=ml;
                }else{
                    t=i-mxl;
                    s+=(t+c)-((t*(t+1))/(short)2);
                    i=mxl;
                    mxl++;
                    ml=mxl;
                }
                c=1;
                sz=0;
            }
        }
        if(s>1000000000)return 1000000000;
        return s+c;
    }

Returned from Challenge:

Detected time complexity:

O(N)

test time result

example example test 0.290 s. OK

single single element 0.290 s. OK

double two elements 0.290 s. OK

small_functional small functional tests 0.280 s. OK

small_random small random sequences length = ~100 0.300 s. OK

small_random2 small random sequences length = ~100 0.300 s. OK

medium_random chaotic medium sequences length = ~3,000 0.290 s. OK

large_range large range test, length = ~100,000 2.200 s. TIMEOUT ERROR running time: >2.20 sec., time limit: 1.02 sec.

large_random random large sequences length = ~100,000 0.310 s. OK

large_answer test with large answer 0.320 s. OK

large_extreme all maximal value = ~100,000 0.340 s. OK

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
woodlumhoodlum
  • 462
  • 3
  • 10
  • 24
  • 1
    Post your code, Then I can try to improve it – AlexWien Feb 06 '14 at 17:40
  • 3
    Since you know that `n>0` you can replace `/2` by `>>1`. Everything else is marginal compared to that. – Holger Feb 06 '14 at 17:43
  • what type are n, c and s? Are they primitives or wrapper objects? And instead of /2 you can do bitwise shift >> 1. EDIT just saw the above comment. Credits for shift goes to @Holger :) – klor Feb 06 '14 at 17:45
  • int currently, ok I'll try the bitwise shift – woodlumhoodlum Feb 06 '14 at 17:45
  • 2
    For a real measurement, use a warmup: Start the loop twice, first time without measuring time. This gives the JVM time to compile your method properly. – maaartinus Feb 06 '14 at 17:50
  • @maaartinus This is not enough! – Display Name Feb 06 '14 at 17:56
  • I would use real live data for the test not syntetic data which may not be distributed the same way as the real world data later. – MrSmith42 Feb 06 '14 at 17:58
  • 1
    @SargeBorsch: Yes, not enough. I should have recommended a benchmarking framework, but for something as complicated as this and taking one second a trivial warmup could suffice. In case you mean the repetition count, yes, I'd let it for a minute or two if I was interested in exact values. But again, this might suffice. – maaartinus Feb 06 '14 at 18:08
  • The bitwise shift led to no significant improvement. I might need to rethink my algorithm... I must be going too far backwards when in the array when I fail the math.abs k statement, causing too many iterations. I added the challenge output to the question. – woodlumhoodlum Feb 06 '14 at 18:33
  • The bitwise shift is something the JVM does itself, if it can prove that the dividend is non-negative (otherwise a branch is needed). I guess, only you can improve the algorithm itself, as I have no clue what's all about. – maaartinus Feb 06 '14 at 19:27
  • @woodlumhoodlum yours is not O(N) . Since the "i" changes dynamically – Mani Feb 06 '14 at 20:27
  • right it becomes much greater than O(N) for the test case that is giving me timeout. As far as the program that's evaluating me is concerned, it has been tricked to believe it's O(N). I've been trying to figure out how to subtract the correct amount so it doesn't need to recurse over previously visited nodes. – woodlumhoodlum Feb 06 '14 at 21:43
  • 1
    @woodlumhoodlum you dont require Math.abs if change the order . mean instead of if(Math.abs(A[ml]-A[mxl])<=K) to if((A[mxl]-A[ml])<=K) becuase mxl is maximum value index and ml is min value index. – Mani Feb 07 '14 at 16:22
  • ok, right that is an improvement. But I still need to rewrite it to not reset i to make the necessary improvement for it to be under time. Thus I've voted to close this. – woodlumhoodlum Feb 07 '14 at 16:26
  • 1
    Again one more suggestion. currently i am working on that mean while you can try " when you receive n array . you have already n slice for sure . so you can add that before calculation and modify your formula accordingly". and that will help in larger array. – Mani Feb 07 '14 at 16:35
  • That is absolutely correct, and that is the way I originally coded it. I have gone back and forth between the two methods. Although I have yet to have a functioning algorithm starting with n slices. Rather than using n*(n+1)/2 it requires me to add/subtract the current slice size to the stack. The bookkeeping got very complicated. – woodlumhoodlum Feb 07 '14 at 16:45
  • just an idea. they have provided an hint as "Elements of input arrays can be modified" try to use it along with adding n at first ( then your formula would be n ( n-1) /2 .. ) . as i said before i am working on .. if i got any will update – Mani Feb 07 '14 at 16:53

7 Answers7

3

With a little algebra, you can simply the expression (n+c)-((n*(n+1))/2) to c-((n*(n-1))/2) to remove an addition operation. Then you can replace the division by 2 with a bit-shift to the right by 1, which is faster than division. Try replacing

s+=(n+c)-((n*(n+1))/2);

with

s+=c-((n*(n-1))>>1);
rgettman
  • 176,041
  • 30
  • 275
  • 357
  • Most optimizers will change the division for you. – David Ehrmann Feb 06 '14 at 18:00
  • @David Ehrmann: This optimization is only valid for positive values. Therefore the compiler cannot make this optimization. Try it yourself in your browser: (applet to test tis) [ http://rinneberg.de/programming/opti.htm#shift_statt_div_english ] – MrSmith42 Feb 06 '14 at 18:06
  • @DavidEhrmann According to [Does Java optimize division by powers of two to bitshifting?](http://stackoverflow.com/questions/18560844/does-java-optimize-division-by-powers-of-two-to-bitshifting), that optimization doesn't occur in Java. – rgettman Feb 06 '14 at 18:08
  • @MrSmith42: The compiler `javac` can't do it, but the JVM can. If it can't prove the dividend to be non-negative, it uses a shift and a conditional fix. Here, the dividend is surely non-negative, but I doubt the JVM is that smart. – maaartinus Feb 07 '14 at 00:06
  • @MrSmith42: Update: Sorry, the dividend might get negative in case of overflow. – maaartinus Feb 07 '14 at 00:14
1

I would try the following and profile the code after each change to check if there is any gain in speed.


replace:

if(Math.abs(A[ml]-A[mxl])<=K)

by

int diff = A[ml]-A[mxl];
if(diff<=K && diff>=-K)

replace

/2

by

>>1

replace

ml++;
mxl=ml;

by

mxl=++ml;

Maybe avoid array access of the same element (internal boundary checks of java may take some time)

So staore at least A[i] in a local varianble.

MrSmith42
  • 9,961
  • 6
  • 38
  • 49
  • You think `ml++; mxl=ml;` produces different bytecode than `mxl=++ml;`? – Holger Feb 06 '14 at 17:57
  • @Holger: not if the compiler is smart, but you may reduce the number of accesses to the memory. I think it's worth to try it. And as I wrote you should always profile each change you make. – MrSmith42 Feb 06 '14 at 18:02
  • I give a spoiler: the byte code *is* identical. `iinc ; iload ; istore ` and there’s no way to code it differently (well, there would be when you accept far bigger code size). If you want spot smartness in there it would be because the compiler is smart enough to code your “improvement” the same way than the two statements. – Holger Feb 06 '14 at 18:09
  • either the math.abs or the mxl=++ml; broke it for a few returned test cases (which I don't have access to). The >>1 worked fine. I added the challenges output above. – woodlumhoodlum Feb 06 '14 at 18:29
  • @woodlumhoodlum: I would be interested what went wrong. Would be nice if you can narrow down the problem. – MrSmith42 Feb 06 '14 at 18:43
  • I think it might have been the math.abs, bc the elements in A can be negative numbers. but none of the other variables can be negative (which I apologize for not specifying). – woodlumhoodlum Feb 06 '14 at 18:48
  • All in all I think my problem is algorithmic rather than minor arithmetic improvements. – woodlumhoodlum Feb 06 '14 at 18:50
  • @woodlumhoodlum: negative numbers should be no problem. – MrSmith42 Feb 06 '14 at 18:51
  • you are correct, I ran it with that change and it gave the right answer. (bitwise and your diff) But it ended up running even slower for the failed test case, 4.8s: https://codility.com/cert/view/certMAF7ZT-8PB5SMXZRUPFKK69/details I will try adding in the mxl=++ml; next – woodlumhoodlum Feb 06 '14 at 18:59
  • I must have added in bad code earlier when I was making changes bc the mxl=++ml; and bitwise worked fine. It shaved off .02seconds against bitwise alone. https://codility.com/cert/view/cert6M3PPC-DEJER9V9WQM5BS6G/details – woodlumhoodlum Feb 06 '14 at 19:06
1

Get rid of the System.out.println() in the for loop :) you will be amazed how much faster your calculation will be

klor
  • 3,596
  • 2
  • 13
  • 13
  • the for loop in the main was just for testing. There is no print statements when it's actually run. I commented out the loop. By input of 100000, I meant A.length. – woodlumhoodlum Feb 06 '14 at 18:07
1

Nested assignments, i. e. instead of

t=i-ml;
s+=(t+c)-((t*(t+1))/(short)2);
i=ml;
ml++;
mxl=ml;

something like

s+=((t=i-ml)+c);
s-=((t*(t+1))/(short)2);
i=ml;
mxl=++ml;

sometimes occurs in OpenJDK sources. It mainly results in replacing *load bytecode instructions with *dups. According to my experiments, it really gives a very little speedup, but it is ultra hadrcore, I don't recommend to write such code manually.

leventov
  • 14,760
  • 11
  • 69
  • 98
  • Great improvement by .9 seconds.(although the server speed keeps constantly changing so it's hard to for sure determine), but still not quite there. RESULTS-> running time: >1.17 sec., time limit: 0.98 sec. I am going to have to rewrite my algorithm. – woodlumhoodlum Feb 06 '14 at 21:53
1

I dont have access to validate all inputs. and time range. but this one runs O(N) for sure. and have improved. run and let me know your feedback.i will provide details if necessary

public static int solution(int[]A,int K){
    int minIndex=0;
    int maxIndex=0;
    int end=A.length;
    int slize = end;
    int startIndex = 0;
    int diff = 0;
    int minMaxIndexDiff = 0;
    for(int currIndex=1;currIndex<end;currIndex++){
        if(A[currIndex]>A[maxIndex]){
            maxIndex=currIndex;
        }else if(A[currIndex]<A[minIndex]){
            minIndex=currIndex;
        }
        if( (A[maxIndex]-A[minIndex]) >K){
            minMaxIndexDiff= currIndex- startIndex;
            if (minMaxIndexDiff > 1){
                slize+= ((minMaxIndexDiff*(minMaxIndexDiff-1)) >> 1);
                if (diff > 0 ) {
                    slize = slize + (diff * minMaxIndexDiff);
                }
            }

            if (minIndex == currIndex){
                diff = currIndex - (maxIndex + 1);
            }else{
                diff = currIndex - (minIndex + 1);
            }
            if (slize > 1000000000) {
                return 1000000000;
            }
            minIndex = currIndex;
            maxIndex = currIndex;
            startIndex = currIndex;
        }
    }
    if ( (startIndex +1) == end){
        return slize;
    }
    if (slize > 1000000000) {
        return 1000000000;
    }
    minMaxIndexDiff= end- startIndex;
    if (minMaxIndexDiff > 1){
        slize+= ((minMaxIndexDiff*(minMaxIndexDiff-1)) >> 1);
        if (diff > 0 ) {
            slize = slize + (diff * minMaxIndexDiff);
        }
    }

    return slize;
}
Mani
  • 3,274
  • 2
  • 17
  • 27
  • They just removed the challenge from their website:(, I will be unable to check unless they eventually post the problem on their training site. I thought we still had a few more days... It looks like yours does not recurse, and might be fast enough. Thanks for your help atleast. They posted a new one: https://codility.com/ – woodlumhoodlum Feb 07 '14 at 18:31
  • Ok, All the best. will add comments/details while time permits. I have learned lot of optimize techniques from other answers. thanks for that. – Mani Feb 07 '14 at 18:34
  • Yours does not work for this test case, the answer should be 15: int A[]={10, 1, 7, 2, 2, 9, 3, 6, 2, 7, 3, 3}; int k=3; – woodlumhoodlum Feb 07 '14 at 18:38
0

I would try to elimnate this line if(Math.abs(A[ml]-A[mxl])<= by a faster self calculated abs version, which is inlined, not a method call!

The cast to (short) does not help, but try the right shift operator X >>1 instead x / 2

removing the System.out.println() can speed up by factor of 1000. But be carefull otherwise your whole algorithm can be removed by the VM becasue you dont use it. Old code:

for(int i=0;i<100000;i++){
            System.out.println(mezmeriz4r(A,K));
}

New code:

int dummy = 0;
    for(int i=0;i<100000;i++){
          dummy =   mezmeriz4r(A,K);
    }
//Use dummy otherwise optimisation can remove  mezmeriz4r
System.out.print("finished: " + dummy);
AlexWien
  • 28,470
  • 6
  • 53
  • 83
  • 1
    I thought that modern JVMs heavily optimize library method calls such as `Math.abs` (it's kind of dumb if they don't) – Display Name Feb 06 '14 at 17:50
  • Dont rely on JVM optimization if you want to be the fastest, even the sin() is calculated "by hand", instead of using the CPU sin(). since java 1.4 its slower (Strict.Math) – AlexWien Feb 06 '14 at 17:52
  • @SargeBorsch: I'd bet they do. But maybe something like `int diff = A[ml]-A[mxl]; if (diff <= K & -diff <= K) {...}` could be faster. – maaartinus Feb 06 '14 at 17:52
  • @AlexWien: `Math.abs(int)` surely doesn’t have to deal with NaN nor Infinity. – Holger Feb 06 '14 at 17:54
  • @AlexWien that's true. But only for floating-point types, and here everything is an integer. – Display Name Feb 06 '14 at 17:54
  • You are right, Math.abs(int) is simple, just return (a < 0) ? -a : a; – AlexWien Feb 06 '14 at 17:58
0

I would create a C version first and see, how fast it can go with "direct access to the metal". Chances are, you are trying to optimize calculation which is already optimized to the limit.

Display Name
  • 8,022
  • 3
  • 31
  • 66