2

The question states: given directions (Up, Down, Left, Right) in String moves and starting from origin, will the net result be origin?

An obvious solution is to simply set two variables (for keeping track of vertical and horizontal movement), and seeing if they become zero after all direction operations:

public boolean judgeCircle(String moves) {
    char[] chars = moves.toCharArray();
    int vertical = 0;
    int horizontal = 0;

    for(int i = 0; i < chars.length; i++){
        char c = chars[i];
        switch(c){
            case 'U':
                vertical ++;
                break;
            case 'D':
                vertical --;
                break;
            case 'L': 
                horizontal --;
                break;
            case 'R': 
                horizontal ++;
                break;
        }
    }
    return (vertical == 0) && (horizontal == 0);
}

This algorithm finds the correct solution in around ~8ms for the test cases at Leetcode.

However, a solution that scans all the operations 4 times finds the solution in only ~1ms:

public boolean judgeCircle(String moves) {
 int x = charCount(moves, 'R') - charCount(moves, 'L');
    int y = charCount(moves, 'U') - charCount(moves, 'D');

    return x == 0 && y == 0; 
}

private int charCount(String moves, char c) {
    int count = 0;
    for(int i=0; i<moves.length(); i++) {
        if(moves.charAt(i) == c) {
            count++; 
        }
    }
    return count;   
}

I repeated both the solutions around 10 times, and the results were always consistent. But why is an algorithm that scans moves 4 times running faster (4 times faster!) than an algorithm that finds the answer in only one pass?

a3y3
  • 1,025
  • 2
  • 10
  • 34
  • 1
    How does the time for the first case change if you remove the time taken for the conversion from a String to a char array? – Scott Hunter Apr 02 '19 at 02:01
  • Can you please add the appropriate language tag? I suppose this is Java? – Nico Schertler Apr 02 '19 at 02:16
  • @ScottHunter It's the same. In fact I thought it would be faster because removing calls to String.charAt(i). In any case, that's not the cause for the bottleneck. – a3y3 Apr 02 '19 at 02:16
  • Language added. Interesting, I did not think this could be a java specific thing @NicoSchertler – a3y3 Apr 02 '19 at 02:17
  • 3
    There are all sorts of possible reasons. But the first one to consider is that your observation is due to flaws in your benchmarking; see https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java. I recommend that you convert your current benchmark into JMH benchmark and see if you still get anomalous results. If you do, post the benchmark and the results, and we can take a look at them. – Stephen C Apr 02 '19 at 02:44
  • 1
    On my machine, the first version becomes slightly faster if I drop the char array. – Nico Schertler Apr 02 '19 at 02:52
  • There are a few optimizations that could lead to faster times. First, odd sequences never return to the start; second, if counts of left-right do not match you can return 'false' without ever looking at up-down. Yes, I am aware that this does not answer the question. – tucuxi Apr 02 '19 at 10:49

1 Answers1

6

I adjusted your example so all algorithms work on the same data. I also added one more variant with in if implementation.

@State(Scope.Thread)
public class ForVsSwitch {
    private static final int MOVES_LENGTH = 1024;
    private static final char[] COMMANDS = { 'U', 'D', 'L', 'R'};

    private char[] moves;

    @Setup
    public void prepare(){
        Random random = new Random();
        moves = new char[MOVES_LENGTH];
        for(int i=0; i< MOVES_LENGTH; i++) {
            moves[i] = COMMANDS[random.nextInt(4)];
        }
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 3)
    @Measurement(iterations = 5)
    public void withSwitch() {
        judgeCircleWithSwitch(moves);
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 3)
    @Measurement(iterations = 5)
    public void withFor() {
        judgeCircleWithFor(moves);
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 3)
    @Measurement(iterations = 5)
    public void withIf() {
        judgeCircleWithIf(moves);
    }

    private boolean judgeCircleWithSwitch(char[] moves) {
        int vertical = 0;
        int horizontal = 0;

        for(int i = 0; i < moves.length; i++){
            char c = moves[i];
            switch(c){
                case 'U':
                    vertical ++;
                    break;
                case 'D':
                    vertical --;
                    break;
                case 'L':
                    horizontal --;
                    break;
                case 'R':
                    horizontal ++;
                    break;
            }
        }
        return (vertical == 0) && (horizontal == 0);
    }

    private boolean judgeCircleWithIf(char[] moves) {
        int vertical = 0;
        int horizontal = 0;

        for(int i = 0; i < moves.length; i++){
            char c = moves[i];
            if(c == 'U') {
                vertical++;
            } else if(c == 'D') {
                vertical--;
            } else if(c == 'L') {
                horizontal--;
            } else if(c == 'R') {
                horizontal ++;
            }
        }
        return (vertical == 0) && (horizontal == 0);
    }

    private boolean judgeCircleWithFor(char[] moves) {
        int x = charCount(moves, 'R') - charCount(moves, 'L');
        int y = charCount(moves, 'U') - charCount(moves, 'D');

        return x == 0 && y == 0;
    }

    private int charCount(char[] moves, char c) {
        int count = 0;
        for(int i=0; i<moves.length; i++) {
            if(moves[i] == c) {
                count++;
            }
        }
        return count;
    }
}

If I read the results correctly 99.9% of the executions are faster than 27ms to 29ms, right? There seems to be no difference between the algorithms.

Benchmark                                    Mode      Cnt   Score    Error  Units
ForVsSwitch.withFor                        sample  5680658   0,003 ±  0,001  ms/op
ForVsSwitch.withFor:withFor·p0.00          sample            0,002           ms/op
ForVsSwitch.withFor:withFor·p0.50          sample            0,003           ms/op
ForVsSwitch.withFor:withFor·p0.90          sample            0,003           ms/op
ForVsSwitch.withFor:withFor·p0.95          sample            0,004           ms/op
ForVsSwitch.withFor:withFor·p0.99          sample            0,019           ms/op
ForVsSwitch.withFor:withFor·p0.999         sample            0,029           ms/op
ForVsSwitch.withFor:withFor·p0.9999        sample            0,075           ms/op
ForVsSwitch.withFor:withFor·p1.00          sample            2,912           ms/op
ForVsSwitch.withIf                         sample  8903669   0,002 ±  0,001  ms/op
ForVsSwitch.withIf:withIf·p0.00            sample            0,001           ms/op
ForVsSwitch.withIf:withIf·p0.50            sample            0,002           ms/op
ForVsSwitch.withIf:withIf·p0.90            sample            0,002           ms/op
ForVsSwitch.withIf:withIf·p0.95            sample            0,003           ms/op
ForVsSwitch.withIf:withIf·p0.99            sample            0,005           ms/op
ForVsSwitch.withIf:withIf·p0.999           sample            0,027           ms/op
ForVsSwitch.withIf:withIf·p0.9999          sample            0,051           ms/op
ForVsSwitch.withIf:withIf·p1.00            sample            5,202           ms/op
ForVsSwitch.withSwitch                     sample  8225249   0,002 ±  0,001  ms/op
ForVsSwitch.withSwitch:withSwitch·p0.00    sample            0,001           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.50    sample            0,002           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.90    sample            0,002           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.95    sample            0,003           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.99    sample            0,018           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.999   sample            0,027           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.9999  sample            0,071           ms/op
ForVsSwitch.withSwitch:withSwitch·p1.00    sample           22,610           ms/op

EDIT:

I can not confirm that your statements holds. I simplified the example. I use a static list as input for both algorithms. I do not do warmup and only measure a single execution. As expected, 4-pass is more expensive than 1-pass. I really can not tell what your website is measuring.

@State(Scope.Thread)
public class ForVsSwitch {
    private char[] moves = {'U', 'D', 'L', ...};

    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 0)
    @Measurement(iterations = 1, batchSize = 1)
    @Fork(value = 1, warmups = 0)
    public void withSwitch() {
        judgeCircleWithSwitch();
    }

    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 0)
    @Measurement(iterations = 1, batchSize = 1)
    @Fork(value = 1, warmups = 0)
    public void withFor() {
        judgeCircleWithFor();
    }

    private boolean judgeCircleWithSwitch() {
        int vertical = 0;
        int horizontal = 0;

        for(int i = 0; i < moves.length; i++){
            char c = moves[i];
            switch(c){
                case 'U':
                    vertical ++;
                    break;
                case 'D':
                    vertical --;
                    break;
                case 'L':
                    horizontal --;
                    break;
                case 'R':
                    horizontal ++;
                    break;
            }
        }
        return (vertical == 0) && (horizontal == 0);
    }

    private boolean judgeCircleWithFor() {
        int x = charCount(moves, 'R') - charCount(moves, 'L');
        int y = charCount(moves, 'U') - charCount(moves, 'D');

        return x == 0 && y == 0;
    }

    private int charCount(char[] moves, char c) {
        int count = 0;
        for(int i=0; i<moves.length; i++) {
            if(moves[i] == c) {
                count++;
            }
        }
        return count;
    }
}

The for loop is more expensive than the switch. But as pointed out in other comments running it once is no reliable performance analysis.

Benchmark               Mode  Cnt  Score   Error  Units
ForVsSwitch.withFor       ss       0,577          ms/op
ForVsSwitch.withSwitch    ss       0,241          ms/op
cmoetzing
  • 742
  • 3
  • 16
  • Hmm. Gotcha. Why do you think there's such a big difference in LeetCode though? I can't figure out why the times are so different there. Any ideas? – a3y3 Apr 03 '19 at 02:36
  • 1
    Java further optimizes code after it was run a few times. That is why the above example uses warmup runs. Maybe the loop is unrolled during optimization accelerating the initially slower versions. – cmoetzing Apr 03 '19 at 05:44
  • I get that, but on Leetcode, the difference is visible right from the first submission. Thank you so much for spending so much time and writing such a detailed answer, but unfortunately I can't mark it as accepted, because it doesn't answer the question. – a3y3 Apr 03 '19 at 22:43