Why does my first solution run slower than my second solution for this Leetcode problem (Word Search)?

Question

Here is the problem statement for those unfamiliar with this problem:

Given a 2D board and a word, find if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cell, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once.

I opted to solve the problem in Java. In both solutions, there is a function exist(char[][] board, String word) which is required for the main query, and I use a helper function DFS() for the crux of the algorithm (standard backtracking).

The exist() function for both solutions is:

public boolean exist(char[][] board, String word) {
        if (word == null || word.length() == 0) return true;
        if (board == null || board.length == 0 || board[0] == null) return false;
        if (word.length() > board.length * board[0].length) return false;
        char[] w = word.toCharArray();
        char a = w[0];
        for (int i = 0; i < board.length; i++) {
            for (int j = 0; j < board[0].length; j++) {
                if (board[i][j] == a && DFS(board,w,i,j,0)) return true;
            }
        }
        return false;
    }

Now, here is the function DFS() for the first approach. (Note that the program assumes a word will never contain the @ character, but could be made stronger by using some random Unicode character or even a boolean array for a total guarantee):

Solution 1

public boolean DFS(char[][] board, char[] word, int i, int j, int idx) {
        if (idx == word.length) return true;
        if (i < 0 || i >= board.length || j < 0 || j >= board[0].length ||
           board[i][j] != word[idx]) return false;
        board[i][j] = '@';
        int[][] dirs = {{-1,0},{1,0},{0,-1},{0,1}};
        for (int[] d : dirs) {
            if (DFS(board,word,i+d[0],j+d[1],idx+1)) return true;
        }
        board[i][j] = word[idx];
        return false;
    }

And here is the second solution's:

Solution 2

public boolean DFS(char[][] board, char[] word, int i, int j, int idx) {
        if (idx == word.length) return true;
        if (i < 0 || i >= board.length || j < 0 || j >= board[0].length ||
           board[i][j] != word[idx]) return false;
        board[i][j] = '@';
        boolean found = false;
        found = DFS(board,word,i-1,j,idx+1) || DFS(board,word,i+1,j,idx+1) ||
            DFS(board,word,i,j-1,idx+1) || DFS(board,word,i,j+1,idx+1);
        board[i][j] = word[idx];
        return found;
    }

Now, from what I understand, with Java short-circuiting, both versions of DFS() should cease to explore additional paths should any subproblem return true. In fact, the only operational difference that I can assess exists between the two versions is that the first doesn't bother resetting the board (to its original form without @ characters indicating visited positions) if a solution is found, whereas the second does.

So from what I understand, if anything the second solution should be slower than the first. But clearly, I am mistaken: solution one runs in 8 milliseconds but solution two runs in 3 milliseconds.

Does anyone have any idea what I could be missing here?

Thanks

1.: Have you done a theoretical runtime analysis? 2.: Why are you not using spaces between arguments when calling ```DFS```? 3.: Check the amount of calls to ```DFS```. If the amount is the same, look up if there are unnecessary object creations. — akuzminykh, Feb 22 '20 at 23:00
I wonder if the `int[][] dirs = {{-1,0},{1,0},{0,-1},{0,1}};` allocation is the problem. Java should be smart enough to allocate this on the stack, but who knows. Try moving that outside the function to class scope. Let me know how it goes. Even here, though, I don't recommend putting too much weight on LC's runner time--better to write your own benchmarks. Both DFS routines should short circuit but Solution 2 is unrolled nicely, so there's no loop or `if`. — ggorlen, Feb 22 '20 at 23:02
@akuzminykh 1. did you even read the post? the algorithms are effectively equivalent. 2. because this looks better in my opinion. 3. again, the algorithms are effectively equivalent, and one wouldn't think that allocating an array of 8 ints in at most N^2 stack frames would lead to > 2x slowdown in speed. — box_of_donuts, Feb 22 '20 at 23:56
@ggorlen this was a good possibility, and i'm glad you mentioned it as this array should indeed have been declared as a class member given its continuous re-use. interestingly enough making this change only decreases runtime for solution 1 down to 6 ms from 8, so solution 2 still runs twice as fast — box_of_donuts, Feb 22 '20 at 23:58
As mentioned, soln 2 has unrolled loops and doesn't need to index into an array to get the data. I'd classify this as micro-optimization territory. If you write your own benchmark that really pushes the code, I doubt you'll see a major difference, and if you do, you can profile it to see where the problem is. Running the code for 2-6 ms doesn't really tell you much. — ggorlen, Feb 23 '20 at 00:08
@box_of_donuts In production code, spaces are always useful. Without spaces it is more difficult for humans to parse the code. Even if you were to debug this in several months, then you might see the problem. — NomadMaker, Feb 23 '20 at 00:43
@NomadMaker bro it's a leetcode problem not production code... i also genuinely don't think that adding spaces between already-short parameters to DFS calls would improve readability. Commenting on my style also feels totally unnecessary and irrelevant in the context of my question--sorry that my leetcode presentation style isn't up to your standards — box_of_donuts, Feb 23 '20 at 01:25
@ggorlen While it is micro-optimization, the question is about why there is a difference, not about how to make it faster. — jirassimok, Feb 23 '20 at 03:59
Yes, the idea is to explain the difference by looking for something substantial, not a micro-optimization. — ggorlen, Feb 23 '20 at 04:24

score 0 · Answer 1 · answered Feb 22 '20 at 23:23

0

jit compiler outsmarts you. the second version has simpler code to inline (4 versions) instead of 1 which gets decompiled and recompiled a few times. My guess.

answered Feb 22 '20 at 23:23

user2023577

1,752
1
12
23

Hmm this is an interesting point. I am not too familiar with the JIT compilation process and will look into it more. Thanks! – box_of_donuts Feb 23 '20 at 00:01
it will be a tedious process. All I could ever do is run a special hotspot plugin, in a linux vm, and get it to disassemble the jit compiled code, then "try" to detect the diffs in the huge mess of assembly code it gave. Some pointer in my question back then: https://stackoverflow.com/questions/49754232/unexplained-10-performance-boost-from-simply-adding-a-method-argument-slimmer – user2023577 Feb 23 '20 at 23:10

Why does my first solution run slower than my second solution for this Leetcode problem (Word Search)?

1 Answers1