4

I was playing with Knight Tour algorithm implementation in Java. All that time I was completely sure that implementation on C must be faster. So after reading GNU C Reference the code be done and logic was implemented the same way is on Java.

Can you imagine my wonder when the C variant took more time to process a 6x6 board.

So my question is how the code below can be optimized from technical perspective (i.e. without heuristic optimizations).

Some performance notes: on my i5 laptop with Ubuntu the provided implementation took more than 4 hours to solve 6x6 board. Program on Java can solve this task in about 3 hours 18 mins with single threaded approach.

Some algorithm notes: this implementation finds all possible tours from all cells on the board, not just closed tours. As well heuristic optimization isn't used as it helps to find faster first tour not all.

EDIT: code compiled without any optimization with this command: gcc knight_tour.c -o knight-tour

#include "stdio.h"

#define BOARD_SIZE 5
#define MAX_MOVE_COUNT BOARD_SIZE*BOARD_SIZE

void printBoard(int[][BOARD_SIZE], int);
void clearBoard(int[][BOARD_SIZE], int);
int knight_move(int[][BOARD_SIZE], int, int, int);
int is_valid_position(int, int);
void calc_all_knight_jumps();

static int ALL_KNIGHT_COL_JUMPS[BOARD_SIZE][BOARD_SIZE][9];
static int ALL_KNIGHT_ROW_JUMPS[BOARD_SIZE][BOARD_SIZE][8];

int main() {

    int board[BOARD_SIZE][BOARD_SIZE];
    clearBoard(board, BOARD_SIZE);

    calc_all_knight_jumps();

    int result[BOARD_SIZE][BOARD_SIZE];
    for (int i = 0; i < BOARD_SIZE; i++) {
        for (int j = 0; j < BOARD_SIZE; j++) {
            result[i][j] = knight_move(board, i, j, 1);
        }
    }
    printBoard(result, BOARD_SIZE);

    return 0;
}

int knight_move(int board[][BOARD_SIZE], int cpos, int rpos, int level) {
    if (level == MAX_MOVE_COUNT)
        return 1;

    board[cpos][rpos] = level;

    int solved_count = 0;
    int jump_count = ALL_KNIGHT_COL_JUMPS[cpos][rpos][8];
    for (int i = 0; i < jump_count; i++) {
        int next_cpos = ALL_KNIGHT_COL_JUMPS[cpos][rpos][i];
        int next_rpos = ALL_KNIGHT_ROW_JUMPS[cpos][rpos][i];

        if (board[next_cpos][next_rpos] == 0) {
            solved_count += knight_move(board, next_cpos, next_rpos, level + 1);
        }
    }

    board[cpos][rpos] = 0;
    return solved_count;
}

void clearBoard(int board[][BOARD_SIZE], int size) {
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
              board[i][j] = 0;
        }
    }
}

void printBoard(int board[][BOARD_SIZE], int size) {
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            printf("%8d", board[i][j]);
        }
        printf("\n");
    }
}

int is_valid_position(int cpos, int rpos) {
    if (cpos < 0 || cpos >= BOARD_SIZE) return 0;
    if (rpos < 0 || rpos >= BOARD_SIZE) return 0;

    return 1;
}

void calc_all_knight_jumps() {
    int col_jumps[] = { 1,  2,  2,  1, -1, -2, -2, -1};
    int row_jumps[] = { 2,  1, -1, -2, -2, -1,  1,  2};

    int next_cpos, next_rpos;
    for (int i = 0; i < BOARD_SIZE; i++) {
        for (int j = 0; j < BOARD_SIZE; j++) {

            int jump_count = 0;
            for (int k = 0; k < 8; k++) {
                next_cpos = i + col_jumps[k];
                next_rpos = j + row_jumps[k];
                if (is_valid_position(next_cpos, next_rpos) == 1) {
                    ALL_KNIGHT_COL_JUMPS[i][j][jump_count] = next_cpos;
                    ALL_KNIGHT_ROW_JUMPS[i][j][jump_count] = next_rpos;
                    jump_count++;
                }
            }

            ALL_KNIGHT_COL_JUMPS[i][j][8] = jump_count;
        }
    }
}
false
  • 10,264
  • 13
  • 101
  • 209
oreh
  • 1,129
  • 1
  • 9
  • 16
  • Profile it and look for the bottleneck, and let the compiler optimize things for you, that if you try to do - will likely make your code unreadable (one example is [loop unrolling](https://en.wikipedia.org/wiki/Loop_unrolling)). – amit May 04 '18 at 18:50
  • Likely a heuristic optimization, yet code can use symmetry for a x8 improvement. `for (col=0; col*2 < N) { for (row=col; row*2 < N)` as the board has vertical, horizontal and diagonal symmetry. – chux - Reinstate Monica May 04 '18 at 18:56
  • 5
    Just compile with -O2 or -O3 and see the difference... – ead May 04 '18 at 18:58
  • I see nothing terrible except using printf(). It can be very slow, Profiling is much harder than it seems. Operating system and memory caching can make huge changes in the results. Run the same app twice and see if your results change. Java is interpreted so it's further difficult to compare. Did you include the time to load the interpreter into memory in your comparison? – Jay May 04 '18 at 19:01
  • Having done this similar problems before, I found using `ROW_N` and `COL_N` instead of `BOARD_SIZE` for both dimensions useful to explore 5x5, 5x6, 4x8, 5x7, 6x6 boards as the time to solve is easier to see with `ROW_N*COL_N` rather than `BOARD_SIZE`. 5x5 to 6x6 is a big step. – chux - Reinstate Monica May 04 '18 at 19:01
  • 2
    Note: as `knight_move()` is called 10,000,000s of times for 5x5, that is the code that truly needs attention. Others functions are of scant concern to "be optimized from technical perspective". – chux - Reinstate Monica May 04 '18 at 19:19
  • If you're comparing it with a Java implementation, then first make sure that it's an exact, or as close as possible to exact, implementation of the Java. Also, make sure that you're running your program without debugging enabled. If you're running inside a debugger, it could be stealing cycles. A C implementation *should* perform on par with a Java implementation. If the `printf()` statements are included in your timing results, all bets are off. You should disable those (in your code and in the Java code) when doing performance comparisons. – Jim Mischel May 04 '18 at 22:07
  • 2
    "code compiled without any optimization". You have just wasted four hours. – n. m. could be an AI May 05 '18 at 10:35
  • @n.m. you are right) but actually I didn't expect that such a simple and small code needs some optimization – oreh May 05 '18 at 10:48

2 Answers2

0

Taking into account all the comments I modified the source code a bit.

  • tried out -O2 and -O3 optimization options with gcc compiler;
  • reduced number of top level invocation on knight_move() method. so now only unique results are calculated and then reflected horizontally, vertically and diagonally;
  • added code to measure performance without printf() usage;
  • checked that both C and Java variants are identical as much as possible;

And finally there are results I expected - C code is faster (but with optimization options)

  • C code with the -O2 option: duration - 1348 sec (22:28)
  • Java code: duration - 1995 sec (33:15)
  • C code without optimization: duration - 3518 sec (58:38)
  • C code with the -O3 option: duration - 2143 sec (35:43)

Here are both implementations in case someone be interested in knight tour algorithm on C and Java:-)

GNU C

#include "stdio.h"
#include "time.h"

#define BOARD_SIZE 6
#define MAX_MOVE_COUNT BOARD_SIZE*BOARD_SIZE

int knight_move(int[][BOARD_SIZE], int, int, int);
void pre_calc_all_knight_jumps();
void print_result(int[][BOARD_SIZE]);

static int ALL_KNIGHT_COL_JUMPS[BOARD_SIZE][BOARD_SIZE][9];
static int ALL_KNIGHT_ROW_JUMPS[BOARD_SIZE][BOARD_SIZE][8];

int main() {
    // init board
    int board[BOARD_SIZE][BOARD_SIZE];
    for (int i = 0; i < BOARD_SIZE; i++) {
        for (int j = 0; j < BOARD_SIZE; j++) {
            board[i][j] = 0;
        }
    }

    pre_calc_all_knight_jumps();

    int result[BOARD_SIZE][BOARD_SIZE];

    struct timespec s_time, e_time;
    clock_gettime(CLOCK_MONOTONIC, &s_time);

    int border = BOARD_SIZE - 1;
    int center = BOARD_SIZE / 2.0 + 0.5;
    for (int i = 0; i < center; i++) {
        for (int j = i; j < center; j++) {
            int res = knight_move(board, i, j, 1);
            result[i][j] = res;
            result[border - i][j] = res;
            result[i][border - j] = res;
            result[border - i][border - j] = res;
            if (i != j) result[j][i] = res;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &e_time);
    printf("Duration in seconds: %ld\n", e_time.tv_sec - s_time.tv_sec);

    print_result(result);
    return 0;
}

int knight_move(int board[][BOARD_SIZE], int cpos, int rpos, int level) {
    if (level == MAX_MOVE_COUNT) return 1;

    board[cpos][rpos] = level;

    int solved_count = 0;
    int valid_move_count = ALL_KNIGHT_COL_JUMPS[cpos][rpos][8];
    for (int i = 0; i < valid_move_count; i++) {
        int next_cpos = ALL_KNIGHT_COL_JUMPS[cpos][rpos][i];
        int next_rpos = ALL_KNIGHT_ROW_JUMPS[cpos][rpos][i];

        if (board[next_cpos][next_rpos] == 0) {
            solved_count += knight_move(board, next_cpos, next_rpos, level + 1);
        }
    }

    board[cpos][rpos] = 0;
    return solved_count;
}

void print_result(int board[][BOARD_SIZE]) {
    for (int i = 0; i < BOARD_SIZE; i++) {
        for (int j = 0; j < BOARD_SIZE; j++) {
            printf("%8d", board[i][j]);
        }
        printf("\n");
    }
}

void pre_calc_all_knight_jumps() {
    int col_jumps[] = { 1,  2,  2,  1, -1, -2, -2, -1};
    int row_jumps[] = { 2,  1, -1, -2, -2, -1,  1,  2};

    int next_cpos, next_rpos;
    for (int i = 0; i < BOARD_SIZE; i++) {
        for (int j = 0; j < BOARD_SIZE; j++) {

            int jump_count = 0;
            for (int k = 0; k < 8; k++) {
                next_cpos = i + col_jumps[k];
                next_rpos = j + row_jumps[k];
                if (next_cpos < 0 || next_cpos >= BOARD_SIZE) continue;
                if (next_rpos < 0 || next_rpos >= BOARD_SIZE) continue;

                ALL_KNIGHT_COL_JUMPS[i][j][jump_count] = next_cpos;
                ALL_KNIGHT_ROW_JUMPS[i][j][jump_count] = next_rpos;
                jump_count++;
            }

            ALL_KNIGHT_COL_JUMPS[i][j][8] = jump_count;
        }
    }
}

Java

import java.util.Arrays;

public class KnightTour1 {

    private final static int BOARD_SIZE     = 6;
    private final static int MAX_MOVE_COUNT = BOARD_SIZE * BOARD_SIZE;

    private static final int[][][] ALL_KNIGHT_COL_MOVES;
    private static final int[][][] ALL_KNIGHT_ROW_MOVES;

    static {
        final int[] knightColJumps = { 1,  2,  2,  1, -1, -2, -2, -1};
        final int[] knightRowJumps = { 2,  1, -1, -2, -2, -1,  1,  2};

        ALL_KNIGHT_COL_MOVES = new int[BOARD_SIZE][BOARD_SIZE][];
        ALL_KNIGHT_ROW_MOVES = new int[BOARD_SIZE][BOARD_SIZE][];

        int[] tmpColMoves = new int[8];
        int[] tmpRowMoves = new int[8];
        for (int c = 0; c < BOARD_SIZE; c++) {
            for (int r = 0; r < BOARD_SIZE; r++) {
                int jumpCount = 0;
                for (int i = 0; i < 8; i++) {
                    int nextColPos = c + knightColJumps[i];
                    int nextRowPos = r + knightRowJumps[i];
                    if (isValidBoardPos(nextColPos, nextRowPos)) {
                        tmpColMoves[jumpCount] = nextColPos;
                        tmpRowMoves[jumpCount] = nextRowPos;
                        jumpCount++;
                    }
                }

                ALL_KNIGHT_COL_MOVES[c][r] = Arrays.copyOf(tmpColMoves, jumpCount);
                ALL_KNIGHT_ROW_MOVES[c][r] = Arrays.copyOf(tmpRowMoves, jumpCount);
            }
        }
    }

    private static boolean isValidBoardPos(int colPos, int rowPos) {
        return colPos > -1 && colPos < BOARD_SIZE && rowPos > -1 && rowPos < BOARD_SIZE;
    }

    public static void main(String[] args) {
        long sTime = System.currentTimeMillis();
        int[][] result = findNumberOfTours();
        long duration = (System.currentTimeMillis() - sTime) / 1000;

        System.out.println("Duration in seconds: " + duration);
        printResult(result);
    }

    private static int knightMove(int[][] board, int colPos, int rowPos, int level) {
        if (level == MAX_MOVE_COUNT) return 1;

        board[colPos][rowPos] = level;

        final int[] validColMoves = ALL_KNIGHT_COL_MOVES[colPos][rowPos];
        final int[] validRowMoves = ALL_KNIGHT_ROW_MOVES[colPos][rowPos];
        final int validMoveCount = validColMoves.length;

        int solvedTourCount = 0;
        for (int i = 0; i < validMoveCount; i++) {
            final int nextColPos = validColMoves[i];
            final int nextRowPos = validRowMoves[i];
            if (board[nextColPos][nextRowPos] == 0) {
                solvedTourCount += knightMove(board, nextColPos, nextRowPos, level + 1);
            }
        }

        board[colPos][rowPos] = 0;
        return solvedTourCount;
    }

    private static int[][] findNumberOfTours() {
        final int[][] result = new int[BOARD_SIZE][BOARD_SIZE];
        final int[][] board = new int[BOARD_SIZE][BOARD_SIZE];

        final int border = BOARD_SIZE - 1;
        final int center = (int)(BOARD_SIZE / 2f + 0.5);
        for (int i = 0; i < center; i++) {
            for (int j = i; j < center; j++) {
                int res = knightMove(board, i, j, 1);
                result[i][j] = res;
                result[border - i][j] = res;
                result[i][border - j] = res;
                result[border - i][border - j] = res;
                if (i != j) result[j][i] = res;
            }
        }

        return result;
    }

    private static void printResult(int[][] res) {
        for (int i = 0; i < BOARD_SIZE; i++) {
            for (int j = 0; j < BOARD_SIZE; j++) {
                System.out.print(String.format("%8d", res[i][j]));
            }
            System.out.println();
        }
    }
}
oreh
  • 1,129
  • 1
  • 9
  • 16
  • You might be able to do better, try turning on some vector options and/or extra unrolling and factor the jumpcount out in its own array. – Surt May 05 '18 at 16:20
0

Here are some suggestions about your code as well as the updated version posted in your answer:

  • use <> for standard header files:

    #include <stdio.h>
    
  • surround expressions with parentheses in macro definitions:

    #define MAX_MOVE_COUNT (BOARD_SIZE * BOARD_SIZE)
    
  • use (void) when declaring functions without arguments:

    void pre_calc_all_knight_jumps(void);
    
  • avoid mixing floating point and integer calculations with implicit conversions. Use this instead:

    int center = (BOARD_SIZE + 1) / 2;
    

Some of the symmetries are not properly reflected in the result array. You should change the main loop to:

    int border = BOARD_SIZE - 1;
    int center = (BOARD_SIZE + 1) / 2;
    for (int i = 0; i < center; i++) {
        for (int j = i; j < center; j++) {
            int res = knight_move(board, i, j, 1);
            result[i][j] = res;
            result[j][i] = res;
            result[border - i][j] = res;
            result[j][border - i] = res;
            result[i][border - j] = res;
            result[border - j][i] = res;
            result[border - i][border - j] = res;
            result[border - j][border - i] = res;
        }
    }

I also get some cache usage improvements by making the board 8x8 and using a additional argument for the playing area size.

A more efficient algorithm is definitely needed to solve this problem for larger sizes.

chqrlie
  • 131,814
  • 10
  • 121
  • 189