4

I am setting up a deep neural network to play connect-4 that has to compete against other AI bots on a very limited machine (don't know the specific limitations yet, only that I will only have a couple cores and a small amount of memory). Thus, I am looking to optimize my training set in any way possible. It currently represents the states on the boards as such:

b for blank (no piece present)

x for "x" piece present

o for "o" piece present

win for a won setup

loss for a lost setup

draw for a drawn setup

Essentially I am trying to map it so that my 3 bit integer can take the place of these memory-heavy strings. I thought about using a short, but it comes in as worse than char at 16 bits. I want to map it like such:

000 -> b

001 -> x

010 -> o

011 -> win

100 -> loss

101 -> draw

Since I can represent these states in 3 bits of space instead of chars (8 bits per char, yikes!) I would like to attempt that. However I am not sure how to instantiate a variable like this in c.

The training set is 67557 rows long representing a 6x7 board in each row with an win/loss/draw clause following. So saving 5 bits per char would save (5*6*7)+(5*1) = 215 bits per row and 215*67557 = 14524755 bits overall (1.81 MB of 2.90 MB total, a 62% decrease in overall space).

asdf
  • 2,927
  • 2
  • 21
  • 42
  • 2
    By the way, connect-4 is a solved game: the first player can *always* win. Implement that and you'll trash the competition! – Bathsheba Oct 01 '15 at 09:04
  • 2
    The draw, win and loss states, shouldn't they be unconnected with the piece placements? It just seems weird that a piece can be empty, a circle, a cross or a *win*? If you disconnect those two then you can store the piece-state in only two bits which fits better within a byte (four pieces per byte). Then have the draw/win/loss state be connected to a *board* instead of separate pieces. – Some programmer dude Oct 01 '15 at 09:09
  • @JoachimPileborg That is something I thought of, however, I am optimizing a current dataset with rows like `x,b,b,b...o,o,x,b,b,win` and figured the best place to start would be to iterate over that and replace the current setup with an optimized one. – asdf Oct 01 '15 at 09:10
  • 2
    You need many pieces, but only one win/loose/draw indicator, right? Besides being more space-optimized to separate the states, the code will also be *much* more simpler when you don' t have to worry about three bits crossing a byte or word boundary. just saying... :) – Some programmer dude Oct 01 '15 at 09:19
  • 1
    Like Joachim said, it's more effiecient to use 2 bits only for map coordinate. With that you need 6*7*2 = 84 bits for map. If you fit that in 6 16-bit variables, you still have 12 bits left for result (win/loss/draw). Or in 11 8-bit variables, with 4 bits for result. – user694733 Oct 01 '15 at 09:29

3 Answers3

7

And what about if you use bitfields?

struct S {
    unsigned w : 3;
    unsigned x : 3;
    unsigned y : 3;
    unsigned z : 3;
};

On most systems, sizeof(struct S) == 2, but you'ld hold 4 values in it!.

Now, you can do something like...

struct S myS;
s.w = 0; // Okay
s.x = 6; // Okay
s.y = 8; // Will overflow/wrap-around (why do the standards differentiate between those two?)

But, if you do something like...

struct S first;
struct S second;

You'll lose the memory efficiency you're seeking for, because the compiler has to provide an address for both objects, thus they need to be byte-aligned, so they together (usually) hold 32 bits, in which you can hold eight values, but, if you had a single structure holding all eight variables, the memory usage would typically be 24 bits.

Have in mind that a structure that holds bitfields that use all the available space (such as the 8-member one mentioned above: 8 * 3 == 24; 24 % 8 == 0) are better suited for your purposes, as you can have arrays of them, gain the benefits of bitfields, and waste no memory in the process. The first structure is thus inefficient, because it wastes 4 bits for each object of type S.

BTW: Don't try to use &s.x or sizeof(s.x), it simply won't work for obvious reasons.

3442
  • 8,248
  • 2
  • 19
  • 41
  • 1
    Be warned. this might (depending on compiler) result in unexpected extensions, with the result that s.y = 6; if (s.y == 6) will fail because s.y is now a small negative number – Tom Tanner Oct 01 '15 at 09:49
  • also I'd sincerely hope that sizeof(struct S) would be more like 2 on most systems. – Tom Tanner Oct 01 '15 at 10:51
  • @TomTanner: Sorry, that `sizeof(struct S) == 16` was a typo. – 3442 Oct 01 '15 at 11:29
  • 2
    @TomTanner, this can perhaps be avoided by using `unsigned` for the bit fields. – Jens Gustedt Oct 01 '15 at 12:44
  • @JensGustedt probably. it's one of those 'you really need to read the compiler docs and then try it and see' situations. – Tom Tanner Oct 01 '15 at 12:51
  • 1
    No, I hope that this part is not implementation dependent. If a bit-field is `unsigned` it should have all values positive and implicit conversion to `int` can never change that to negative. – Jens Gustedt Oct 01 '15 at 15:22
  • Concerning `s.y = 8;` Field `y` may be `unsigned` and no `unsigned` overflow on assignment as simply the value will wrap. see http://stackoverflow.com/a/23987436/2410359 – chux - Reinstate Monica Oct 02 '15 at 15:01
  • @chux: True, but, as written in my answer, it's `signed`, so it overflows. Anyway, I don't see the point in the standard for wrapping-around not being a *defined* class of overflowing. And, BTW, I did already had enough with `signed`/`unsigned` discussion stuff on this comments, so I'll better off change the answer to deal with `unsigned`s instead. – 3442 Oct 02 '15 at 15:35
3

You have two or three different things here that you're conflating.

  • training file format
  • in-memory storage format for the training set after parsing (if you need to keep parsed states around for future reference)
  • unpacked representation of a single board state + optional? W/L/D flag

All three of these formats can be different. The training file can be text for easy editing. Of course, even if your main program reads the training set in a binary format, that binary could be "compiled" from an easily-edited text format by a separate tool.

internal representation for working with a single board position:

This needs to be fast to access and loop over. Since you're training a neural net, not writing an AI directly, you may not need to work with this representation very much. If you don't need to do more than apply each element to a neural net input, there's no point having a separate format: just unpack directly from a more compact representation to the neural net inputs.

However, if you have to loop over a single board state multiple times, you have some interesting options. As multiple people have pointed out, the win/loss/draw/undecided flag should be thought of separately from the board state. So you'd have one flag per board, not storage for the flag in every board position.

  • bit-board: I've read about chess engines (e.g. crafty) using a 64bit unsigned int to store where all the white pawns are, for example. You can bitwise OR bitmaps together to find where all the white pieces of any kind are.

    Bitmaps (one for o, one for x) will record the entire state. A connect-4 board has 6*7 grid locations, so each bitmap can be 64bits, but 32b is too small. popcount(board.o) tells you how many os are on the board. assert(o & x == 0) will be a good sanity-check, because there can never be an o and an x in the same position.

    Using two packed 42b fields in a struct would be a bad idea, since it would be slow to load/store. Even packing them into 48bit fields (so they end on byte boundaries) would lead to slower loading/storing. Remember, this is our fast format. We can use a packed format for long-term storage.

    Something like board[0][0] && board[0][1] && board[0][2] && board[0][3] (although not with that syntax) with compile-time-constant positions is very fast on a bit-board. One bitwise AND leaves just those bits possibly-set, and then you can compare against the mask to see if all the bits were set. To test || instead of &&, leave out the second step. You can do these tests against the o or x bitmaps, or o|x to check for either kind of piece. However, this isn't as efficient if you have to build the mask at run-time from variable positions.

    To scan a board for wins, you might check the left column, then bit-shift your mask so it's checking the next column. Actually brute-force checking all columns like this is probably slower than checking the neighbours of markers, looking for 2-in-a-row candidates.

    Some operations might be easier if the bitmap is fully 64bit, representing an 8x8 board, but you only actually use the lower-left 7x6 of it. That way, separate columns are in separate bytes of the 64bit integer. Having each column in a separate byte is probably more useful than row, because finding the highest used position in a column is a thing you might want to do. It's just a find first set bit operation on the column. Extracting 8bit chunks from a bitmap is faster (doesn't require masking). You could unpack a 42bit bitmap to separate variables for each column, though. On x86, where the first 4 registers are byte addressable for the first and second 8 bit chunks (AX (the low16 of RAX) is made up of AL and AH), you (or the compiler, but they're probably not this clever) can store 7 columns in 4 registers, and still be able to bsr (bit-scan-reverse) any column separately.

// Sample bitboard implementation:
struct game_state {
    struct board_state {
       uint64_t o, x;
    } board;
    enum winlose { GAME_UNDECIDED=0, GAME_WIN_O, GAME_WIN_X, GAME_DRAW } victory;
};
  • array of 2-bit fields: don't use. A similar implementation would use a 2-bit bitfield for every position. It's impossible to use with nice board[row][col] syntax in C, and 42*2 bits doesn't fit in a single register. Interleaving the bitboards gives no advantage, and makes some things a lot worse, esp. because the whole thing doesn't fit in 64bits. (If you want to look for unoccupied spaces in the bitboard version, you look for zero bits in o|x. Here, you have to check every pair of 2 bits, instead of being able to use one bitwise to fit the whole problem into a register. Still, you could make a macro to shift/mask the 2 bits representing a given row/column. It wouldn't make efficient code.

  • array of bytes: It may be faster to use this format for looping around checking the neighbours of a given board position. In a bitboard, testing board[i][j] && board[i][j+1] could be done by bit-shifting the board so the two bits of interest line, then bitwise AND, then bit-test that bit. At least on x86, there are addressing modes with small byte offsets, so given the address of one board position, ANDing another board position might only take a single instruction.

    In each byte, one bit represents x, another represents o, and another bit should be set if the location is either x or o. This allows checking that multiple positions are all occupied by ANDing them together, and checking the occupied bit. Otherwise, you'd have to check if either the x or o bit was set in each grid.

// Sample byte-array implementation:
enum boardpos {
    POS_EMPTY = 0,
    POS_O = 1<<0,
    POS_X = 1<<1,
    POS_OCCUPIED = 1<<3
};
// maybe #define for these constants instead?

struct game_state {
    struct board_state {
       uint8_t pos[6][7];
    } board;
    enum winlose { GAME_UNDECIDED=0, GAME_WIN_O, GAME_WIN_X, GAME_DRAW } victory;
    // or maybe stuff the winlose info into the high bits of board.pos[0][0]?
    // Not much point, since the struct will probably be the same size after padding anyway.
};

File-format representation:

A more compact but still easy to use format would be xbbb...ooxbbw. Then you don't have to parse lines as strings, just as a constant-size chunk of 43 characters (or 43 if each record is separated by a newline). If you have any board positions which aren't wins, losses, or draws, use another character to mark that. Space, or 'n'.

Just leaving out the commas cuts your size nearly in half. You don't want to have to parse the input on commas and stuff. There are potential further gains from a simple run-length encoding notation, like xb2o1xb1w. Seeing a digit means repeat the last character that many more times. And maybe x means one x, capital X means two xes. That's getting to the point that it's hard for a human to read. LZOP or LZ4 compression could probably do a good job of compacting things.

Binary file formats:

Obviously there's a limit to text representations. Fixed-size binary records can be very small, because there isn't much information to be stored. Using 2 bits per grid position is probably compact enough, but there's still redundancy because that can represent the impossible state of x and o in the same position. To do better, you'd need to map the state of the whole board, or a whole row or column, to a multi-bit representation. Wikipedia says there are 4,531,985,219,092 legal positions for all game boards populated with 0 to 42 pieces. That's just over 2^42. So 43 bits should be enough to represent any valid board state, including all not-yet-decided positions. IDK how to go about encoding games into 43-bit integers, at least not anything usable (i.e. faster than actually enumerating all possible games and stopping at the one that matches.)

If you use bit-boards as your internal fast representation, store them packed in your file format, so the o and x boards plus the w/d/d status fits in 12 bytes, or 16bytes if you like round numbers.

// do some pre-processor stuff to choose between GNU C __attribute__ ((__packed__))
// and the MSVC #pragma pack
struct __attribute__ ((__packed__)) compact_game_state {
    struct __attribute__ ((__packed__)) compact_board_state {
       uint64_t o:42, x:42;
    } board; // sizeof = 11
    uint8_t victory;
}; // sizeof = 12

struct semi_compact_game_state {
    struct __attribute__ ((__packed__)) semi_compact_board_state {
       uint64_t o:48, x:48;
    } board; // 96 bits = 12 bytes
    enum winlose victory; // another 4 bytes
};

These actually compile with g++: see on godbolt.

Do your I/O with endian-agnostic code, so it doesn't break on big-endian machines. It's a file format, so it doesn't actually matter how you access it, as long as the right bytes go in the right place. Little-endian is probably a good choice for the file format, so on little-endian machine the load/store code is a no-op. Or just be lazy and do binary I/O on an array of structs, and only use your code on machines with the same endianness as the training data set.

If you don't use bit-boards, it's probably best to use an array of 2-bit fields. Random-access can be slow, but converting this to a byte-array will probably be faster than for two separate bitfields. Mask off the low 2 bits, use that as an index into a lookup table of { POS_EMPTY, POS_O|POS_OCCUPIED, POS_X|POS_OCCUPIED }. Then bit-shift by two to bring the next field into the low position. The board takes 84 bits, so do it in separate 32 or 64bit chunks. Doing a 128bit double-shift is not needed. The win/lose/draw info can go in a final 2-bit chunk.

Pack this into a 12 byte struct of three uint32_t, or a uint64_t and a uint32_t or something. Or just an array of uint8_t, but then it's harder to get the compiler to do one wide load. You could pack things into an 11 byte struct, but then alignment is more of a problem. If saving 1/12th memory size makes a useful difference to caching, go for it. A load that crosses a cache line only costs a couple extra cycles on x86 CPUs, and you're not loading very often. (A 64bit load of the first chunk won't be aligned to 64b anyway, with 12B structs, but it will at least split right down the middle, which is a special case that's faster than an uneven cache-line split on some CPUs.)

Decoding separate bitboards into a byte array would require shifting each bitboard separately, I think. It could still be branchless, but not as nice.

in-memory storage for the database of games

Converting between representations takes CPU time, so just go from file format to internal if it's not useful.

It's probably only useful to have a separate format for this if you keep a text file format, and use a non-compact fast format. In that case, parsing the file into packed 2-bit positions (like for that file format, see above).

If your file format is binary, just keep that around. (Or even just memory-map the file.)

Community
  • 1
  • 1
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • @asdf: I had another interesting idea: column-major storage for a bit-board means you can use bit-scan functions (that turn into a single instruction) instead of loops, to find the highest occupied position in a column. I know you said you're working with neural nets, so I don't know if efficient bit-board search functionality is of any use. – Peter Cordes Oct 02 '15 at 18:14
2

If the machine is limited and you must compete (on time), then you DON'T want to go into CPU-heavy operations on bitfield-wide integers. The best is to use the native machine word size. Next best is to use byte sized integers as many machines have eficient operations on byte-sized entities.

It is always a matter of optimizing for speed or for memory use.

Paul Ogilvie
  • 25,048
  • 4
  • 23
  • 41
  • 2
    This is not true. Did you noticed the measurements (in *megabytes* for an *embedded* system) that the OP gave? This is simply not feasible for him. – 3442 Oct 01 '15 at 09:58
  • 1
    Almost all CPUs have efficient-enough byte operations (or at least the ability to load-with-zero-extend a byte to a full-width register) that it doesn't make sense to suggest using an array of 32bit integers to represent a board. An array of 8bit integers is a good option for working with a game state, but that doesn't make it a good file format or in-memory format for storing multiple games, because @Kemy has a good point. See my answer. BTW, chess engines like crafty do actually make heavy use of bit-boards that fit in a 64bit register. Bitwise tests can check a whole diagonal, or w/e. – Peter Cordes Oct 02 '15 at 12:23