Your choice of DEF
as the bottom 3 bits is wise. You might want to use them to index a table, so only needing to mask, not shift-and-mask, is good.
If you ever want to index a table with the BC
bits, putting them as the upper 2 bits would mean you only need to shift, and not mask.
Or, if you often want to test the A
bit, putting it as the MSB could save a tiny amount of code-size. test al,al
/js
(branch on the sign bit) is slightly shorter than test al, 0x20
/ jnz
(branch on bit 5), because of the immediate byte. The only speed difference is in the x86 code size, though, so it's very minor.
What is the efficient way to count set bits at a position or lower? took advantage of a similar trick, where the compiler can use an arithmetic right-shift to broadcast the sign bit to the whole register. This can then be used as an AND-mask.
Bitwise operations are cheap. If you often need both pieces of information (piece type and color), you'll probably be better off with packing your maps together, rather than having two separate arrays. Four 16B vector registers can hold a whole board. Loading from memory that's in L1 cache is extremely cheap on x86, maybe even cheaper than a couple ALU ops in registers if there's a lot of parallelism and your code is already bottlenecked on 3 or 4 ALU operations per clock. (Intel SnB / Haswell). So it might possibly be a win to store it separately, but only if everything really is hot in L1. Otherwise, go for the data density every time.
Many chess programs store board positions as a Bitboard, where each position of a 64bit int represents a board position. So you might have a mask of "black pawns" and "white pawns". You can OR them together to get a bitboard of multiple pieces. You can AND them to see where they intersect.
I think you can use clever bitwise stuff to check the diagonals the bishops can attack, and things like that. 32 and 64bit immediate constants are cheap.