3

I am doing an algorithmic contest, and I'm trying to optimize my code. Maybe what I want to do is stupid and impossible but I was wondering.

I have these requirements:

  • An inventory which can contains 4 distinct types of item. This inventory can't contain more than 10 items (all type included). Example of valid inventory: 1 / 1 / 1 / 0. Example of invalid inventories: 11 / 0 / 0 / 0 or 5 / 5 / 5 / 0
  • I have some recipe which consumes or adds items into my inventory. The recipe can't add or consume more than 10 items since the inventory can't have more than 10 items. Example of valid recipe: -1 / -2 / 3 / 0. Example of invalid recipe: -6 / -6 / +12 / 0

For now, I store the inventory and the recipe into 4 integers. Then I am able to perform some operations like:

  • ApplyRecepe: Inventory(1/1/1/0).Apply(Recepe(-1/1/0/0)) = Inventory(0/2/1/0)
  • CanAfford: Iventory(1/1/0/0).CanAfford(Recepe(-2/1/0/0)) = False

I would like to know if it is possible (and if yes, how) to store the 4 values of an inventory/recipe into one single integer and to performs previous operations on it that would be faster than comparing / adding the 4 integers as I'm doing now.

I thought of something like having the inventory like that:

int32: XXXX (number of items of the first type) - YYYY (number of items of the second type) - ZZZ (number of items of the third type) - WWW (number of item of the fourth type)

But I have two problems with that:

  1. I don't know how to handle the possible negative values
  2. It seems to me much slower than just adding the 4 integers since I have to bit shift the inventory and the recipe to get the value I want and then proceed with the addition.
ocrdu
  • 2,172
  • 6
  • 15
  • 22
Maeglix
  • 33
  • 4
  • 6
    Bitpacking like this rarely yields faster code. Why not just four separate `int8_t`s? – Botje Nov 17 '20 at 11:01
  • 1
    You're looking for _vectorization_ a.k.a. [_array programming_](https://en.wikipedia.org/wiki/Array_programming). – Andrew Cheong Nov 17 '20 at 11:04
  • This is known as [SIMD](https://en.m.wikipedia.org/wiki/SIMD), although your use case seems too simple to actually take advantage of it. – Passer By Nov 17 '20 at 11:04
  • 1
    bitpacking is mostly for storage, once you have to do computation on it, you mostly have to unpack it. – Jarod42 Nov 17 '20 at 11:04
  • FYI: I fiddled with a similar idea in [SO: Atomic operations - C](https://stackoverflow.com/a/44692164/7478597) but for a different reason: to make modification of multiple counters atomic. – Scheff's Cat Nov 17 '20 at 11:11

3 Answers3

3

Especially if you're learning, it's not a bad opportunity to try implementing your own helper class for vectorization, and consequently deepen your understanding about data in C++, even if your use case might not warrant the technique.

The insight you want to exploit is that arithmetic operations seem invariant to bitshifts, if one considers the pesky carry-bit and effects of signage (e.g. two's complement). But precisely because of these latter factors, it's much better to use some standardized underlying type like an int8_t[], as @Botje suggests.

To begin, implement the following functions. (My C++ is rusty, consider this pseudocode.)

int8_t* add(int8_t[], int8_t[], size_t);
int8_t* multiply(int8_t[], int8_t[], size_t);
int8_t* zeroes(size_t); // additive identity
int8_t* ones(size_t); // multiplicative identity

Also considering:

  • How would you like to handle overflows and underflows? Let them be and ask the developer to be cautious? Or throw exceptions?
  • Maybe you'd like to pin down the size of the array and avoid having to deal with a dynamic size_t?
  • Maybe you'd like to go as far as overloading operators?

The end result of an exercise like this, but generalized and polished, is something like Armadillo. But you'll understand it on a whole different level by doing the exercise yourself first. Also, if all this makes sense so far, you can now take a look at How to vectorize my loop with g++?—even the compiler can vectorize for you in certain cases.

Bitpacking as @Botje mentions is another step beyond this. You won't even have the safety and convenience of an integer type like int8_t or int4_t. Which additionally means the code you write might stop being platform-independent. I recommend at least finishing the vectorization exercise before delving into this.

Andrew Cheong
  • 29,362
  • 15
  • 90
  • 145
3

Storing multiple int values into one variable

Here are two alternatives:

An array. The advantage of this is that you may iterate over the elements:

int variable[] {
    1,
    1,
    1,
    0,
};

Or a class. The advantage of this is the ability to name the members:

struct {
    int X;
    int Y;
    int Z;
    int W;
} variable {
    1,
    1,
    1,
    0,
};

Then I am able to perform some operations like:

Those look like SIMD vector operations (Single Instruction Multiple Data). The array is the way to go in this case. Since the number of operands appears to be constant and small in your description, an efficient way to perform them are vector operations on the CPU 1.

There is no standard way to use SIMD operations directly in C++. To give the compiler optimal opportunity to use them, these steps need to be followed:

  • Make sure that the CPU you use supports the operations that you need. AVX-2 instruction set and its expansions have wide support for integer vector operations.
  • Make sure that you tell the compiler that the program should be optimised for that architecture.
  • Make sure to tell the compiler to perform vectorisation optimisations.
  • Make sure that the integers are sufficiently aligned as required by the operations. This can be achieved with alignas.
  • Make sure that the number of integers is known at compile time.

If the prospect of relying on the optimiser worries you, then you may instead prefer to use vector extensions that may be provided by your compiler. The use of language extensions would come at the cost of portability to other compilers naturally. Here is an example with GCC:

constexpr int count = 4;
using v4si = int __attribute__ ((vector_size (sizeof(int) * count)));

#include <iostream>

int main()
{
    v4si inventory { 1, 1, 1, 0};
    v4si recepe    {-1, 1, 0, 0};

    v4si applied = inventory + recepe;

    for (int i = 0; i < count; i++) {
        std::cout << applied[i] << ", ";
    }
}

1 If the number of operands were large, then specialised vector processor such as a GPU could be faster.

eerorika
  • 232,697
  • 12
  • 197
  • 326
2

This will be something of a non-answer, just intended to show what you're up against if you do bitpacking.

Suppose, for simplicity's sake, that recipes can only remove from inventory, and only contain positive values (you could represent negative numbers using two's complement, but it would take more bits, and add much complexity to working with the bit-packed numbers).

You then have 11 possible values for an item, so you need 4 bits for each item. Four items can then be represented in one uint16.

So, say you have an inventory with 10,4,6,9 items; this would be uint16_t inv = 0b1010'0100'0110'1001.

Then, a recipe with 2,2,2,2 items or uint16_t rec = 0b0010'0010'0010'0010.

inv - rec would give 0b1000'0010'0100'0111 for 8,2,4,7 items.

So far, so good. No need here to shift and mask to get at the individual values before doing the calculation. Yay.

Now, a recipe with 6,6,6,6 items which would be 0b0110'0110'0110'0110, giving inv - rec = 0b0011'1110'0000'0011 for 3,14,0,3 items.

Oops.

The arithmetic will work, but only if you check beforehand that the individual 4-bit results don't go out of bounds; in this example this would mean that you know beforehand that there are enough items in the inventory to fill a recipe.

You could get at, say, the third item in the inventory by doing: (inv >> 4) & 0b1111 or (inv << 8) >> 12 for doing your checks.

For testing, you would then get expressions like:

if ((inv >> 4) & 0b1111 >= (rec >> 4) & 0b1111)

or, comparing the 4 bits "in place":

if (inv & 0b0000000011110000 >= rec & 0b0000000011110000)

for each 4-bit part.

All these things are doable, but do you want to? It probably won't be faster than what is suggested in the other answers after the compiler has done its job, and it certainly won't be more readable.

It becomes even more horrible when you allow negative numbers (two's complement or otherwise) in recipes, especially if you want to bit-shift them.

So, bitpacking is nice for storage, and in some rare cases you can even do math without unpacking the bits, but I wouldn't try to go there (unless you are very performance and memory constrained).

Having said that, it could be fun to try to get it to work; there's always that.

ocrdu
  • 2,172
  • 6
  • 15
  • 22
  • 1
    "`0b 1010 0100 0110 1001 (spaces for clarity).`", C++14 allows quotes in literals: `0b1010'0100'0110'1001`. – Jarod42 Nov 17 '20 at 17:58