I have 3 unsigned ints, each in the range [0, 255]. I want to store these 3 numbers in compact storage, and since this operation happens very often I want to know how I can make it faster.

Initially I tried this:

struct Foo {
    uint8_t x;
    uint8_t y;
    uint8_t z;
};

Foo arr[] = ...;

void pushBack(unsigned x, unsigned y, unsigned z) {
    arr[count].x = x;
    arr[count].y = y;
    arr[count++].z = z;
}

The results were not good so I tried this instead:

struct Foo {
    union {
        struct {
            uint8_t x;
            uint8_t y;
            uint8_t z;
            uint8_t pad_;
        } v;
        uint32_t u32;
    };
};

Foo arr[] = ...;

void pushBack(unsigned x, unsigned y, unsigned z) {
    arr[count++].u32 = (z << 16) | (y << 8) | x;
}

And the results improved quite a bit.
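For reference, the single-store packing from the second version can be written as a small self-contained sketch. The helper names `pack_xyz`, `unpack_x`, `unpack_y` and `unpack_z` are hypothetical, and the sketch assumes the inputs really are in [0, 255] (checked only by a debug assert):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs three values in [0, 255] into one 32-bit word.
   x lands in bits 0-7, y in bits 8-15, z in bits 16-23; bits 24-31 stay zero.
   On a little-endian target this matches the x/y/z/pad_ byte order of the union. */
static inline uint32_t pack_xyz(unsigned x, unsigned y, unsigned z) {
    assert(x <= 255u && y <= 255u && z <= 255u); /* assumption: inputs are in range */
    return ((uint32_t)z << 16) | ((uint32_t)y << 8) | (uint32_t)x;
}

/* Hypothetical helpers: recover the individual components with shift and mask. */
static inline unsigned unpack_x(uint32_t p) { return p & 0xFFu; }
static inline unsigned unpack_y(uint32_t p) { return (p >> 8) & 0xFFu; }
static inline unsigned unpack_z(uint32_t p) { return (p >> 16) & 0xFFu; }
```

Storing the result through a plain `uint32_t` array gives the compiler a single 32-bit store per element, which is the same effect the union is used for above.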

I wonder if there is a way to improve this even more with SSE and/or AVX instructions.

  • if you change xyz to uint8_t there's a high chance your code will run faster immediately – user3528438 Nov 07 '16 at 15:39
  • why x, y, z, pad_ and additionally the u32? – Bodo Thiesen Nov 07 '16 at 15:40
  • and also, in your second set of code you seem to want to use a union but instead use a struct. But writing to one member of a union and then trying to read it back through another member is not a good idea. – user3528438 Nov 07 '16 at 15:41
  • @user3528438 Changing the pushBack parameters to uint8_t didn't make any difference. Thanks for letting me know, I've added the union in the second example. – Pan. Christopoulos Charitos Nov 07 '16 at 15:46
  • It is possible to use SSE there but probably useless, you could do it with two `pinsrb`'s and two `movd`'s but that's worse than two shifts and two ors. If you're doing a million of these in such a way that you do several *at the same time* then it becomes more interesting. As usual SSE requires more of a big picture look to use it well. – harold Nov 07 '16 at 16:38
  • @user3528438: This is a C question, not C++. Type-punning with unions is [explicitly allowed since C99](http://stackoverflow.com/questions/11639947/is-type-punning-through-a-union-unspecified-in-c99-and-has-it-become-specified), and is a well-known technique. Using it to work around compilers that don't combine multiple narrow stores is not a bad idea. – Peter Cordes Nov 07 '16 at 21:48
  • Making `z` a uint8_t is not a good idea. It won't help the compiler's asm output, and it makes it confusing for humans: I'd worry that `z << 16` would overflow a `uint8_t`. It's actually safe because C integer promotion rules say that narrow types are promoted up to `int` (and that would happen even if the shift count also had `uint8_t` type, instead of `int` from being a bare numeric constant). The compiler would have to make sure it zeroed out the upper bits of the 32 / 64-bit register holding the uint8_t, instead of just ORing them in, so if anything it's slower. – Peter Cordes Nov 07 '16 at 21:51
  • What are you trying to do? Just initialise arr[].X/Y/Z to X, Y and Z? If the union was a dimension 3 vector, then X,Y,Z would map into 0,1,2. One could imagine that a set of 10 would fit into 30 bytes (240 bits) and use most of a 256-bit SIMD register without a bunch of peel and remainder drama. Clearly 96 has factors of both 3 and 32, which would indicate that a structure of 32 sets of X/Y/Z would be 3 sets of AVX256. – Holmz Nov 08 '16 at 15:16
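
As a rough illustration of harold's point about doing several at the same time, here is a minimal sketch that packs four triples per iteration with SSE2. It assumes a layout the question does not show: the inputs arrive as three separate arrays of 32-bit values in [0, 255]. The function name `pack_batch` and its parameters are hypothetical. When SSE2 is not available the scalar loop handles everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical batch packer. Assumes x[i], y[i], z[i] are all in [0, 255]
   and that dst does not overlap the inputs. */
void pack_batch(uint32_t *dst, const uint32_t *x, const uint32_t *y,
                const uint32_t *z, size_t n) {
    size_t i = 0;
    assert(n == 0 || (dst && x && y && z));
#if defined(__SSE2__)
    /* Four elements per iteration: shift y and z into place, then OR. */
    for (; i + 4 <= n; i += 4) {
        __m128i vx = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i vy = _mm_loadu_si128((const __m128i *)(y + i));
        __m128i vz = _mm_loadu_si128((const __m128i *)(z + i));
        __m128i packed = _mm_or_si128(
            vx, _mm_or_si128(_mm_slli_epi32(vy, 8), _mm_slli_epi32(vz, 16)));
        _mm_storeu_si128((__m128i *)(dst + i), packed);
    }
#endif
    /* Scalar tail (or the whole range when SSE2 is unavailable). */
    for (; i < n; ++i) {
        dst[i] = (z[i] << 16) | (y[i] << 8) | x[i];
    }
}
```

An optimizing compiler may auto-vectorize the plain scalar loop into much the same instructions, so it is worth measuring before keeping the intrinsics.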
