1

I need to work with a large array of 26-bit variables in RAM. It is too expensive to use 32-bit ints. Access should be as fast as possible (especially read operation).

I came to the following scheme: each 26-bit value splits to three 8-bit values and one 2-bit value.

#define N 500000000
uint8 arr1[N], arr2[N], arr3[N];
uint8 arr4[N / 4];

int read_value(int index)
{
  int a1 = arr1[index];                                   // bits 0..7
  int a2 = arr2[index];                                   // bits 8..15
  int a3 = arr3[index];                                   // bits 16..23
  int a4 = (arr4[index / 4] >> (2 * (index % 4))) & 3;    // bits 24..25
  return a1 | (a2 << 8) | (a3 << 16) | (a4 << 24);
}

Is there some better technique to do this? Or maybe there is a nice way to work with 27/28/29/30-bit integers?

phuclv
  • 37,963
  • 15
  • 156
  • 475

2 Answers2

1

A memory load costs much more than simple arithmetic instructions in CPU, so you shouldn't use arrays of uint8 like that. It will cost you many loads to read each element. At least use an array of uint16 as there's one less load

uint16 arr1[N];     // byte 0-15
uint8  arr2[N];     // byte 16-23
uint8  arr3[N / 4]; // byte 25-26

But this is still slow. A fast solution is reading all 13 uint32 (or uint64 if you're running a 64-bit machine) at once in a loop and then extract them to 16 26-bit ints. There are many ways to store these 26-bit ints in 13 unint32s. For example store each 26-bit ints contiguously.

A0 A1... A15

Or storing the first 32 bytes for the 16 elements' bit 0-15, the next 16 bytes for each element's bit 16-23, the last bytes will be used for bit 24-25. The memory map will be like this

B00: A₀₀[00..07]
B01: A₀₀[08..15]
B02: A₀₁[00..07]
B03: A₀₁[08..15]
...
B30: A₁₅[00..07]
B31: A₁₅[08..15]
B32: A₀₀[16..23]
B33: A₀₁[16..23]
...
B47: A₁₅[16..23]
B48: A₀₀[24..25]A₀₁[24..25]A₀₂[24..25]A₀₃[24..25]
B49: A₀₄[24..25]A₀₅[24..25]A₀₆[24..25]A₀₇[24..25]
B50: A₀₈[24..25]A₀₉[24..25]A₁₀[24..25]A₁₁[24..25]
B51: A₁₂[24..25]A₁₃[24..25]A₁₄[24..25]A₁₅[24..25]

This is commonly used in image formats with an odd number of bits per channel. For example for 10 bits per channel format then each pixel will be stored in 5 bytes, the first four store the high 8 bits of each pixel, and the low 2 bits of each pixel will be packed into the remaining byte

You should test and choose what's the fastest in your case.

phuclv
  • 37,963
  • 15
  • 156
  • 475
0

When you say it's "too expensive" to use 32-bit ints, do you mean space-wise?

Assuming that you do, I'm not really sure how to help you there. However, in terms of read speed, an array in C/C++ provides you constant-time access to the elements of the array(this is assuming that the memory is already in the CPU cache; if it isn't, it's going to take longer). Therefore, reading element 0 takes the same amount of time as reading element 10,000; the code that you have might make this slower, but I can't say that for certain.

While it looks like this code should do what you want to do, it would probably make the most sense to simply do an array of ints, even though it would take up more space. If you absolutely have to this, you could try putting inline in your method declaration so the compiler can expand it whenever you use it.

rm5248
  • 2,590
  • 3
  • 17
  • 16