4

I'm working on writing a simple flash file system, and I need to store one of three states for each page in the flash device:

FREE
INVALID (ready to be freed)
VALID

If it were only two possible states, I'd use a bitmap for sure (memory is a concern). But what's the most compact way of holding these values in this case?

The only thing I can think of are packing four 2-bit values into a char and using bitmasks to operate on each value.

For example (wrote this quickly, so there's no guarantee it works perfectly):

#define FREE       0x0 // 0b00
#define INVALID    0x1 // 0b01
#define VALID      0x2 // 0b10

char state[NUM_ITEMS/4];

void set_state(int item_num, int state) {
    int idx;
    char tmp;

    idx = item_num / 4;
    tmp = state[idx];

    tmp &= ~(0x3 << (item_num % 4));
    tmp |= (state << (item_num % 4));

    state[idx] = tmp;
}

int main(void) {
    //...
    set_state(6, INVALID);
    //...
    return 0;
}

Is there another option I'm unaware of?

tonysdg
  • 1,335
  • 11
  • 32
  • Where is your code? Smallest type size is a byte which is char type. – Nguai al May 03 '17 at 20:34
  • 1
    The absolute most compact way is sorting the entire collection by this state and keeping a pointer to the beginning of each section. You could also treat the sequence of states as a large base-3 integer and convert it to base-256 for storage. Both of those are impractical for a lot of cases, though – what’s yours? There are lots of halfways to the first one, too: compress the sequence of 2-bit values, just heapify it, … and halfways to the second: convert to base-3 for chunks of n items, …. – Ry- May 03 '17 at 20:36
  • Data can be compressed in various forms with trade-off of access time. " large number of items:" is vague, 1, 1M 1T, what? – chux - Reinstate Monica May 03 '17 at 20:38
  • @Nguaial: I added some quick sample code; I'm not 100% sure if it'll work perfectly, but I think it gives an idea of what I'm asking about. – tonysdg May 03 '17 at 20:46
  • @Ryan: I've updated the question with more information to answer my use case. – tonysdg May 03 '17 at 20:47
  • @chux: I've updated the question with more information to answer my use case. – tonysdg May 03 '17 at 20:47
  • 1
    @tonysdg With a base 3 implementation (as with 3+ answers) , the space needed will be 3/4 of what you have posted with modest access times. If the 3 states occur in consistent proportions like max 10%, max 50%, max 50%, then hamming codes can be used. Data can be compressed, even typical binary data is reduced 50% - yet that takes time. So unless you have more time than memory, the base 3 solution are your best shot. Else, you need to supply more info describing the use case. Further, what is wrong/weak with your solution? Looks sufficient. – chux - Reinstate Monica May 03 '17 at 20:58
  • I'd use four states anyway: free/clear, valid/in use, invalid/garbage, and damaged/reserved for other use. – Nominal Animal May 03 '17 at 23:48

3 Answers3

4

Base 3

0, 1, 2
10, 11, 12, /* 3, 4, 5 in base 10 */
20, 21, 22 /* 6, 7, 8 */
100, 101, 102 /* 9, 10, 11 */
...
pmg
  • 106,608
  • 13
  • 126
  • 198
2

If performance is not a concern, you could fit five ternary values per 8-bit char. the first ternary value is stored as x, the second is stored as x*3, the third is stored as x*3*3, the fourth as x*3*3*3... etc.

To get the zeroth ternary value from the char, which let's call y, you'd take y % 3. To get the second ternary value, take (y/3) % 3, (y/(3*3)) % 3 for the third... etc.

This will result in much more divisions than if you settled for storing 4 ternary values per char, but you can squeeze in an extra value through this method.

Jeffrey Cash
  • 1,023
  • 6
  • 12
2

(Almost) the best you can do without compressing is to pack 5 ternary values (trits) in one byte. Enumerate every possible combination of five consecutive trits:

FFFFF (0)
FFFFI (1)
FFFFV (2)
FFFIF (4) 
...
VVVVV (242)

There are 243 such combinations, therefore, an 8-bit byte (2**8=256) is sufficient to store any of them. Likewise, you can store 10 trits in 2 bytes, 15 trits in 3 bytes, etc.

DYZ
  • 55,249
  • 10
  • 64
  • 93
  • I see, that's really smart, good... really nice. I'm not sure if in every architecture 1 byte is really one byte, I read in this quora art (https://www.quora.com/Why-does-a-character-in-C-need-only-1-byte) that it's depends on some factors. – developer_hatch May 04 '17 at 05:23
  • 1
    All POSIX systems have 8-bit bytes. Read more here http://stackoverflow.com/questions/2098149/what-platforms-have-something-other-than-8-bit-char – DYZ May 04 '17 at 05:29