11

This is about ANSI-C (C90). This is what I know:

  • I can directly tell the compiler how many bits I want for a specific variable.
  • If I want 1 bit which can have the values zero or one.
  • or 2 bits for the values 0,1,2,3, and so on...;

I'm familiar with the syntax.

I have problem concerning bitfields:

  • I want to define a SET structure.
  • It can have maximum 1024 elements (it can have less, but the maximum is 1024 elements).
  • The domain of the set is from 1 to 1024. So an element could have any value 1-1024.

I'm trying to create a structure for a SET, and it must be efficient as possible for the memory part.

I tried:

typedef struct set
{
    unsigned int var: 1;
} SET;
//now define an array of SETS
SET array_of_sets[MAX_SIZE]  //didn't define MAX_SIZE, but no more than 1024 elements in each set.

I know this isn't efficient; maybe it's even not good for what I want. That's why I'm looking for help.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
idan di
  • 241
  • 2
  • 7
  • 7
    That will probably use 32 bits of storage per algorithmic bit, the antithesis of the space efficiency you are after. At minimum, it will use 8 bits of storage per algorithmic bit (unless you have an oddball machine, in which case it is likely to use rather than less). You need to think in terms of bit masking and a suitably sized array of unsigned integers of some sub-species or other (`unsigned long long` or `uint64_t` or such like). Bit fields occasionally have their uses. This categorically isn't one of them. – Jonathan Leffler Jun 28 '15 at 06:49
  • 2
    If your concern is memory efficiency only, I think you'd be better off writing your own functions to get/set the 1-bit elements of the array, for example packing 8 of them at a time in one char array element. Probably using a 64-bit ibt would be even better cache-wise – Pynchia Jun 28 '15 at 06:52
  • @Pynchia The suggestion to implement masking yourself in a language which does have support for bitfields sounds odd. Is it because C bitfields only are useful for packing the fields in a struct and not the entries in an array? And if so, then why is it so? – kasperd Jun 28 '15 at 06:57
  • 3
    @kasperd yes you got my point. How many bytes does the compiler allocate for eight array elements? If it's more than one then my suggestions holds – Pynchia Jun 28 '15 at 07:02
  • 2
    @kasperd because 1. bits can't be addressed in C, the smallest addressable unit of storage is a byte, and 2. array entries need to be aligned individually. – The Paramagnetic Croissant Jun 28 '15 at 07:04
  • @TheParamagneticCroissant The same applies to fields in a struct. But using bitfields gets around that requirement on a struct. The question is, is there any way to use a bitfield to achieve the same for an array. – kasperd Jun 28 '15 at 07:19
  • 1
    @kasperd no, because bitfields are *inside* a struct. Yes, bitfields are a work-around (a hack, if you like) of which the scope only extends to structs. This means that you can't use them to prevent the compiler from aligning the struct themselves. – The Paramagnetic Croissant Jun 28 '15 at 07:25
  • 3
    @kasperd: No, bit fields don't magically make bits addressable at the hardware level. Bit fields work around the fact that bits are not addressable by having the compiler generate bit-wise masking operations to simulate bit-addressing access. There is no way to achieve an array of bit fields. There is no point in trying to use a bit field for saving space like this. None whatsoever. It actually costs space in memory for storage, and it costs more space in memory for code to access the bits. Not a good trade off. – Jonathan Leffler Jun 28 '15 at 07:25
  • @JonathanLeffler Of course there is a point in trying. It is possible to save memory by using only the needed number of bits for each stored value. The question is how to express that in C code. It is clear that the code in the question doesn't express what the OP wants. But if the OP knew how to express it, he probably wouldn't have written the question in the first place. – kasperd Jun 28 '15 at 07:33
  • 1
    @kasperd: Show me how to create a portable piece of code that is moderately efficient (no more than 8 almost identical fragments of code for each access to a bit in your array structure). I'm sorry, but it is not doable sanely with bit fields. It is doable sanely with bit-wise logical operations (masking, shifting, etc). – Jonathan Leffler Jun 28 '15 at 07:37
  • @JonathanLeffler If I knew how to express it in C, I would have written an answer already. I think it is a very good question, I just don't know the answer. – kasperd Jun 28 '15 at 07:39
  • You probably dont want to do such tricks. Memory is cheap. Cache efficiency matters a lot. – Basile Starynkevitch Jun 28 '15 at 08:06
  • why using bitmaps? a `unsigned short int` would the best candidate here. You don't even need a struct. you can just an array. The max memory it will take is 2014 bytes. This solution is way easier and less error-prone. – Giuseppe Pes Jun 28 '15 at 08:25
  • Given that the OP wants to store a set of elements where each element has a value from 1 to 1024; why are people ignoring the question and talking about getting/setting individual bits instead? – Brendan Jun 28 '15 at 08:55
  • @Brendan: Because the question discusses bit fields, and the example code in the question uses `typedef struct set { unsigned int var: 1; } SET;` which contains a single bit of usable data in a space that is of implementation-defined size that is likely to be 32 bits of storage and won't be less than 8 bits of storage. – Jonathan Leffler Jun 28 '15 at 08:57
  • @JonathanLeffler: It's far more likely that the "1 bit bitfield" in the example code is just plain wrong and not what the OP wants (and should be ignored); and that the "set of elements with values from 1 to 1024" is what the OP wants. – Brendan Jun 28 '15 at 09:06
  • @Brendan: yes, and the most efficient way to store those bits is in an array of some unsigned integer type with each bit of each integer in the array representing one of the 1024 values. As, for example, in the code in my answer. – Jonathan Leffler Jun 28 '15 at 09:09
  • I just want to mention `FD_SET` for an example of a standard unix "neatly packed" construct (that's not seeing much modern use, hopefully). – Benjamin Gruenbaum Jun 28 '15 at 13:47

6 Answers6

6

As noted in extensive comments, using a bit field is not the way to go. You can use just 128 bytes of storage for your set containing values 1..1024. You will need to map the value N to bit N-1 (so you have bits 0..1023 to work with). You also need to decide on the operations you need for your set. This code supports 'create', 'destroy', 'insert', 'delete' and 'in_set'. It does not support iteration over the elements in the set; that can be added if you want it.

sets.h

#ifndef SETS_H_INCLUDED
#define SETS_H_INCLUDED

typedef struct Set Set;
enum { MAX_ELEMENTS = 1024 };

extern Set *create(void);
extern void destroy(Set *set);
extern void insert(Set *set, int value);
extern void delete(Set *set, int value);
extern int in_set(Set *set, int value);

#endif /* SETS_H_INCLUDED */

sets.c

#include "sets.h"
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long Bits;
#define BITS_C(n)  ((Bits)(n))
enum { ARRAY_SIZE = MAX_ELEMENTS / (sizeof(Bits) * CHAR_BIT) };

struct Set
{
     Bits set[ARRAY_SIZE];
};

Set *create(void)
{
    Set *set = malloc(sizeof(*set));
    if (set != 0)
        memset(set, 0, sizeof(*set));
    return set;
}

void destroy(Set *set)
{
    free(set);
}

void insert(Set *set, int value)
{
    assert(value >= 1 && value <= MAX_ELEMENTS);
    value--;  /* 0..1023 */
    int index = value / (sizeof(Bits) * CHAR_BIT);
    int bitnum = value % (sizeof(Bits) * CHAR_BIT);
    Bits mask = BITS_C(1) << bitnum;
    /* printf("I: %d (%d:%d:0x%.2lX)\n", value+1, index, bitnum, mask); */
    set->set[index] |= mask;
}

void delete(Set *set, int value)
{
    assert(value >= 1 && value <= MAX_ELEMENTS);
    value--;  /* 0..1023 */
    int index = value / (sizeof(Bits) * CHAR_BIT);
    int bitnum = value % (sizeof(Bits) * CHAR_BIT);
    Bits mask = BITS_C(1) << bitnum;
    /* printf("D: %d (%d:%d:0x%.2lX)\n", value+1, index, bitnum, mask); */
    set->set[index] &= ~mask;
}

/* C90 does not support <stdbool.h> */
int in_set(Set *set, int value)
{
    assert(value >= 1 && value <= MAX_ELEMENTS);
    value--;  /* 0..1023 */
    int index = value / (sizeof(Bits) * CHAR_BIT);
    int bitnum = value % (sizeof(Bits) * CHAR_BIT);
    Bits mask = BITS_C(1) << bitnum;
    /* printf("T: %d (%d:%d:0x%.2lX) = %d\n", value+1, index, bitnum, mask,
              (set->set[index] & mask) != 0); */
    return (set->set[index] & mask) != 0;
}

#include <stdio.h>

enum { NUMBERS_PER_LINE = 15 };

int main(void)
{
    Set *set = create();
    if (set != 0)
    {
        int i;
        int n = 0;
        for (i = 1; i <= MAX_ELEMENTS; i += 4)
             insert(set, i);
        for (i = 3; i <= MAX_ELEMENTS; i += 6)
             delete(set, i);

        for (i = 1; i <= MAX_ELEMENTS; i++)
        {
             if (in_set(set, i))
             {
                 printf(" %4d", i);
                 if (++n % NUMBERS_PER_LINE == 0)
                 {
                     putchar('\n');
                     n = 0;
                 }
             }
        }
        if (n % NUMBERS_PER_LINE != 0)
            putchar('\n');
        destroy(set);
    }
    return 0;
}

The functions should really be given a systematic prefix, such as set_. The BITS_C macro is based on the INT64_C macro (and the other related macros) defined in <stdint.h> in C99 and later, which is also not a part of C90.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
2

As per my previous comments, here is an example of how you can pack eight 1-bit elements into one char physical element. I have only implemented the function to get the value of a 1-bit element, I leave the function to set it to you (it's easy to do).

Note: you can easily change the type of the array element (unsigned char) and experiment with types which can hold more bits (e.g unsigned int) and test if they perform better in terms of speed. You can also modify the code to make it handle elements bigger than one bit.

#include <stdio.h>
#include <limits.h>

unsigned int get_el(unsigned char* array, unsigned int index)
{
    unsigned int bits_per_arr_el = sizeof(unsigned char)*CHAR_BIT;
    unsigned int arr_index = index / bits_per_arr_el;
    unsigned int bit_offset = index % bits_per_arr_el;
    unsigned int bitmask = 1 << bit_offset;
    unsigned int retval;

    // printf("index=%u\n", index);
    // printf("bits_per_arr_el=%u\n", bits_per_arr_el);
    // printf("arr_index=%u\n", arr_index);
    // printf("bit_offset=%u\n", bit_offset);

    retval = array[arr_index] & bitmask ? 1 : 0; // can be simpler if only True/False is needed
    return(retval);
}

#define MAX_SIZE 10
unsigned char bitarray[MAX_SIZE];

int main()
{
    bitarray[1] = 3; // 00000011
    printf("array[7]=%u, array[8]=%u, array[9]=%u, array[10]=%u\n",
            get_el(bitarray, 7),
            get_el(bitarray, 8),
            get_el(bitarray, 9),
            get_el(bitarray,10));

    return 0;
}

outputs

array[7]=0, array[8]=1, array[9]=1, array[10]=0
Pynchia
  • 10,996
  • 5
  • 34
  • 43
1
typedef struct set
{
    unsigned short var:10; // uint var:1 will be padded to 32 bits
} SET;                     // ushort var:10 (which is max<=1024) padded to 16 bits

As was commented by @Jonathan Leffler use array(unsigned short[]) and define bitmasks

#define bitZer 0x00  //(unsigned)(0 == 0)? true:true;
#define bitOne 0x10  // so from (both inclusive)0-1023 = 1024
...                  // added for clarification  
#define bitTen 0x0A

to look into the bits of each element. http://www.catb.org/esr/structure-packing/ detailed

EpiGen
  • 70
  • 6
  • +4 comments while I was writing – EpiGen Jun 28 '15 at 07:31
  • Write whats wrong in what I wrote please. That is what makes "downvote" very usefull – EpiGen Jun 28 '15 at 08:39
  • It isn't my down-vote, though I'm sorely tempted. You're still using a bit-field, albeit one which can hold a value between 0 and 1023 (so one problem is that you claim it will hold 1024, but it won't). The trouble is, to create a set of up to 1024 such values will require 2048 bytes of memory. This is wasteful compared with 128 bytes which is the minimum that can be used. – Jonathan Leffler Jun 28 '15 at 08:41
  • It simply does not make sense whatsoever. Post code that manages a set with 1024 elements. – Karoly Horvath Jun 28 '15 at 08:41
  • @Jonathan Leffler Really? "As was commented by Jonathan Leffler use array..." but not array of SET-s. So the first code not connected to the second. – EpiGen Jun 28 '15 at 08:59
  • @Karoly Horvath (0-1023) + 1 = (1 - 1024). Tell me if you cant add one to each element on retrieval – EpiGen Jun 28 '15 at 09:05
  • @EpiGen it is hit or miss. Sometimes you will encounter folks with the integrity to support a downvote with a comment to help everyone learn. Other times you get a crowd of "wet-behind-the-ears" kids that like to feel important or superior by downvoting an answer. They care little for the actual learning SO was built to provide, so you should not count on any meaningful help if that is the case... – David C. Rankin Jun 28 '15 at 09:10
  • @EpiGen: that's obvious, and wasn't my point. On 10 bits you can only store 10 bits of information, but you need at least 1024bits. I see no code to retrieve or set members so I can only make wild guesses about how you imagined your code will work... and I don't want to argue against pure speculation. – Karoly Horvath Jun 28 '15 at 09:29
  • At this point you might want to read the other answers, perhaps you can figure out the flaw in your approach without our further assistance. – Karoly Horvath Jun 28 '15 at 09:30
  • @Karoly Horvath Read original post. The problem is to save as more memory as possible while declaring array with type defined as [specified bit width] but NOT to keep 1 - 1024 separate elements inside that structure – EpiGen Jun 28 '15 at 09:49
  • @EpiGen: you're wasting your and my time. All I can say, post code. – Karoly Horvath Jun 28 '15 at 09:56
  • @Karoly Horvath very saad, I understood initial question wrong. Sorry for waste of time, I dont like it too. I will post code for sure, I find it as very interesting problem. – EpiGen Jun 28 '15 at 10:26
1

1) The proper solution for this question is to use Bit Array

The question provided the solution with Bit Fields with Struct. There are two typical ways to save memory space for bits related problem, another is to use Bit Array. For this specific case in the question, the better way is to use Bit Array (demoed as follows).

  • If it is the case like purely independent bit flags here, go for the Bit Array
  • If there is a group of relevant bits , such as the IP address or Control Word definition, then it's better to combine them with a struct, that is to use Bit Fields with Sturct

2) Sample code just for demo Bit Array

#include<limits.h>
#define BITS_OF_INT (sizeof(int)*CHAR_BIT)  
void SetBit(int A[], int k)
     {
       //Set the bit at the k-th position
       A[k/BITS_OF_INT] |= 1 <<(k%BITS_OF_INT);
     } 
void ClearBit(int A[], int k)
     {
       //RESET the bit at the k-th position
       A[k/BITS_OF_INT] &= ~(1 <<(k%BITS_OF_INT)) ;
     }  
int TestBit(int A[], int k)
     {
       // Return TRUE if bit set    
       return ((A[k/BITS_OF_INT] & (1 <<(k%BITS_OF_INT)))!= 0) ;
     }

#define MAX_SIZE 1024
int main()
{
    int A[MAX_SIZE/BITS_OF_INT];
    int i;
    int pos = 100; // position

    for (i = 0; i < MAX_SIZE/BITS_OF_INT; i++)
        A[i] = 0; 

    SetBit(A, pos);
    if (TestBit(A, pos)){//do something}
    ClearBit(A, pos); 
}

3) Furthermore, a worthwhile discussing point from this question is,

How to choose a proper solution between "Bit Array" and "Bit fields with struct"?

Here are some references about this topic.

Community
  • 1
  • 1
Eric Tsui
  • 1,924
  • 12
  • 21
  • You have an array of 256 elements (assuming `sizeof(int) == 4`); you initialize the first 1024 elements of that array. Methinks you have a problem there. Also, you have allocated enough space to store up to 2048 bits. – Jonathan Leffler Jun 28 '15 at 08:49
  • 8 should arguably be `CHAR_BIT` from ``. OTOH, there aren't that many systems where the value of `CHAR_BIT` is not 8 (but there are some, especially in the DSP — digital signal processor — world). – Jonathan Leffler Jun 28 '15 at 08:53
  • Thanks for the reminder. I think the key point here is How to choose a proper solution between "Bit Array" and "Bit fields with struct"? My sample code is just to provide a reference to explain the "Bit Array" itself. I apologise for its flaw. – Eric Tsui Jun 28 '15 at 09:12
  • Don't apologize; we all make mistakes, and sometimes make them in public. Fix it instead. Please! – Jonathan Leffler Jun 28 '15 at 09:15
  • Update answer. Thanks for all advices which help me refine. – Eric Tsui Jun 28 '15 at 09:41
  • I've fix the code which tramples way out of bounds on the array. – Jonathan Leffler Jun 28 '15 at 09:43
  • @JonathanLeffler Thanks for the help. Could you give some more advice for when to use `Bit Array` , when to use `Bit fields struct` by your experience? – Eric Tsui Jun 28 '15 at 10:01
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/81768/discussion-between-eric-tsui-and-jonathan-leffler). – Eric Tsui Jun 28 '15 at 10:02
  • Succinctly: don't use bit-fields unless you're using software that demands you do so. Occasionally, it is useful for mapping things like I/O registers to something usable in C, but such code is inherently non-portable. That's not a big problem; almost all aspects of bit fields are also non-portable. – Jonathan Leffler Jun 28 '15 at 10:02
1

To store a value from 0 to 1023 (or from 1 to 1024, which is essentially the same and only involves adding/subtracting 1) you need a minimum of 10 bits.

This means that for 32-bit (unsigned) integers, you can pack 3 values into 30 bits, which gives 2 bits of useless padding.

Example:

%define ELEMENTS 100

uint32_t myArray[ (ELEMENTS + 2) / 3 ];

void setValue(int n, int value) {
    uint32_t temp;
    uint32_t mask = (1 << 10) - 1;

    if(n >= ELEMENTS) return;
    value--;                        // Convert "1 to 1024" into "0 to 1023"
    temp = myArray[n / 3];
    mask = mask << (n % 3)*10;
    temp = (temp & ~mask) | (value << (n % 3)*10);
    myArray[n / 3] = temp; 
}

int getValue(int n) {
    uint32_t temp;
    uint32_t mask = (1 << 10) - 1;

    if(n >= ELEMENTS) return 0;
    temp = myArray[n / 3];
    temp >>= (n % 3)*10;
    return (temp & ~mask) + 1;
}

You can do this with bitfields instead, but the code to get/set individual values will end up using branches (e.g. switch( n%3 )) which will be slower in practice.

Removing those 2 bits of padding will cost a little more complexity and a little more overhead. For example:

%define ELEMENTS 100

uint32_t myArray[ (ELEMENTS*10 + 31) / 32 ];

int getValue(int n) {
    uint64_t temp;
    uint64_t mask = (1 << 10) - 1;

    if(n >= ELEMENTS) return 0;

    temp = myArray[n*10/32 + 1];
    temp = (temp << 32) | myArray[n*10/32];

    temp >>= (n*10 % 32);

    return (temp & ~mask) + 1;
}

This can't be done with bitfields. This is the most space efficient way to store an array of values that range from 1 to 1024.

Brendan
  • 35,656
  • 2
  • 39
  • 66
  • Maybe five of those useless 2 bit pairs can be filled once after 5 elements. – huseyin tugrul buyukisik Jun 28 '15 at 09:01
  • A set containing values from 1 to 1024 can have up to 1024 distinct values stored. You've not dimensioned your array to hold that many values. You also require the caller to establish storage positions for the numbers, and don't ensure that the values in the set are unique, whereas mathematical sets do not contain any duplicates, by definition. – Jonathan Leffler Jun 28 '15 at 09:04
  • @JonathanLeffler: Argh - I think you're right. I've been doing "array of elements with values 1 to 1024" and not a set in the mathematical sense (which is just a hideously convoluted way to ask for an array of 1024 bits). – Brendan Jun 28 '15 at 09:19
1

If you are storing an "array of booleans" or setting flags, it can be useful. For instance, you can initialize or compare up to 64 values at a time.

These macros will work for unsigned char, short, int, long long ... but simplifies significantly if you just pick a type (so you can use a safer static inline function)

#define getbit(x,n) x[n/(sizeof(*x)*8)]  &  (typeof(*x))1 << (n&((sizeof(*x)*8)-1)) 
#define setbit(x,n) x[n/(sizeof(*x)*8)] |=  (typeof(*x))1 << (n&((sizeof(*x)*8)-1)) 
#define flpbit(x,n) x[n/(sizeof(*x)*8)] ^=  (typeof(*x))1 << (n&((sizeof(*x)*8)-1)) 
#define clrbit(x,n) x[n/(sizeof(*x)*8)] &= ~( (typeof(*x))1 << (n&((sizeof(*x)*8)-1)) ) 

to initialize a large array of booleans all you need to do is: char cbits[]={0,0xF,0,0xFF};

or for all zeroes char cbits[4]={0};

or an int example: int ibits[]={0xF0F0F0F0,~0};

//1111000011110000111100001111000011111111111111111111111111111111

If you will only be accessing 1 type of array, it may be better to make the macros into proper functions like:

static inline unsigned char getbit(unsigned char *x, unsigned n){ 
  return x[n>>3]  &  1 << (n&7); 
}
//etc... similar for other types and functions from macros above

You can also compare multiple flags at a time by '|'ing the flags together and using '&'ed masks; however, it does get a bit more complex when you exceed the native types

For your particular instance you can initialize to all zeroes by:

unsigned char flags[128]={0};

or all 1's by:

uint64_t flags[128] = {~0,~0,~0,~0,~0,~0,~0,~0,~0,~0,~0,~0,~0,~0,~0,~0};

You can even use enums to name your flags

enum{
  WHITE, //0
  RED, //1
  BLUE, //2
  GREEN, //3
  ...
  BLACK //1023
}

if (getbit(flags,WHITE) && getbit(flags,RED) && getbit(flags,BLUE))
  printf("red, white and blue\n");
technosaurus
  • 7,676
  • 1
  • 30
  • 52
  • You can't use those macros with `unsigned long long`; the `1` is an `int` and will sometimes be bit-shifted to oblivion if `sizeof(unsigned long long) > sizeof(int)`. – Jonathan Leffler Jun 28 '15 at 09:07
  • I would choose your answer if it was endianness independent – EpiGen Jun 28 '15 at 09:09
  • @EpiGen: why does endianness matter in a set? – Jonathan Leffler Jun 28 '15 at 09:14
  • @Jonathan Leffler As long as you use bitshifting operatons wrong endianness will lead to data loss,errors. So it is always important no matter on what you use it. – EpiGen Jun 28 '15 at 09:19
  • @JonathanLeffler I removed a `(typeof(x))` cast on the 1 for lack of a portable way... I'm not sure how well the compiler would deal with a `1ULL` there, so for now, I'll leave off the long long. – technosaurus Jun 28 '15 at 09:20
  • @EpiGen: I disagree with that assessment. I think you are wrong. – Jonathan Leffler Jun 28 '15 at 09:21
  • @JonathanLeffler I think the same. Java has built-in endianness dependency (it will handle it for you) check but not C or Cpp. – EpiGen Jun 28 '15 at 09:28
  • @EpiGen I replaced the bulk of the shift operations (most compilers will optimize them back in) so that the only bit shift should be consistent regardless of endianness – technosaurus Jun 28 '15 at 09:38