
I have a long array of char (coming from a raster file via GDAL), consisting only of the characters '0' and '1'. To compact the data, I want to convert it to an array of bits (thus dividing the size by 8), 32 characters (4 bytes of output) at a time, writing the result to a different file. This is what I have come up with so far:

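#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

uint32_t bytes2bits(char b[33]) {
    b[32] = 0;                          // terminate after the 32nd digit
    return (uint32_t)strtoul(b, 0, 2);  // parse as a base-2 number
}

int main(void) {
    const char data[36] = "00000000000000000000000010000000101"; // 101 is to be ignored
    char word[33];
    strncpy(word, data, 32);            // copies 32 chars, no terminator yet
    uint32_t byte = bytes2bits(word);
    printf("Data: %" PRIu32 "\n", byte); // 128
    return 0;
}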

The code is working, and the result is going to be written in a separate file. What I'd like to know is: can I do that without copying the characters to a new array?

EDIT: I'm using a const variable here just to make a minimal, reproducible example. In my program it's a char *, which is continually changing value inside a loop.

Rodrigo

4 Answers


Yes, you can, as long as you can modify the source string (in your example code you can't because it is a constant, but I assume in reality you have the string in writable memory):

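#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

uint32_t bytes2bits(const char* b) {
    return (uint32_t)strtoul(b, 0, 2);
}

void compress(char* data) {
    // You would need to make sure that the `data` argument always has
    // at least 33 characters in length (the null terminator at the end
    // of the original string counts)
    char temp = data[32];   // save the character after the 32nd digit
    data[32] = 0;           // temporarily terminate the string there
    uint32_t byte = bytes2bits(data);
    data[32] = temp;        // restore the original character
    printf("Data: %" PRIu32 "\n", byte); // 128
}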
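To check that the buffer really comes back unchanged, here is a minimal sketch of the same save/terminate/restore trick in a hypothetical helper, `parse_period`, which returns the value instead of printing it. It assumes the buffer is writable and has at least 32 characters before its terminating null; the name and the precondition check are illustrative, not part of the answer above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse the 32 '0'/'1' characters starting at p.
   p must point into writable memory with at least 32 characters before
   the string's terminating null. */
uint32_t parse_period(char *p) {
    assert(memchr(p, '\0', 32) == NULL); /* precondition: 32 digits available */
    char saved = p[32];                  /* remember the 33rd character */
    p[32] = '\0';                        /* temporarily end the string after 32 digits */
    uint32_t value = (uint32_t)strtoul(p, NULL, 2);
    p[32] = saved;                       /* put the original character back */
    return value;
}
```

Calling it as `parse_period(buf + 32 * i)` for i = 0, 1, ... walks a buffer one period at a time, as long as each call still has 32 digits available.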
lxop

If the long data is stored in a writable char* buffer, there is no need to copy each part into a temporary buffer to convert it to a long. Just use a pointer to step through the buffer one 32-byte period at a time, temporarily writing a 0 terminator after the 32nd byte of each period and restoring the original byte afterwards.

So your code would look like:

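#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

uint32_t bytes2bits(const char* b) {
    return (uint32_t)strtoul(b, 0, 2);
}

void compress(char* data) {
    size_t dataLen = strlen(data);
    size_t periodLen = 32;
    char* periodStr = data;          // start of the current 32-character period
    size_t periodPos = periodLen;    // index just past the current period
    char tmp;
    uint32_t byte;

    while (periodPos < dataLen)
    {
        tmp = data[periodPos];       // save the first character of the next period
        data[periodPos] = 0;         // temporarily terminate the current period

        byte = bytes2bits(periodStr);
        printf("Data: %" PRIu32 "\n", byte); // 128 for the first period

        data[periodPos] = tmp;       // restore the saved character
        periodStr = &data[periodPos];
        periodPos += periodLen;
    }
    // The last period is already terminated by the string's own null byte
    if (periodPos - periodLen < dataLen)
    {
        byte = bytes2bits(periodStr);
        printf("Data: %" PRIu32 "\n", byte);
    }
}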

Be careful with the last period, which could be shorter than 32 bytes: strtoul then parses fewer digits, so its bits end up in the low end of the word.
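If whatever reads the file back expects every period's first character in the most significant bit, one option is to shift the short tail left, as if it were padded with '0' characters. This is a minimal sketch of that design choice; `last_period_left_aligned` is a hypothetical name, and whether the tail should be left-aligned, right-aligned, or (as in the question) ignored depends on the file format you choose.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse the final, possibly shorter period and shift
   it so that its first character lands in the most significant bit, as if
   the missing characters at the end were '0'. */
uint32_t last_period_left_aligned(const char *periodStr) {
    size_t len = strlen(periodStr);
    assert(len <= 32);                   /* a period never exceeds 32 digits */
    if (len == 0)
        return 0;                        /* nothing left to encode */
    uint32_t value = (uint32_t)strtoul(periodStr, NULL, 2);
    if (len < 32)
        value <<= 32 - len;              /* shift count is 1..31, so well-defined */
    return value;
}
```

In the loop above, `byte = last_period_left_aligned(periodStr);` could replace the final `bytes2bits` call.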

Aak

const char data[36]

You are in violation of your contract with the compiler if you declare something as const and then modify it.

Generally speaking, the compiler won't let you modify it, so to even try to do so with a const declaration you'd have to cast the const away (but don't):

char *sneaky_ptr = (char*)data;
sneaky_ptr[0] = 'U'; /* the U is for "undefined behavior" */

See: Can we change the value of an object defined with const through pointers?

So if you wanted to do this, you'd have to be sure the data was legitimately non-const.

  • I only declared it as const to make a minimal, reproducible example. In my program it isn't const. – Rodrigo Dec 01 '18 at 14:33
  • @Rodrigo I misread the docs for strtoul. If you want to in-place convert an arbitrarily long buffer of 1s and 0s to bytes, you can do so...but you wouldn't want to use strtoul to do that. – HostileFork says dont trust SE Dec 01 '18 at 14:43
  • Sorry, my bad. I explained it better in the text now. It's not an in-place conversion: the converted values are going to a different file. – Rodrigo Dec 01 '18 at 14:44
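Since the output goes to a separate file, the source characters never need to be modified at all if you skip strtoul and build each word with shifts. This is a minimal sketch of that approach, not taken from any answer here: `write_packed` is a hypothetical name, the first character of each period goes to the most significant bit (matching what strtoul produces), any character other than '1' counts as 0, and an incomplete tail is skipped like the trailing "101" in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: pack each complete run of 32 '0'/'1' characters in
   data[0..len) into a uint32_t and write it to out in native byte order.
   data is only read, so it may be const. Returns the number of words written. */
size_t write_packed(const char *data, size_t len, FILE *out) {
    assert(out != NULL);
    size_t written = 0;
    for (size_t pos = 0; pos + 32 <= len; pos += 32) {
        uint32_t word = 0;
        for (size_t i = 0; i < 32; i++)
            word = (word << 1) | (uint32_t)(data[pos + i] == '1');
        if (fwrite(&word, sizeof word, 1, out) != 1)
            break;                       /* stop on a write error */
        written++;
    }
    return written;
}
```

Because fwrite stores the words in the machine's native byte order, a reader on a machine with different endianness would have to convert them.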

The right way to do this in modern C++ is by using std::string to hold your string and std::string_view to process parts of that string without copying it.

You can use string_view with the char array you have, though. It's commonly used to modernize code that works with classical null-terminated const char* strings.
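The question itself is written in C, where std::string_view is not available; the closest equivalent is a pointer plus a length. The following is a minimal sketch of that idea. The `char_view` type and the `view_sub`/`view_to_u32` functions are hypothetical names, not a library API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A hypothetical C stand-in for std::string_view: a pointer into someone
   else's buffer plus a length. Creating one copies no characters. */
typedef struct {
    const char *ptr;
    size_t len;
} char_view;

/* Sub-view of v starting at pos with at most count characters. */
char_view view_sub(char_view v, size_t pos, size_t count) {
    if (pos > v.len)
        pos = v.len;
    if (count > v.len - pos)
        count = v.len - pos;
    return (char_view){ v.ptr + pos, count };
}

/* Interpret the '0'/'1' characters of v (at most 32) as a binary number. */
uint32_t view_to_u32(char_view v) {
    assert(v.len <= 32);
    uint32_t value = 0;
    for (size_t i = 0; i < v.len; i++)
        value = (value << 1) | (uint32_t)(v.ptr[i] == '1');
    return value;
}
```

Like strtoul, `view_to_u32` leaves a short final view right-aligned in the word, so "101" becomes 5.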

The Quantum Physicist
  • Please take a look at lxop's answer. I think it's more efficient, isn't it? – Rodrigo Dec 01 '18 at 14:50
  • @Rodrigo No, it's the same. Except that it uses old methods. – The Quantum Physicist Dec 01 '18 at 14:51
  • The same program done in C++ is usually bigger than in C. That's why I thought it could be different. I don't know the internal implementation of std::string, much less string_view. So I'd have to take your word for it. – Rodrigo Dec 01 '18 at 15:23