4

I'm unsure if this is possible due to structure padding and alignment but, assuming you take care of that by aligning your structures to 4/8 bytes, is it possible to bit shift on a structure as if it was a single variable?

What I'd like to do is take a string (max 8 bytes) and shift it into the high order bits of a 64-bit variable.

Like if I do this:

#include <stdint.h>
#include <string.h>
void shiftstr(uint64_t* t,char* c,size_t len){
    memcpy(t, c, len);
    //now *t==0x000000617369616b
    *t<<=(sizeof(uint64_t)-len)*8;
    //now *t==0x617369616b000000
}
int main(){
    uint64_t k = 0;
    char n[] = "kaisa";
    shiftstr(&k, n,strlen(n));
    return 0;
}

This works just fine, but what if I had, instead of a uint64_t, two uint32_t, either as individual variables or a structure.

#include <stdint.h>
#include <string.h>
struct U64{
    uint32_t x;
    uint32_t y;
};
void shiftstrstruct(struct U64* t, char* c, size_t len){
    memcpy(t, c, len);
    /*
     At this point I think
     x == 0x7369616b
     y == 0x00000061
     But I could be wrong
     */
    //but how can I perform the bit shift?
    //Where
    //x==0x0000006b
    //y==0x61697361
    
}
int main(){
    char n[] = "kaisa";
    struct U64 m = {0};
    shiftstrstruct(&m, n, strlen(n));
    return 0;
}

Up to the memcpy part, it should be the same as if I were performing it on a single variable. I believe the values of x and y are correct in such situations. But, if that's the case that means the values need to be shifted away from x towards y.

I know I can cast but what if I wanted to deal with a 16 byte string that needed to be shifted into two 64 bit variables, or even larger?

Is shifting structures like this possible? Is there a better alternative?

Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
Kayla
  • 223
  • 2
  • 14
  • 1
    Better alternative to bitshifting entire structs? Most definitely. But whats the usecase that requires you to bitshift structs in the first place? What problem is this trying to solve? – Shark Mar 30 '22 at 11:25
  • 3
    While unlikely in your specific structure, you should still be aware of [Why isn't sizeof for a struct equal to the sum of sizeof of each member?](https://stackoverflow.com/questions/119123/why-isnt-sizeof-for-a-struct-equal-to-the-sum-of-sizeof-of-each-member) – Some programmer dude Mar 30 '22 at 11:26
  • 3
    Keep in mind that different endianness between implementations means that this isn't something you can rely on. – Thomas Jager Mar 30 '22 at 11:27
  • @Shark The problem is exactly as I posted. I'm trying to shift a string into the high order bits of a say a uint64, and that's fine for a uint64 but what if I want to instead use two uint32 or two uint64 for a 16 byte string. – Kayla Mar 30 '22 at 11:27
  • @Someprogrammerdude That is something I took into account yes, I am aware that padding can change the size of the structure in the case that it wasn't alligned. But I would check ahead of time if the size was the actual size. – Kayla Mar 30 '22 at 11:28
  • Also, you think that `x == 0x00000061` and `y == 0x7369616b`. That's wrong. The first four bytes of the string will be copied to the first four bytes of the structure. That is the value of `x` will be `"kais"` (in some endian-specific order). – Some programmer dude Mar 30 '22 at 11:29
  • @Someprogrammerdude oh that was my mistake, I had that backwards sorry. – Kayla Mar 30 '22 at 11:30
  • As a workaround, perhaps use a `union` with your structure and an `uint64_t` variable? Then you can shift the `uint64_t` variable like you do in the first example. Type-punning like that is (or at leased used to be) valid in C. – Some programmer dude Mar 30 '22 at 11:40
  • @Someprogrammerdude Yes, but without an implemented `int128_t`, that won't work for the 16-byte case the OP mentions. – Adrian Mole Mar 30 '22 at 11:40
  • @AdrianMole That's what I'm wondering since casting can work but if the variable type isn't avaiable then I can't really do that. Because I believe I can cast a structure to a uint64_t then shift on it as if it were a single variable right? But in the casae I don't have access to a uint128 then it becomes invalid. – Kayla Mar 30 '22 at 11:43
  • 2
    Type-punning by casting (a pointer to the) structure to a (pointer to a)``uint64_t` breaks strict aliasing and leads to undefined behavior. You either have to go through a `char` array, or use unions (as long as your structure-size have a corresponding single standard type (like two `uint32_t` can correspond to a single `uint64_t`). – Some programmer dude Mar 30 '22 at 11:47
  • If your compiler doesn't compile Adrian's answer optimally for ISAs you care about (like x86-64 `shld` + `shl` plus extra work to handle counts >= 64 (https://godbolt.org/z/fjK7Msaro), not a bunch of regular shifts / ORs and `neg` on the count), you'll want to memcpy into GNU C `unsigned __int128` if it's available (64-bit targets only). Or if your shift count is always a multiple of 8 (a byte-shift) use SSSE3 `pshufb` (`_mm_shuffle_epi8`) with a shuffle control loaded from a 16-byte window of an array like `{ -1, -1, -1, 0, 1, 2, ..., 15, -1, -1, -1 }`, or something like that. – Peter Cordes Apr 05 '22 at 09:17

2 Answers2

5

Is shifting structures like this possible?

No, not really. Even if the x and y members are in adjacent memory locations, bit-shift operations on either are performed as integer operations on the individual variables. So, you can't shift bits "out of" one and "into" the other: bits that "fall off" during the shift will be lost.

Is there a better alternative?

You would have to implement such a multi-component bit-shift yourself – making copies of the bits that would otherwise be lost and somehow masking those back into the result, after shifting other bits internally to each 'component' variable. Exactly how to do this would largely depend on the use case.

Here's one possible implementation of a right-shift function for a structure comprising two uint64_t members (I have not added any error-checking for the count, and I assume that uint64_t is exactly 64 bits wide):

#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint64_t hi;
    uint64_t lo;
} ui128;

void Rshift(ui128* data, int count)
{
    uint64_t mask = (1uLL << count) - 1; // Set low "count" bits to 1
    uint64_t save = data->hi & mask;     // Save bits that fall off hi
    data->hi >>= count;                  // Shift the hi component
    data->lo >>= count;                  // Shift the lo component
    data->lo |= save << (64 - count);    // Mask in the bits from hi
    return;
}

int main()
{
    ui128 test = { 0xF001F002F003F004, 0xF005F006F007F008 };
    printf("%016llx%016llx\n", test.hi, test.lo);
    Rshift(&test, 16);
    printf("%016llx%016llx\n", test.hi, test.lo);
    return 0;
}

A similar logic could be used for a left-shift function, but you would then need to save the relevant upper (most significant) bits from the lo member and mask them into the shifted hi value:

void Lshift(ui128* data, int count)
{
    uint64_t mask = ((1uLL << count) - 1) << (64 - count);
    uint64_t save = data->lo & mask;
    data->hi <<= count;
    data->lo <<= count;
    data->hi |= save >> (64 - count);
    return;
}
Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
  • So I would need to account for how many times a shift occurs, then use my own algorithm to decide which variables the shift needs to happen on... – Kayla Mar 30 '22 at 11:33
  • @Kayla. Yes. By "use case", one of the issues would be whether or not the shift count would always be the same (makes things a bit easier) or could be different each time (would need a loop that checks the 'end' bit before the shift and mask that back afterwards). – Adrian Mole Mar 30 '22 at 11:37
  • @Kayla The source code for an implementation of an `int128_t` would be a good place to look for inspiration. Like how that handles `>>` and `<<` for the type. – Adrian Mole Mar 30 '22 at 11:39
  • I don't think this is safe for shift counts > 64. That needs to shift all the bits out of `data->lo` (for left shift) leaving it zero. GCC's and clang's `unsigned __int128` left shift by a variable uses two `cmov` instructions. https://godbolt.org/z/fjK7Msaro That's wasted work if your count is always strictly less than 64, @Kayla, otherwise it's necessary. – Peter Cordes Apr 05 '22 at 09:16
  • @PeterCordes It's certainly *not* safe for shift counts > 64. The `64 - count` as the RH operand of `<<` or `>>` [would be dodgy, at best](https://stackoverflow.com/a/4945765/10871073). – Adrian Mole Apr 05 '22 at 09:26
-1

union is your friend, this is what you want:

#include <stdint.h>
#include <stdio.h>

typedef union _shift_u64{
    struct _u64{
        uint32_t x;
        uint32_t y;
    } __attribute__((__packed__)) U64;
    uint64_t x_and_y;
} SHIFT_U64;

int main(int argc, char* argv[]){
    SHIFT_U64 test;
    
    test.U64.x = 4;
    test.U64.y = 8;
    printf("test.U64.x=%d, test.U64.y=%d, test.x_and_y=%ld\n", test.U64.x, test.U64.y, test.x_and_y);

    test.x_and_y<<=1;
    printf("test.U64.x=%d, test.U64.y=%d, test.x_and_y=%ld\n", test.U64.x, test.U64.y, test.x_and_y);

    test.x_and_y>>=1;
    printf("test.U64.x=%d, test.U64.y=%d, test.x_and_y=%ld\n", test.U64.x, test.U64.y, test.x_and_y);

    return 0;
}

EDIT: This simple program illustrates how to do it the other way, but you have to check for the carry over bit and shift overflow and shift underflow by yourself. union doesn't care about the data, you just have to make sure that the data makes sense. After compiling, redirect the output of the program to a file or hex-editor and read the errorlevel of the program.

Linux example: ./a.out > a.out.bin; echo "errorlevel=$?"; xxd a.out.bin

#include <stdio.h>

typedef union _shift_it{
    struct _data{
        unsigned long x : 64;
        unsigned long y : 64;
    } __attribute__((__packed__)) DATA;
    unsigned char x_and_y[16];
} __attribute__((__packed__)) SHIFT_IT;

int main(int argc, char* argv[]){
    SHIFT_IT test;
    int errorlevel = 0;

    //bitmask for shift operation
    static const unsigned long LEFT_SHIFTMASK64 = 0x8000000000000000;
    static const unsigned long RIGHT_SHIFTMASK64 = 0x0000000000000001;

    //test data
    test.DATA.x = 0x2468246824682468; //high bits
    test.DATA.y = 0x1357135713571357; //low bits

    //binary output to stdout
    for(int i=0; i<16; i++) putchar(test.x_and_y[i]);

    //left shift
    if(test.DATA.x & LEFT_SHIFTMASK64) errorlevel += 1;
    test.DATA.x <<= 1;
    if(test.DATA.y & LEFT_SHIFTMASK64) errorlevel += 2;
    test.DATA.y <<= 1;

    //binary output to stdout
    for(int i=0; i<16; i++) putchar(test.x_and_y[i]);

    //right shift
    if(test.DATA.y & RIGHT_SHIFTMASK64) errorlevel += 4;
    test.DATA.y >>= 1;
    if(test.DATA.x & RIGHT_SHIFTMASK64) errorlevel += 8;
    test.DATA.x >>= 1;

    //binary output to stdout
    for(int i=0; i<16; i++) putchar(test.x_and_y[i]);

    //right shift
    if(test.DATA.y & RIGHT_SHIFTMASK64) errorlevel += 16;
    test.DATA.y >>= 1;
    if(test.DATA.x & RIGHT_SHIFTMASK64) errorlevel += 32;
    test.DATA.x >>= 1;

    //binary output to stdout
    for(int i=0; i<16; i++) putchar(test.x_and_y[i]);

    //left shift
    if(test.DATA.x & LEFT_SHIFTMASK64) errorlevel += 64;
    test.DATA.x <<= 1;
    if(test.DATA.y & LEFT_SHIFTMASK64) errorlevel += 128;
    test.DATA.y <<= 1;

    //binary output to stdout
    for(int i=0; i<16; i++) putchar(test.x_and_y[i]);

    return errorlevel;
}
paladin
  • 765
  • 6
  • 13
  • However: *I know I can cast but what if I wanted to deal with a 16 byte string that needed to be shifted into two 64 bit variables, or even larger?* How would a `union` deal with that case? – Adrian Mole Mar 30 '22 at 11:59
  • @AdrianMole I've edited my post, I hope this may inspire you. – paladin Mar 30 '22 at 13:59
  • @AdrianMole: Since this answer's already using GNU C extensions, the same union trick with `unsigned __int128` would be the way to go if you're only compiling for 64-bit targets, perhaps giving better asm than doing it manually with `<>(64-count)` or `>>(-count & 63)`. The current answer generating an `errorlevel` instead of actually propagating carries is not very good, especially for variable-count shifts, but the union is good for 8-byte structs. (Although no better than the question's memcpy into `uint64_t` and back; didn't notice earlier they already solved that.) – Peter Cordes Apr 05 '22 at 09:02