2

I have a number of bits (the number of bits can change) in an unsigned int (uint32_t). For example (12 bits in the example):

uint32_t a = 0xF9C;

The bits represent a signed int of that length. In this case the number in decimal should be -100. I want to store the variable in a signed variable and gets is actual value. If I just use:

int32_t b = (int32_t)a;

it will be just the value 3996, since it gets casted to (0x00000F9C) but it actually needs to be (0xFFFFFF9C)

I know one way to do it:

union test
{
    signed temp :12;
}; 
union test x;
x.temp = a;
int32_t result = (int32_t) x.temp;

now i get the correct value -100

But is there a better way to do it? My solution is not very flexbile, as I mentioned the number of bits can vary (anything between 1-64bits).

Lundin
  • 195,001
  • 40
  • 254
  • 396
oliverleo
  • 77
  • 9
  • How many bits exactly? – tkausl Mar 21 '19 at 15:49
  • You really need to know how many bits, so you can know the sign. – Eugene Sh. Mar 21 '19 at 15:50
  • 12 bits in this example. But it can change – oliverleo Mar 21 '19 at 15:50
  • Well I always know how many bits the signed variable has. Im implementing a communication protocol. The length of each variable is defined. So I know that this variable has 12 bits, But other variables can have a different length. So I can use the solution i mentioned, but then i need a union for all sizes from 1-64 – oliverleo Mar 21 '19 at 15:51
  • How about checking if the signed bit is set dependent on the amount of bits (shift your mask based on the amount of bits) then invert that mask to select the value of a. Then cast the value to b. and set the signed bit of b based on what you retrieved earlier. – Neijwiert Mar 21 '19 at 15:52

5 Answers5

3

But is there a better way to do it?

Well, depends on what you mean by "better". The example below shows a more flexible way of doing it as the size of the bit field isn't fixed. If your use case requires different bit sizes, you could consider it a "better" way.

unsigned sign_extend(unsigned x, unsigned num_bits)
{
    unsigned f = ~((1 << (num_bits-1)) - 1);
    if (x & f)  x = x | f;
    return x;
}


int main(void)
{
    int x = sign_extend(0xf9c, 12);
    printf("%d\n", x);

    int y = sign_extend(0x79c, 12);
    printf("%d\n", y);
}

Output:

-100
1948
Support Ukraine
  • 42,271
  • 4
  • 38
  • 63
  • With no jump/test: `unsigned sign_extend(unsigned x, unsigned num_bits){unsigned high = 1 << (num_bits-1);x &= (1 << num_bits)-1;return (x ^ high) - high;}` – Nipo Mar 21 '19 at 16:05
1

A branch free way to sign extend a bitfield (Henry S. Warren Jr., CACM v20 n6 June 1977) is this:

// value i of bit-length len is a bitfield to sign extend
// i is right aligned and zero-filled to the left
sext = 1 << (len - 1);
i = (i ^ sext) - sext;

UPDATE based on @Lundin's comment

Here's tested code (prints -100):

#include <stdio.h>
#include <stdint.h>

int32_t sign_extend (uint32_t x, int32_t len)
{
    int32_t i = (x & ((1u << len) - 1)); // or just x if you know there are no extraneous bits
    int32_t sext = 1 << (len - 1);
    return (i ^ sext) - sext;
}

int main(void)
{
    printf("%d\n", sign_extend(0xF9C, 12));
    return 0;
}
Doug Currie
  • 40,708
  • 1
  • 95
  • 119
  • I have no idea who Henry S. Warren Jr is but this doesn't work and gives the wrong result... What's the logic behind `1 << (len - 1)`? Did you actually mean `(1 << len)-1`? – Lundin Mar 22 '19 at 09:23
  • Henry S. Warren Jr is the author of Hacker's Delight, among other things. This works fine, I've used it many times. Perhaps I should have pointed out that `i` and `sext` must have signed types. `1 << (len - 1)` is a mask for the sign bit. – Doug Currie Mar 22 '19 at 15:05
  • Yes, this code is indeed completely different from what you first posted. But this code also invokes undefined behavior in case len is 31 or 32, as one might expect to be valid input for 32 bit unsigned numbers. In addition, it isn't readable. It also runs needlessly slow, compare it for example with the code I just cooked up out of the blue: https://godbolt.org/z/v9-8o5. – Lundin Mar 22 '19 at 15:40
  • Seems you missed my comment on @4386427's answer :). – Nipo Mar 22 '19 at 15:54
  • @Lundin both routines do different things, why compare them ? – Nipo Mar 22 '19 at 15:54
  • @Lundin if the field` len` is 31 or less there is no undefined behavior; if `len` is 32 you need to use `int64_t` for fully defined behavior on non-twos-complement hardware. The last two lines of `sign_extend` is identical to what I posted initially, so I don't get that comment. As to efficiency, it compiles to four machine instructions (load 1, shift, xor, subtract). – Doug Currie Mar 22 '19 at 19:06
  • @Nipo, I did miss your comment on @ 4386427's answer... Thanks for helping promote the truly great hacks! – Doug Currie Mar 22 '19 at 19:14
  • C11 6.5.7/3-4 "If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined." /--/ "The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros." /--/ "If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined." – Lundin Mar 23 '19 at 15:00
  • @DougCurrie Left shifting msb of an `int32_t` 31 steps is undefined behavior, peiod. int32_t having 31 data bits. As for efficiency it compiles to the disassembly I just linked, stop fantasizing. If you are going to keep this answer then kindly show how the C standard is wrong or how the 12 instructions in the linked `gcc -O3` disassembly is in fact 4. – Lundin Mar 23 '19 at 15:02
  • @Lundin, Please note that I said "if the field` len` is 31 or less there is no undefined behavior" and in that case I only shift a signed value of 1 by 30 bits. The other shift is on an unsigned value. My four instructions comment was about "last two lines of sign_extend is identical to what I posted initially" as in https://godbolt.org/z/sjylN8 – Doug Currie Mar 23 '19 at 16:33
0

This relies on the implementation defined behavior of sign extension when right-shifting signed negative integers. First you shift your unsigned integer all the way left until the sign bit is becoming MSB, then you cast it to signed integer and shift back:

#include <stdio.h>
#include <stdint.h>

#define NUMBER_OF_BITS 12

int main(void) {
    uint32_t x = 0xF9C;
    int32_t y = (int32_t)(x << (32-NUMBER_OF_BITS)) >> (32-NUMBER_OF_BITS);

    printf("%d\n", y);

    return 0;
}
Eugene Sh.
  • 17,802
  • 8
  • 40
  • 61
  • implementation defined means it wont work with all compilers? – oliverleo Mar 21 '19 at 16:07
  • Yes. The sign extension is one of the choices left to the compiler implementation by the C standard. But you should also note, that negative number representation is such a choice as well, so `0xFFFFFF9C` will not be `-100` with every compiler. – Eugene Sh. Mar 21 '19 at 16:09
  • 2
    @EugeneSh. The fixed-length types such as `int32_t` are guaranteed to be two's complement, however. There's no guarantee they will exist, but if they do they'll be two's complement. – Christian Gibbons Mar 21 '19 at 16:16
  • @ChristianGibbons You are right, thanks for pointing out. I always forget this. – Eugene Sh. Mar 21 '19 at 16:17
  • 1
    You don't have to worry if the stdint.h types are available, but writing code that relies on implementation-defined behavior is just bad. In my experience right shifts in particular are non-portable, roughly 50%/50% between arithmetic or logical shift between implementations. – Lundin Mar 22 '19 at 08:45
0

This is a solution to your problem:

int32_t sign_extend(uint32_t x, uint32_t bit_size)
{
    // The expression (0xffffffff << bit_size) will fill the upper bits to sign extend the number.
    // The expression (-(x >> (bit_size-1))) is a mask that will zero the previous expression in case the number was positive (to avoid having an if statemet).
    return (0xffffffff << bit_size) & (-(x >> (bit_size-1))) | x;
}
int main()
{

    printf("%d\n", sign_extend(0xf9c, 12)); // -100
    printf("%d\n", sign_extend(0x7ff, 12)); // 2047

    return 0;
}
Ameen
  • 1,747
  • 3
  • 16
  • 31
-1

The sane, portable and effective way to do this is simply to mask out the data part, then fill up everything else with 0xFF... to get proper 2's complement representation. You need to know is how many bits that are the data part.

  • We can mask out the data with (1u << data_length) - 1.
  • In this case with data_length = 8, the data mask becomes 0xFF. Lets call this data_mask.
  • Thus the data part of the number is a & data_mask.
  • The rest of the number needs to be filled with zeroes. That is, everything not part of the data mask. Simply do ~data_mask to achieve that.
  • C code: a = (a & data_mask) | ~data_mask. Now a is proper 32 bit 2's complement.

Example:

#include <stdio.h>
#include <inttypes.h>

int main(void) 
{
  const uint32_t data_length = 8;
  const uint32_t data_mask = (1u << data_length) - 1;

  uint32_t a = 0xF9C;
  a = (a & data_mask) | ~data_mask;

  printf("%"PRIX32 "\t%"PRIi32, a, (int32_t)a);
}

Output:

FFFFFF9C        -100

This relies on int being 32 bits 2's complement but is otherwise fully portable.

Lundin
  • 195,001
  • 40
  • 254
  • 396
  • 1
    This code does no sign extension, it blindly makes input negative. And `a & data_mask` before oring `~data_mask` is useless. – Nipo Mar 22 '19 at 15:49
  • @Nipo For this whole question to make sense you must already know that 0xF9C is signed 2's complement. You can't know if some random number is signed or not. – Lundin Mar 23 '19 at 14:42
  • @Nipo To spell it out to you, you cannot sign extend a number which is not already in a signed type variable and set as negative. You can't grab some random chunk of binary and "sign extend" it. Shifting data into the sign bits of a signed integer invokes undefined behavior and the program is free to crash and burn. – Lundin Mar 23 '19 at 15:05
  • @Lundin, besides agreeing with Nipo, you can do it, as my solution attests, without undefined nor implementation defined behavior. Your version above invokes implementation defined behavior (see https://stackoverflow.com/questions/9498190/is-conversion-from-unsigned-to-signed-undefined ) – Doug Currie Mar 23 '19 at 19:22
  • 1
    You seem to forget a signed number may actually be positive. If we sign extend 0x2a from 7 bits to 32, I actually expect to get 0x0000002a, not 0xffffffaa. The latter is what we'll get from your code. – Nipo Mar 23 '19 at 20:10
  • @DougCurrie In practice every mainstream 2's complement implementation will perform the conversion well-defined so what's your point. – Lundin Mar 24 '19 at 11:48
  • @Nipo No. The term "sign extension" refers to a negative number for example `int8_t` with binary value 0xFF retaining its sign when promoted to a larger type like `int16_t`, 0xFFFF. Nothing else. You cannot "sign extend" positive numbers. And yet again, you _cannot_ left shift bits into the sign bit of a signed variable because that invokes undefined behavior. – Lundin Mar 24 '19 at 11:53
  • 1
    @Lundin, search Wikipedia for "sign extension.' The second paragraph is: For example, if six bits are used to represent the number "00 1010" (decimal positive 10) and the sign extend operation increases the word length to 16 bits, then the new representation is simply "0000 0000 0000 1010". Thus, both the value and the fact that the value was positive are maintained. – Doug Currie Mar 24 '19 at 14:08
  • @Lundin: If one defines the binary representation mathematically as describing the set of non-negative integer powers of two that must be summed together to yield a value that will be congruent to that number mod every possible power-of-two base, then every integer starts with an infinite string of ones or zeroes followed by a finite string of ones and zeroes. Sign extension is simply the practice of saying that all the digits to the left of a particular representation match the leftmost digit shown, so as to avoid having to actually store them. – supercat Mar 25 '19 at 15:27