5

Say I have a signed number coded on an unusual number of bits, for instance 12. How do I convert it efficiently to a standard C value? The following works but requires an intermediate variable:

#include <stdio.h>
int main() { 
  unsigned short U12=0xFFF; // 12-bit signed number, as coded in hex
  unsigned short XT=U12<<4; // 16 bits minus 12 is 4...
  short SX=(*(short*)&XT)>>4; // Signed shift. Is that standard C ?
  printf("%08X %d\n", SX, SX);
  return 0;
}

Output (for several values of U12):

U12=0x0:   00000000 0
U12=0x1:   00000001 1
U12=0x7FF: 000007FF 2047
U12=0x800: FFFFF800 -2048
U12=0x801: FFFFF801 -2047
U12=0xFFF: FFFFFFFF -1

Is there a more direct way to do this without intermediate variable?

dargaud

4 Answers

7

Here's one portable way (untested):

 short SX = (short)U12 - ((U12 & 0x800) << 1);

(For a different number of bits, replace 0x800, i.e. 1<<11, with the appropriate sign-bit mask; shifting it left by one then subtracts 0x1000, i.e. 1<<12, when the sign bit is set.)
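For illustration, here is a minimal sketch of the same idea wrapped in a helper (the function name sign_extend is made up here), assuming the value fits comfortably in an int:

#include <stdio.h>

/* Hypothetical helper: interpret 'value' as a 'bits'-wide two's complement
   number and return it as a plain int. Assumes 1 < bits <= 15 so that the
   subtraction below stays within int range. */
static int sign_extend(unsigned value, unsigned bits)
{
    unsigned sign_mask = 1u << (bits - 1);                /* 0x800 for 12 bits */
    return (int)value - (int)((value & sign_mask) << 1);  /* subtract 1<<bits if negative */
}

int main(void)
{
    printf("%d\n", sign_extend(0xFFF, 12)); /* -1   */
    printf("%d\n", sign_extend(0x7FF, 12)); /* 2047 */
    return 0;
}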

Methods that directly manipulate the sign bit using bit shift operations tend to invoke UB.

Eric Postpischil
n. m. could be an AI
4

One problem with your approach is that you seem to be assuming that the width of a short and unsigned short is 16 bits. However, that is not guaranteed by the ISO C standard. The standard merely specifies that the width must be at least 16 bits.

If you want to use a data type that is guaranteed to be exactly 16 bits wide, then you should use the data types uint16_t and int16_t from <stdint.h> instead.

Another problem is that your code assumes that the platform your program is running on represents negative integer values the same way as your "12-bit signed number" does. However, in contrast to the latest ISO C++ standard, the ISO C standard allows platforms to use any of the following representations:

  1. two's complement
  2. ones' complement
  3. signed magnitude
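For example, in 12 bits the value -1 is stored as 0xFFF in two's complement, 0xFFE in ones' complement, and 0x801 in signed magnitude.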

Assuming that your "12-bit signed number" uses two's complement representation, but you want your code to run correctly on all platforms, irrespective of which representation the platform uses internally, you would have to write code such as the following:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdbool.h>

int main( void )
{
    uint16_t U12=0xFFF;
    int16_t converted;

    //determine sign bit
    bool sign = U12 & 0x800;

    if ( sign )
    {
        //value is negative
        converted = -2048 + ( U12 & 0x7FF );
    }
    else
    {
        //value is positive
        converted = U12;
    }

    printf( "The converted value is: %"PRId16"\n", converted );
}

This program has the following output:

The converted value is: -1

As pointed out in the comments section, this program can be simplified to the following:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main( void )
{
    uint16_t U12=0xFFF;

    int16_t converted = U12 - ( (U12 & 0x800) << 1 );

    printf( "The converted value is: %"PRId16"\n", converted );
}
Andreas Wenzel
  • Shouldn't it be `-int16_t((~U12)&0x7FF+1)` in the case of negative? –  Jan 21 '22 at 14:21
  • Why `converted = -2048 + ( U12 & 0x7FF );` instead of simply `converted = U12 - 4096;`? And hence `converted = U12 - (sign ? 4096 : 0);`. And hence `converted = U12 - ((U12 & 0x800) << 1);`? – Eric Postpischil Jan 21 '22 at 14:24
  • It's more relevant what signedness format the original 12 bit number got than the CPU. If it's from an ADC or other sensor, it isn't uncommon with things like signed magnitude. – Lundin Jan 21 '22 at 14:57
  • @EricPostpischil: `-2048` is simply the value of the sign bit and `U12 & 0x7FF` is the value of `U12` with the sign bit masked out. You are right that the program can be optimized in the manner you describe. I have therefore edited my answer to refer to your comment. – Andreas Wenzel Jan 21 '22 at 14:59
  • @Jellyboy: That is C++, not valid C, and also does not provide the correct result. Beware that `+` has a higher operator precedence than `&`. After fixing the operator precedence, I believe that your solution is also correct. – Andreas Wenzel Jan 21 '22 at 15:48
  • @AndreasWenzel Your solution does not work. https://godbolt.org/z/n7xrK9Yaj The expression has to be `-((int16_t)(((~n)&0x7FF)+1))` for negative numbers. –  Jan 21 '22 at 15:50
  • @Jellyboy: I don't see anything wrong with my solution (which is `converted` in your code). As far as I can tell, it is your solution which is wrong (which is `converted2` in your code). Your solution does not work for positive numbers. In your previous comment, you stated that your solution was only intended for negative numbers, and, as far as I can tell, it works correctly for them. – Andreas Wenzel Jan 21 '22 at 16:06
  • @AndreasWenzel Your solution is giving 2047 when it should yield -1. How isnt that wrong? If you care to read I said explicitly "FOR NEGATIVE NUMBERS ONLY" in the text. –  Jan 21 '22 at 16:29
  • @Jellyboy: For `n == 0x7FF`, which is `2047` in decimal, my solution is giving the correct answer `2047`, whereas your solution is giving the incorrect answer `-1`. We are talking about 12-bit numbers in two's complement, so they can only represent values in the range `-2048` to `+2047`. `n == 0x7FF` represents a positive number. – Andreas Wenzel Jan 21 '22 at 16:36
  • @AndreasWenzel Op explicitly asked "How do I convert it efficiently to a standard C value ? " and 2047 is not -1 in standard C, it is -1 only in his original representation. Therefore your answer is incorrect. –  Jan 21 '22 at 16:39
  • @Jellyboy: I believe that you may be confusing the value `0xFFF` (which OP is using in the question and I am using in my answer), with the value `0x7FF` (which you are using in your linked program). The value `0xFFF` represents `-1` in 12-bit two's complement representation, whereas the value `0x7FF` represents `2047`. – Andreas Wenzel Jan 21 '22 at 16:43
  • Bitfield allergy is amazing here on SO – 0___________ Jan 21 '22 at 16:46
  • @AndreasWenzel That is true, I mistakenly took 0x7FF for 0xFFF. –  Jan 22 '22 at 15:10
  • @0___________: Using a bitfield will only solve the width problem. It will not solve the problem of the 3 different possible ways of representing negative values that are permissible in C. Your bitfield solution will only work if the platform you are using uses the same representation as the representation used for OP's "12-bit signed number". – Andreas Wenzel Jan 22 '22 at 15:45
  • @AndreasWenzel OP wrote: `just assume the encoding is the same as the processor's` – 0___________ Jan 22 '22 at 16:00
  • @0___________: OP did not include that information in the question, and OP only posted that comment more than two hours after I had posted my answer. However, you are right that your bitfield solution is probably best for what OP is asking for now. – Andreas Wenzel Jan 22 '22 at 16:11
4

Assuming that 0xFFF is a 12-bit number expressed in two's complement representation (not necessarily the case), it is equivalent to -1. Further assuming that our CPU also uses two's complement (extremely likely), then:

Using small integer types such as (unsigned) char or short inside bitwise operations is dangerous, because of implicit type promotion. Assuming a 32-bit system with 16-bit short, if such a variable (signed or unsigned) is used as the left operand of a shift, it will always get promoted to (signed) int.
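A minimal sketch of that promotion (assuming a typical system where int is 32 bits):

#include <stdio.h>

int main(void)
{
    unsigned short u = 0xFFFF;
    /* u is promoted to (signed) int before the shift, so the whole
       expression has type int, not unsigned short. */
    printf("%zu\n", sizeof(u << 4)); /* prints sizeof(int), typically 4 */
    return 0;
}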

Under the above assumptions, then:

  • U12<<4 gives a result 0xFFF0 of type int which is signed. You then convert it to unsigned short upon assignment.
  • The conversion *(short*)&XT is smelly but allowed by the pointer aliasing rules in C. The contents of the memory is now re-interpreted as the CPU's signed format.
  • the_signed_short >> 4 invokes implementation-defined behavior when the left operand is negative. It does not necessarily result in an arithmetic shift as you expect; it could just as well be a logical shift (see the sketch after this list).
  • %X and %d expect unsigned int and int respectively, so passing a short is wrong. Here you get saved by the mandatory default promotion of a variadic function's argument, in this case a promotion to int, again.
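A small sketch of the third point (what gets printed depends on the implementation; an arithmetic shift gives -128, a logical shift gives something else):

#include <stdio.h>

int main(void)
{
    short s = -2048; /* bit pattern 0xF800 on a 16-bit two's complement short */
    /* s is promoted to int and then right-shifted while negative, which is
       implementation-defined. */
    printf("%d\n", s >> 4);
    return 0;
}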

So overall there's a lot of code smell here.

A better and mostly well-defined way to do this on the mentioned 32 bit system is this:

int32_t u12_to_i32 (uint32_t u12)
{
  u12 &= 0xFFF; // optionally, mask out potential clutter in upper bytes

  if(u12 & (1u<<11)) // if signed, bit 11 set?
  {
    u12 |= 0xFFFFFFu << 12; // "sign extend"
  }

  return u12; // unsigned to signed conversion, impl.defined  
}

All bit manipulations here are done on an unsigned type which will not get silently promoted on a 32 bit system. This method also has the advantage of using pure hex bit masks and no "magic numbers".

Complete example with test cases:

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

int32_t u12_to_i32 (uint32_t u12)
{
    u12 &= 0xFFF; // optionally, mask out potential clutter in upper bytes

    if(u12 & (1u<<11)) // if signed, bit 11 set?
    {
        u12 |= 0xFFFFFFu << 12; // "sign extend"
    }

    return u12; // unsigned to signed conversion, impl.defined  
}

int main (void) 
{ 
  uint32_t u12;
  int32_t  i32;
  
  u12=0; i32 = u12_to_i32(u12);
  printf("%08"PRIX32 "-> %08"PRIX32 " = %"PRIi32 "\n", u12, (uint32_t)i32, i32);

  u12=0x7FF; i32 = u12_to_i32(u12);
  printf("%08"PRIX32 "-> %08"PRIX32 " = %"PRIi32 "\n", u12, (uint32_t)i32, i32);

  u12=0x800; i32 = u12_to_i32(u12);
  printf("%08"PRIX32 "-> %08"PRIX32 " = %"PRIi32 "\n", u12, (uint32_t)i32, i32);

  u12=0xFFF; i32 = u12_to_i32(u12);
  printf("%08"PRIX32 "-> %08"PRIX32 " = %"PRIi32 "\n", u12, (uint32_t)i32, i32);

  return 0;
}

Output (gcc x86_64 Linux):

00000000-> 00000000 = 0
000007FF-> 000007FF = 2047
00000800-> FFFFF800 = -2048
00000FFF-> FFFFFFFF = -1
Lundin
3

Let the compiler do the hard work:

#define TOINT(val, bits) (((struct {int val: bits;}){val}).val)

or more general

#define TOINT(type, v, bits) (((struct {type val: bits;}){v}).val)

usage:

int main(void)
{
    int int12bit = 0xfff;

    printf("%d\n", TOINT(int, int12bit, 12));
}

or the simpler version:

int main(void)
{
    int int12bit = 0x9B2;

    printf("%d\n", TOINT(int12bit, 12));
}

and the compiler will choose the most efficient method for your target platform and target type:

int convert12(int val)
{
    return TOINT(val,12);
}

long long convert12ll(unsigned val)
{
    return TOINT(long long, val,12);
}

https://godbolt.org/z/P4b4rM4TT
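Put together as a self-contained program (a sketch of the same bit-field trick; as noted in the comments below, storing a value that does not fit in the signed 12-bit range is implementation-defined):

#include <stdio.h>

#define TOINT(type, v, bits) (((struct {type val: bits;}){v}).val)

int main(void)
{
    /* The 12-bit bit-field is sign-extended when it is read back. */
    printf("%d\n", TOINT(int, 0xFFF, 12)); /* -1    */
    printf("%d\n", TOINT(int, 0x800, 12)); /* -2048 */
    printf("%d\n", TOINT(int, 0x7FF, 12)); /*  2047 */
    return 0;
}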

In a similar way, you can convert an N-bit signed integer to an M-bit signed integer:

#define TOINT(v, bits) (((struct {long long val: bits;}){v}).val)
#define FROM_N_TO_M(val, N, M) TOINT(TOINT(val, N), M)
0___________
  • I'm only now just discovering the power of bitfields ! Thanks. – dargaud Jan 21 '22 at 16:41
  • It's not well-defined what will happen if you assign 0xFFF to a bit-field of size 12, nor is it well-defined which types are supported other than int, char and bool. Nor is the endianness or the bit order well-defined. And still as of C17, binary integer constants are non-standard. Overall you are making a whole lot of assumptions about a very specific compiler and system being used. I don't really see any improvement over the original code which also contained lots of implementation-defined behavior, though not quite this much. – Lundin Jan 21 '22 at 21:31
  • For example, care to explain the clang warning from your godbolt example: "format specifies type 'long long' but the argument has type 'int' [-Wformat]"? – Lundin Jan 21 '22 at 21:36
  • @Lundin In real life, using the popular toolchains (gcc/binutils, keil, IAR, msvc, GHC, clang and many others), all is well defined. I agree - at the language-lawyer level it is not, but bitfields are very common in embedded systems programming and compiler developers know that they have to be consistent in the next versions / other target implementations. `0b` indeed is gcc specific - replaced with hex value. – 0___________ Jan 22 '22 at 20:54