3

I'd like to define a C macro

#define TO_UNSIGNED(x) (...)

which takes a signed integer x (signed char, short, int, long, long long, or anything else, even something longer than long long) and converts x to the corresponding unsigned integer type of the same size.

It's OK to assume that signed integers use the two's complement representation. So to convert any value (positive or negative), its two's complement binary representation should be taken, and that should be interpreted as an unsigned integer of the same size.
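As a small illustration of the conversion described above: in C, a cast from a signed integer to the unsigned type of the same width reduces the value modulo 2^N, which yields the same bit pattern as the two's complement representation. The helper names below are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helpers: convert to the unsigned type of the same width.
 * The cast reduces the value modulo 2^N, so -1 becomes the maximum
 * value of the unsigned type and -4 becomes that maximum minus 3. */
unsigned char schar_to_uchar(signed char v)
{
    return (unsigned char)v;
}

unsigned int int_to_uint(int v)
{
    return (unsigned int)v;
}
```

The hard part of the question is not the conversion itself but picking the target type automatically from the type of x.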

I'm assuming that a reasonably modern, optimizing compiler is used which can eliminate unused branches, e.g. if sizeof(X) < 4 ? f(Y) : g(Z) is executed, then X is not evaluated, and only one of f(Y) or g(Z) is generated and evaluated.
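A minimal sketch of the two behaviours assumed here: the operand of sizeof is not evaluated (for non-variable-length types), and only the selected arm of ?: is evaluated. The functions f and g are placeholders taken from the sentence above; the other names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder functions standing in for f(Y) and g(Z). */
static int f(int y) { return y; }
static int g(int z) { return z; }

/* The operand of sizeof is not evaluated, so the increment
 * never happens and the counter keeps its initial value. */
int counter_after_sizeof(void)
{
    int counter = 0;
    size_t n = sizeof(counter++);
    (void)n;
    return counter;
}

/* sizeof(char) is 1 by definition, so the condition is a constant
 * true and only f(1) is evaluated; g(2) is dead code. */
int select_by_size(void)
{
    return sizeof(char) < 4 ? f(1) : g(2);
}
```

Note that eliminating the dead arm affects code generation only; the type of the whole ?: expression is still computed from both arms.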

pts
  • What if `x` is negative? – Bart Friederichs Jan 01 '14 at 15:31
  • How would you handle negative integers? – Eimantas Jan 01 '14 at 15:32
  • The two's complement binary representations of `x` should be taken, and it should be interpreted as an unsigned integer of the same size. – pts Jan 01 '14 at 15:33
  • I assume he would throw an exception / error – Spikeh Jan 01 '14 at 15:34
  • @Spikeh: No errors thrown. I've updated the question. – pts Jan 01 '14 at 15:35
  • Can't you use a switch-case based on sizeof(x), and cast accordingly? – Leeor Jan 01 '14 at 15:37
  • @Leeor: I don't think it's possible. How would you use a switch-case in a macro? And how would you return values of different types? – pts Jan 01 '14 at 15:39
  • @pts, ok, added an answer. Used ternary operator instead – Leeor Jan 01 '14 at 15:57
  • 1
    I don't believe it's possible. You don't have access to type information in the preprocessor, and `sizeof` isn't available either. – Paul Hankin Jan 01 '14 at 16:21
  • Given the difficulties, consider supplying usage examples. By taking advantage of your higher level goal, a solution may be discerned. – chux - Reinstate Monica Jan 01 '14 at 17:35
  • @Anonymous: I also don't think it's possible, but your explanation is incorrect. `sizeof` is indeed possible. There is no need to do the computation in the preprocessor. The reason why I asked for a macro is so that the type of the output can depend on the type of the input. – pts Jan 01 '14 at 19:42
  • 1
    @chux: My first use case is `#define ADD_WRAP(x, y) ((typeof(x))(TO_UNSIGNED(x) + TO_UNSIGNED(y)))`, which is similar to `#define ADD_WRAP(x, y) ((x) + (y))` with `gcc -fwrapv`, i.e. the same integer wraparound as for unsigned types. – pts Jan 01 '14 at 20:06
  • 1
    @pts: I think that in this particular case, it is important that `sizeof` is evaluated by the compiler after preprocessing. If the preprocessor could evaluate `sizeof` to a decimal number, it would be possible to convert `TO_UNSIGNED(x)` to `TO_UINT1(x)`, `TO_UINT2(x)` and so on with the concatenation operator `##`. Solutions that use `sizeof` and rely on an optimizing compiler that removes dead branches of constant false conditions don't meet your type requirement. Anonymous is right: `sizeof` isn't available - in the preprocessor. – M Oehm Jan 02 '14 at 07:54
  • @MOehm: Thanks for the insights about sizeof and the optimizing compiler. I've extended the question to explicitly allow this. – pts Jan 02 '14 at 10:06
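The `ADD_WRAP` use case from the comments can be sketched as follows. This is only an illustration under several assumptions: a C11 compiler with `_Generic`, the GCC/Clang `__typeof__` extension, and a deliberately shortened `TO_UNSIGNED` that handles only int, long and long long. The final cast back to the signed type is implementation-defined for out-of-range values; GCC documents it as wrapping modulo 2^N.

```c
#include <assert.h>
#include <limits.h>

/* Shortened stand-in for TO_UNSIGNED (assumption: C11 _Generic). */
#define TO_UNSIGNED(x) _Generic((x),        \
    int:       (unsigned int)(x),           \
    long:      (unsigned long)(x),          \
    long long: (unsigned long long)(x))

/* Add in the unsigned type, where overflow wraps, then convert back.
 * __typeof__ is a GCC/Clang extension. */
#define ADD_WRAP(x, y) ((__typeof__(x))(TO_UNSIGNED(x) + TO_UNSIGNED(y)))

int add_wrap_int(int a, int b)
{
    return ADD_WRAP(a, b);
}
```

With GCC or Clang, `add_wrap_int(INT_MAX, 1)` gives `INT_MIN` without relying on signed overflow.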

4 Answers

7

I'll bite, but I have to say it's more in the spirit of macro hacking, not because I think such a macro is useful. Here goes:

#include <stdlib.h>
#include <stdio.h>

#define TO_UNSIGNED(x) (                                            \
    (sizeof(x) == 1)                ? (unsigned char) (x) :         \
    (sizeof(x) == sizeof(short))    ? (unsigned short) (x) :        \
    (sizeof(x) == sizeof(int))      ? (unsigned int) (x) :          \
    (sizeof(x) == sizeof(long))     ? (unsigned long) (x) :         \
                                      (unsigned long long) (x)      \
    )

// Now put the macro to use ...

short minus_one_s()
{
    return -1;
}

long long minus_one_ll()
{
    return -1LL;
}

int main()
{
    signed char c = -1;
    short s = -1;
    int i = -1;
    long int l = -1L;
    long long int ll = -1LL;

    printf("%llx\n", (unsigned long long) TO_UNSIGNED(c));
    printf("%llx\n", (unsigned long long) TO_UNSIGNED(s));
    printf("%llx\n", (unsigned long long) TO_UNSIGNED(i));
    printf("%llx\n", (unsigned long long) TO_UNSIGNED(l));
    printf("%llx\n", (unsigned long long) TO_UNSIGNED(ll));

    printf("%llx\n", (unsigned long long) TO_UNSIGNED(minus_one_s()));
    printf("%llx\n", (unsigned long long) TO_UNSIGNED(minus_one_ll()));

    return 0;
}

The macro uses the conditional operator ?: to emulate a switch statement over all known signed integer sizes. (This should also handle the corresponding unsigned integers and the typedef'd types from <stdint.h>. It works with expressions. It also accepts floats, although not quite as I'd expect.)

The somewhat convoluted printfs show that the negative numbers are expanded to the native size of the source integer.

Edit: The OP is looking for a macro that returns an expression of the unsigned type of the same length as the source type. The above macro doesn't do that: Because the two alternative values of the ternary comparison are promoted to a common type, the result of the macro will always be the type of the greatest size, which is unsigned long long.
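The effect described in this edit can be checked with sizeof. The sketch below uses a simplified two-arm macro (the name TO_UNSIGNED_TERNARY is made up for the example) rather than the full macro above; the reasoning is the same for any number of arms.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Simplified two-arm version of the size switch, for illustration. */
#define TO_UNSIGNED_TERNARY(x) (                      \
    (sizeof(x) == 1) ? (unsigned char)(x)             \
                     : (unsigned long long)(x))

/* The type of ?: is the common type of both arms, so the result
 * is unsigned long long even though the first arm is selected. */
size_t result_size_for_schar(void)
{
    signed char c = -1;
    return sizeof(TO_UNSIGNED_TERNARY(c));
}

/* The value is still the one produced by the selected arm. */
unsigned long long result_value_for_schar(void)
{
    signed char c = -1;
    return TO_UNSIGNED_TERNARY(c);
}
```

So the value is converted as intended (UCHAR_MAX for a signed char holding -1), but the static type of the expression is always the widest one.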

Branches of different types could probably be achieved with a pure macro solution in which, after preprocessing, the compiler sees only one type. But the preprocessor knows nothing about types and cannot evaluate sizeof, which rules out such a macro.

But in my (weak) defense, I'll say that if the unsigned long long result of the macro is assigned to the appropriate unsigned type (i.e. unsigned short for short), the value should never be truncated, so the macro might have some use.

Edit II: Now that I've stumbled upon the C11 _Generic keyword in another question (and have installed a compiler that supports it), I can present a working solution: The following macro really returns the correct value with the correct type:

#define TO_UNSIGNED(x) _Generic((x),           \
    char:        (unsigned char) (x),          \
    signed char: (unsigned char) (x),          \
    short:       (unsigned short) (x),         \
    int:         (unsigned int) (x),           \
    long:        (unsigned long) (x),          \
    long long:   (unsigned long long) (x),     \
    default:     (unsigned int) (x)            \
    )

The _Generic selection is resolved at compile time and doesn't have the overhead of producing intermediate results in an oversized integer type. (A real-world macro should probably include the unsigned types themselves for a null-cast. Also note that I had to include signed char explicitly; just char didn't work, even though my chars are signed, because char and signed char are distinct types in C.)

It requires a recent compiler that implements C11, or at least its _Generic keyword, so this solution is not very portable; see here.
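A possible extension along the lines of the parenthetical remark above, as a sketch only: the unsigned types are listed as well, and the default branch is dropped so that an unsupported type fails to compile instead of silently becoming unsigned int. The names TO_UNSIGNED_G and IS_USHORT are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Variant that also passes unsigned types through unchanged.
 * Without a default branch, unlisted types are a compile error. */
#define TO_UNSIGNED_G(x) _Generic((x),                   \
    char:               (unsigned char)(x),              \
    signed char:        (unsigned char)(x),              \
    unsigned char:      (unsigned char)(x),              \
    short:              (unsigned short)(x),             \
    unsigned short:     (unsigned short)(x),             \
    int:                (unsigned int)(x),               \
    unsigned int:       (unsigned int)(x),               \
    long:               (unsigned long)(x),              \
    unsigned long:      (unsigned long)(x),              \
    long long:          (unsigned long long)(x),         \
    unsigned long long: (unsigned long long)(x))

/* Hypothetical helper: 1 if e has type unsigned short, else 0. */
#define IS_USHORT(e) _Generic((e), unsigned short: 1, default: 0)

int short_maps_to_ushort(void)
{
    short s = -1;
    return IS_USHORT(TO_UNSIGNED_G(s));
}

size_t result_size_for_short(void)
{
    short s = -1;
    return sizeof(TO_UNSIGNED_G(s));
}
```

Unlike the ?: version, the result here has the size of unsigned short, because only the selected association determines the type of the generic selection.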

Community
M Oehm
  • 2
    What is the type of the return value of your TO_UNSIGNED macro? The return type is always the same (unsigned long long). But I was explicitly asking for a macro with return type depending on the type of the input. So this is not a solution to my question. – pts Jan 01 '14 at 16:00
  • Oh. It turns out that it can't be done after all. The compiler unifies the types as unsigned long longs. Still, a very nice piece of macro hackery. – Pitarou Jan 01 '14 at 16:13
  • @pts The return type depends on the "switch" case that is entered. For example, it is `unsigned short` for source types of `short`, `unsigned short` and `int16_t`. The cast to `unsigned long long` in the `printf` statements is only there so that all cases can be printed with the `%llx` specifier. (The cast is necessary, because the arguments to `printf` are treated as variadic, i.e. everything with a size up to the size of `int` is promoted to `int` and the rest is passed with its own type.) – M Oehm Jan 01 '14 at 16:13
  • @Pitarou Yes, you're right. I've checked by printing the sizes of the result. I think that the two branches of the ternary comparison are promoted to the same type. – M Oehm Jan 01 '14 at 16:17
  • @MOehm Makes sense. The mathematics of type inference are rigorous, well-defined, and do not permit expressions with inconsistent types. – Pitarou Jan 01 '14 at 16:21
  • @MOehm, as I commented below - I completely agree that the main value here would be receiving the correct *value* of the 2's complement conversion according to type (At least until proven otherwise by the OP..), I still don't see how having it casted to `unsigned long long` could interfere with any practical usage. – Leeor Jan 01 '14 at 16:41
  • @Leeor: One practical usage is speed: if the input is an `int`, it's a waste of CPU time to do the computation as `unsigned long long`: `TO_UNSIGNED(a) + TO_UNSIGNED(b)`. – pts Jan 01 '14 at 19:43
  • @pts: Very well, but that looks like a contrived example. Where would you want to do such a computation and not know your exact type? Even if you have a generic `typedef`, which will probably be controlled via macros, why not have the appropriate unsigned typedef and then just assign? Or are you using that macro from another macro that emulates templates? – M Oehm Jan 01 '14 at 19:49
  • @MOehm: I know that I can write the code I want by explicitly specifying the type, and I know that I can use the corresponding unsigned typedef, or introduce such a typedef if it was not available. But I'd prefer a more elegant solution in which the compiler autodetects the output type based on the input type. In C++ I can solve it using a template...inline function. But now I'm looking for a C solution. – pts Jan 01 '14 at 20:00
  • 1
    @pts: Ah, elegance. If you have a compiler that supports C11, you could implement your macro with `_Generic`: No integer bloat, type evaluation at compile time; see my edited answer. Otherwise, there's just plain, old, rusty C, I'm afraid - and elegance be damned. ;-) – M Oehm Jan 01 '14 at 20:43
  • Thank you for mentioning _Generic, I didn't know about it. Clang 3.0 supports it. GCC 4.6 doesn't support it, but it can be emulated using __builtin_choose_expr (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Other-Builtins.html#index-g_t_005f_005fbuiltin_005fchoose_005fexpr-2391). – pts Jan 01 '14 at 21:40
  • FYI GCC 4.8 has `__SIZEOF_INT128__` defined on architectures which support `__int128_t`. This can be used with both _Generic and `__builtin_choose_expr`. For earlier versions of GCC (without `__SIZEOF_INT128__`) I think it's impossible to make it work on both kinds of systems: those which support `__int128_t` and those which don't. – pts Feb 23 '17 at 23:20
3

You don't need a macro. The conversion happens automatically. E.g.:

int x = -1;
unsigned int y;

y = x;

EDIT

You seem to want a macro that can infer the type of a variable from its name. That is impossible. Macros are run at a stage of compilation where the compiler doesn't have the type information available. So the macro must emit the same code regardless of the variable's type.

At the stage when type information becomes available, the compiler will insist that every expression has a consistent type. But you're asking for code that is inconsistently typed.

The best you can hope for is to supply the type information yourself. E.g.:

#define TO_UNSIGNED(type, name) ((unsigned type)(name))
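A usage sketch for a two-argument macro of this kind, written with a parenthesized cast. The name TO_UNSIGNED2 and the helper functions are made up for the example. Because the type name is pasted after the keyword unsigned, this only works for built-in type names such as short, int or long, not for typedefs like int32_t.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The caller supplies the signed type; the macro prepends `unsigned`. */
#define TO_UNSIGNED2(type, name) ((unsigned type)(name))

unsigned long long_to_unsigned(long v)
{
    return TO_UNSIGNED2(long, v);
}

/* The result really has the requested unsigned type. */
size_t result_size_for_short(void)
{
    short s = -1;
    return sizeof(TO_UNSIGNED2(short, s));
}
```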
glglgl
Pitarou
  • 1
    I am explicitly asking for a macro, so this doesn't answer my question. The whole point of the macro is that it is smart enough to figure out `unsigned int` automatically. – pts Jan 01 '14 at 15:40
  • 1
    This requires explicitly passing the type – Leeor Jan 01 '14 at 15:56
  • 1
    Indeed it's easy to solve it with a 2-arg macro. I need a 1-arg macro, and it should be smart enough to figure out the type automatically. – pts Jan 01 '14 at 15:56
  • Your reasoning why it's impossible is flawed. The macro body can contain an expression which behaves differently based on the input types. For example, `#define ADD(a, b) ((a) + (b))` behaves differently: it can be an `int` addition, an `unsigned` addition, an `unsigned long` addition etc. No type information is needed at macro expansion time to make it work. – pts Jan 01 '14 at 19:52
  • Thank you for trying to help. Unfortunately I can't accept your answer, because it doesn't solve the problem, and it doesn't present a valid proof either that the problem is impossible to solve. – pts Jan 01 '14 at 19:53
2

Ok, since you intend to use this macro to implicitly convert negative values to their 2's complement counterparts, I think we can address it the following way:

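#include <stdio.h>
#include <stdint.h>

/* Select the cast by size; the argument is parenthesized so that
   expressions such as a + b are cast as a whole. */
#define TO_UNSIGNED(x) ( \
                          (sizeof(x) == 1 ? (uint8_t)(x) : \
                          (sizeof(x) <= 2 ? (uint16_t)(x) : \
                          (sizeof(x) <= 4 ? (uint32_t)(x) : \
                          (sizeof(x) <= 8 ? (uint64_t)(x) : \
                          (x) \
                        )))))

int main (void) {
    char a = -4;
    int b = -4;

    /* Cast explicitly so that the argument matches the format specifier. */
    printf ("TO_UNSIGNED(a) = %llu\n", (unsigned long long) TO_UNSIGNED(a));
    printf ("TO_UNSIGNED(b) = %llu\n", (unsigned long long) TO_UNSIGNED(b));
    return 0;
}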

Output:

TO_UNSIGNED(a) = 252
TO_UNSIGNED(b) = 4294967292

Of course, support for further lengths may be required; for now, anything wider than 64 bits is returned as x itself.

Leeor
  • What is the type of the return value of your TO_UNSIGNED macro? The return type is always the same (`uint64_t`). But I was explicitly asking for a macro with return type depending on the type of the input. So this is not a solution to my question. – pts Jan 01 '14 at 15:58
  • This doesn't work for types longer than `long long`. – pts Jan 01 '14 at 15:59
  • @pts, it's a macro, the return type is exactly what the ternary operator returns, which is according to the size. And it's true about > long long, you need to choose how you implement such values – Leeor Jan 01 '14 at 15:59
  • No, the return type of the ternary operator is always the same. The C compiler devises a common return type by looking at the types of the two branches. – pts Jan 01 '14 at 16:01
  • If you take `sizeof(TO_UNSIGNED(...))`, do you get different values? – pts Jan 01 '14 at 16:01
  • 1
    @pts: Then please explain how you plan to use it, for most purposes the result would get casted anyway according to how you use this macro, as long as you get the 2's complement value right it shouldn't matter as far as I can see. – Leeor Jan 01 '14 at 16:08
  • Thank you for trying to help. Unfortunately I can't accept your answer, because it doesn't solve the problem: it doesn't convert the input to its corresponding unsigned type. – pts Jan 01 '14 at 19:56
0

It looks like there is no generic solution which supports integers of all possible sizes.

For a hardcoded list of types, I was able to make it work using __builtin_choose_expr in C and overloaded function in C++. Here is the solution: https://github.com/pts/to-unsigned/blob/master/to_unsigned.h

The relevant C code looks like this:

#define TO_UNSIGNED(x) ( \
    __builtin_choose_expr(__builtin_types_compatible_p(__typeof(x), unsigned char), (unsigned char)(x), \
    __builtin_choose_expr(__builtin_types_compatible_p(__typeof(x), char), (unsigned char)(x), \
    __builtin_choose_expr(sizeof(x) == sizeof(char), (unsigned char)(x), \
    __builtin_choose_expr(__builtin_types_compatible_p(__typeof(x), unsigned short), (unsigned short)(x), \
    __builtin_choose_expr(__builtin_types_compatible_p(__typeof(x), short), (unsigned short)(x), \
    __builtin_choose_expr(sizeof(x) == sizeof(short), (unsigned short)(x), \
    __builtin_choose_expr(__builtin_types_compatible_p(__typeof(x), unsigned), (unsigned)(x), \
    __builtin_choose_expr(__builtin_types_compatible_p(__typeof(x), int), (unsigned)(x), \
    __builtin_choose_expr(sizeof(x) == sizeof(int), (unsigned)(x), \
    __builtin_choose_expr(__builtin_types_compatible_p(__typeof(x), unsigned long), (unsigned long)(x), \
    __builtin_choose_expr(__builtin_types_compatible_p(__typeof(x), long), (unsigned long)(x), \
    __builtin_choose_expr(sizeof(x) == sizeof(long), (unsigned long)(x), \
    __extension__ __builtin_choose_expr(__builtin_types_compatible_p(__typeof(x), unsigned long long), (unsigned long long)(x), \
    __extension__ __builtin_choose_expr(__builtin_types_compatible_p(__typeof(x), long long), (unsigned long long)(x), \
    __extension__ __builtin_choose_expr(sizeof(x) == sizeof(long long), (unsigned long long)(x), \
    (void)0))))))))))))))))

Instead of __builtin_choose_expr + __builtin_types_compatible_p, the equivalent _Generic construct can also be used with compilers that support it, starting from C11.
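To check that this technique really yields a per-type result, the type of the macro's result can be tested with `__builtin_types_compatible_p` itself. The sketch below is GCC/Clang only and uses a shortened two-type variant (the name TO_UNSIGNED_CE is made up for the example), not the full macro above.

```c
#include <assert.h>

/* Shortened variant: only short and int are handled. Any other type
 * selects (void)0, which fails to compile when the result is used. */
#define TO_UNSIGNED_CE(x) (                                                  \
    __builtin_choose_expr(                                                   \
        __builtin_types_compatible_p(__typeof__(x), short),                  \
        (unsigned short)(x),                                                 \
    __builtin_choose_expr(                                                   \
        __builtin_types_compatible_p(__typeof__(x), int),                    \
        (unsigned int)(x),                                                   \
    (void)0)))

/* 1 if the result for a short has type unsigned short. */
int short_gives_ushort(void)
{
    short s = -1;
    return __builtin_types_compatible_p(__typeof__(TO_UNSIGNED_CE(s)),
                                        unsigned short);
}

/* 1 if the result for an int has type unsigned int. */
int int_gives_uint(void)
{
    int i = -1;
    return __builtin_types_compatible_p(__typeof__(TO_UNSIGNED_CE(i)),
                                        unsigned int);
}
```

Unlike ?:, `__builtin_choose_expr` takes the type of the chosen expression only, so the two arms never get unified into a common type.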

C++11 has std::make_unsigned, and its implementation in libstdc++ explicitly enumerates the integer types it knows about, similar to how my C++ implementation of TO_UNSIGNED does.

pts
  • All that seems pointless. If you have a known type that needs to be converted to a signed type, you just do it. Trying to write code that allows you to not know what data types you're working with seems like a huge red flag to me. "I don't know what this is, but I'm going to convert it to something else" has a [really bad code smell](https://en.wikipedia.org/wiki/Code_smell). – Andrew Henle Jan 17 '18 at 15:46
  • 1
    @AndrewHenle that's precisely what generic code is? I mean the *"Trying to write code that allows you to not know what data types you're working with"* part – Tamás Szelei Jan 17 '18 at 15:55