How to convert uint to int in C with minimal loss of result range

Question

I want the difference between two unbounded integers, each represented by a uint32_t value which is the unbounded integer taken modulo 2^32. As in, for example, TCP sequence numbers. Note that the modulo 2^32 representation can wrap around 0, unlike more restricted questions that do not allow wrapping around 0.

Assume that the difference between the underlying unbounded integers are in the range of a normal int. I want this signed difference value. In other words, return a value within the normal int range that is equivalent to the difference of the two uint32_t inputs modulo 2^32.

For example, 0 - 0xffffffff = 1 because we assume that the underlying unbounded integers are in int range. Proof: if A mod 2^32 = 0 and B mod 2^32 = 0xffffffff, then (A=0, B=-1) (mod 2^32) and therefore (A-B=1) (mod 2^32) and in the int range this modulo class has the single representative 1.

I have used the following code:

static inline int sub_tcp_sn(uint32_t a, uint32_t b)
{
    uint32_t delta = a - b;

    // this would work on most systems
    return delta;

    // what is the language-safe way to do this?
}

This works on most systems because they use modulo-2^32 representations for both uint and int, and a normal modulo-2^32 subtraction is the only reasonable assembly code to generate here.

However, I believe that the C standard only defines the result of the above code if delta>=0. For example on this question one answer says:

If we assign an out-of-range value to an object of signed type, the result is undefined. The program might appear to work, it might crash, or it might produce garbage values.

How should a modulo-2^32 conversion from uint to int be done according to the C standard?

Note: I would prefer the answer code not to involve conditional expressions, unless you can prove it's required. (case analysis in the explanation of the code is OK).

Have you seen these posts: https://stackoverflow.com/questions/2633661/how-to-check-for-signed-integer-overflow-in-c-without-undefined-behaviour and https://stackoverflow.com/questions/199333/how-do-i-detect-unsigned-integer-multiply-overflow ? — Dai, Nov 05 '19 at 22:46
@Dai The question in the first link you posted asks about adding or subtracting signed integers, so how is it relevant? I am starting with unsigned integers. And your second link is asking about overflow on the unsigned integers, which doesn't exist in my case, by definition. — personal_cloud, Nov 05 '19 at 22:49
I mentioned it because it discusses how to detect integer overflow in general, which you're asking about. Other replies and comments in that thread also discuss unsigned integer "overflow" (in-quotes because unsigned integers don't overflow the same way that signed integers do). — Dai, Nov 05 '19 at 22:51
On a system that doesn't use 2's complement, there's no guarantee that an unsigned value will fit in a signed integer modulo 2^n. For example, a 1's complement `int8_t` can only hold values -127 to 127, so cannot hold 128 modulo 256. — interjay, Nov 05 '19 at 22:56
@interjay Yes, I understand that some range can be lost. That is why I updated the title of the question to say "with minimal loss of result range". So, in your example, I want code that only loses 1 of the 256 values. I will accept code that is UB in the case that is unrepresentable as `int`. Bonus if you can catch it with an assertion. — personal_cloud, Nov 05 '19 at 22:58
The text you quoted is wrong (out-of-range assignment is implementation-defined, not undefined). — M.M, Nov 05 '19 at 23:34
@M.M By "modulo-2^32 conversion", I mean that the input and output of the conversion should have the same value modulo 2^32. I've edited the first paragraph, to hopefully leave no doubts about the behavior I want in all cases. — personal_cloud, Nov 05 '19 at 23:36
`delta>=0` is always true because delta is an unsigned variable — M.M, Nov 05 '19 at 23:43
@M.M. Right, `delta>=0` always. The problem is when `delta > INT_MAX`... how do you do the conversion in a way that does not change the value modulo 2^32. — personal_cloud, Nov 05 '19 at 23:46
"Assume the difference is in the range of a normal int" is also unclear , as if `delta > INT_MAX` then it is not in range of an int . Can you give an example of what you had in mind for an out-of-range answer? — M.M, Nov 05 '19 at 23:47
@M.M. OK, to be pedantic, the underlying counters are unbounded integers, and these unbounded integers have a difference in the range of a normal `int` before they get reduced modulo 2^32. It should be clear enough from the fact that I am using TCP sequence numbers as an example. — personal_cloud, Nov 05 '19 at 23:51
Are you saying that you might call the function with arguments that are not `uint32_t` values? — M.M, Nov 05 '19 at 23:56
@M.M No, I didn't say that. I've edited the question; now I think it's blood-curdlingly precise what the inputs and outputs are. — personal_cloud, Nov 05 '19 at 23:58
Also can you return `int32_t` instead? Or are you intending to support systems with 16-bit int? — M.M, Nov 06 '19 at 00:02
Is [this answer](https://stackoverflow.com/a/56854468/1505939) suitable? — M.M, Nov 06 '19 at 00:10
@M.M I don't think the type pun given in that answer is correct. As interjay has noted, the system might not use 2's complement representation for signed integers. — personal_cloud, Nov 06 '19 at 00:40
@personal_cloud `int32_t` is guaranteed to use 2's complement representation (this is covered on that answer) — M.M, Nov 06 '19 at 00:56
@M.M The OP on that question replied to that answer, stating that there is no documentation to support that `int32_t` has to be two's complement, nor that a type pun is any more portable than a regular typecast. — personal_cloud, Nov 06 '19 at 02:05
@personal_cloud The OP comment was incorrect (the fact that OP could not see documentation does not mean that there is none). See the comment by user2162550 (Jul 10 2019) — M.M, Nov 06 '19 at 04:26
@M.M. Ah, yes! I see it now, he's got the 7.20.1.1 reference right there. Sorry I missed it before. So yes, you're right, I would accept a solution based on `int32_t` with some form of type pun, as it meets my stricter criteria at the end of my question (no conditionals). This should safely weed out (non-existent) one's complement systems, and therefore be safe. And yes, returning `int32_t` from `sub_tcp_sn` is a great idea too. Thank you. — personal_cloud, Nov 07 '19 at 00:12
@personal_cloud ok, would you agree with linking (duplicate) this question to the other one? It seems substantially the same, the range limit on the other question is a small detail — M.M, Nov 07 '19 at 01:32
@ M.M Sure, if you can remove the range limit from the other question, and include my examples about wrapping around 0, and basically rewrite the other question so that it is not full of examples that fail when wrapping around 0. Note that I have previously had difficulties broadening a question on SO. But maybe OK in this case, since I don't (presently) see any *answers* to that question that break around 0. — personal_cloud, Nov 07 '19 at 19:49

personal_cloud · Accepted Answer · 2019-11-08T06:08:57.663

0

There must be a standard function that does this... but in the meantime:

#include <stdint.h>  // uint32_t
#include <limits.h>  // INT_MAX
#include <assert.h>  // assert

static inline int sub_tcp_sn(uint32_t a, uint32_t b)
{
    uint32_t delta = a - b;
    return delta <= INT_MAX ? delta : -(int)~delta - 1;
}

Note that it is UB in the case that the result is not representable, but the question said that was OK.

If the system has a 64-bit long long type, then the range can easily be customized and checked as well:

typedef long long sint64_t;

static inline sint64_t sub_tcp_sn_custom_range(uint32_t a, uint32_t b,
                             sint64_t out_min, sint64_t out_max)
{
    assert(sizeof(sint64_t) == 8);
    uint32_t delta = a - b;
    sint64_t result = delta <= out_max ? delta : -(sint64_t)-delta;
    assert(result >= out_min && result <= out_max);
    return result;
}

For example, sub_tcp_sn_custom_range(0x10000000, 0, -0xf0000000LL, 0x0fffffffLL) == -0xf00000000.

With the range customization, this solution minimizes range loss in all situations, assuming timestamps behave linearly (for example, no special meaning to wrapping around 0) and a singed 64-bit type is available.

edited Nov 08 '19 at 06:08

answered Nov 05 '19 at 23:35

personal_cloud

3,943
3
28
38

1

This seems worse in every respect than `return delta;` – M.M Nov 05 '19 at 23:40
@M.M If delta > INT_MAX, then `return delta` is implementation dependent. The code in this answer works in all C implementations so long as the result is representable. – personal_cloud Nov 05 '19 at 23:41
where does the question say it's OK if the result is not representable? – M.M Nov 05 '19 at 23:48
@M.M Third sentence of the question: "Assume the difference is in the range of a normal int." – personal_cloud Nov 05 '19 at 23:48
This code causes undefined behaviour on normal systems for `sub_tcp_sn(2147483648, 4294967296)`. The difference between the two is INT_MIN which is in range of a normal int, presumably – M.M Nov 05 '19 at 23:58
@M.M Good catch. I guess it is impossible to directly convert either `delta` or `-delta` to `int` in that case... I've updated the answer to upcast to `int64_t` and check range first. – personal_cloud Nov 06 '19 at 00:06
1

I think you use `int` directly with `-(int)(-delta-1)-1` or `-(int)~delta-1`. – nwellnhof Nov 07 '19 at 20:51
@nwellnhof. Yes, I like that simplification. That way we don't need a 64-bit type unless we want to check that the result is representable. I have updated the answer. – personal_cloud Nov 08 '19 at 06:04

How to convert uint to int in C with minimal loss of result range

1 Answers1

Linked