Subtracting unsigned long longs with signed long long result?

Question

Suppose I have these two types:

typedef unsigned long long uint64;
typedef signed long long sint64;

And I have these variables:

uint64 a = ...;
uint64 b = ...;
sint64 c;

I want to subtract b from a and assign the result to c, clearly if the absolute value of the difference is greater than 2^63 than it will wrap (or be undefined) which is ok. But for cases where the absolute difference is less than 2^63 I want the result to be correct.

Of the following three ways:

c = a - b; // sign conversion warning ignored

c = sint64(a - b);

c = sint64(a) - sint64(b);

Which of the them are guaranteed to work by the standard? (and why/how?)

See also this related question: [How do promotion rules work when the signedness on either side of a binary operator differ?](http://stackoverflow.com/q/6770258/636019) — ildjarn, Dec 11 '12 at 07:40

Potatoswatter · Accepted Answer · 2012-12-12T01:34:14.490

None of the three work. The first fails if the difference is negative (no matter the absolute value), the second is the same as the first, and the third fails if either operand is too large.

It's impossible to implement without a branch.

c = b < a? a - b : - static_cast< sint64 >( b - a );

Fundamentally, unsigned types use modulo arithmetic without any kind of sign bit. They don't know they wrapped around, and the language spec doesn't identify wraparound with negative numbers. Also, assigning a value outside the range of a signed integral variable results in an implementation-defined, potentially nonsense result (integral overflow).

Consider a machine with no hardware to convert between native negative integers and two's complement. It can perform two's complement subtraction using bitwise negation and native two's complement addition, though. (Bizarre, maybe, but that is what C and C++ currently require.) The language leaves it up to the programmer, then, to convert the negative values. The only way to do that is to negate a positive value, which requires that the computed difference be positive. So…

The best solution is to avoid any attempt to represent a negative number as a large positive number in the first place.

EDIT: I forgot the cast before, which would have produced a large unsigned value, equivalently to the other solutions!

JasonD · Answer 2 · 2012-12-11T12:19:38.743

1

Potatoswatter's answer is probably the most pragmatic solution, but "impossible to implement without a branch" is like a red rag to a bull for me. If your hypothetical system implements undefined overflow/cast operations like that, my hypothetical system implements branches by killing puppies.

So I'm not completely familiar with what the standard(s) would say, but how about this:

sint64 c,d,r;
c = a >> 1;
d = b >> 1;
r = (c-d) * 2;
c = a & 1;
d = b & 1;
r += c - d;

I've written it in a fairly verbose fasion so the individual operations are clear, but have left some implicit casts. Is anything there undefined?

Steve Jessop rightly points out that this does fail in the case where the difference is exactly 2^63-1, as the multiply overflows before the 1 is subtracted.

So here's an even uglier version which should cover all underflow/overflow conditions:

sint64 c,d,r,ov;
c = a >> 1;
d = b >> 1;
ov = a >> 63;
r = (c-d-ov) * 2;
c = a & 1;
d = b & 1;
r += ov + ov + c - d;

edited Dec 11 '12 at 12:19

answered Dec 11 '12 at 07:17

JasonD

16,464
2
29
44

I am more familiar with C than C++, but in C, the left shit of a negative number is undefined (C99-C11) or arguably implementation-defined (C90). Left-shifting a signed number so large that the shifted value exceeds the range of the type is also undefined (this, your code avoids). – Pascal Cuoq Dec 11 '12 at 07:23
Yes, I was thinking that it might be undefined to shift into the sign-bit, but I couldn't find any clear reference to it. – JasonD Dec 11 '12 at 07:29
You can always change `<< 1` into `* 2` if the former is not always defined in C++. – Pascal Cuoq Dec 11 '12 at 07:29
@JasonD: the clear reference is 5.8, `[expr.shift]`: "if E1 has a signed type and non-negative value, and E1×2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undeﬁned." – Steve Jessop Dec 11 '12 at 09:56
Also, the multiplication overflows (and hence has UB) in the case where `a` is 2^63 and `b` is 1. Strictly speaking that's one of the cases where the questioner wants the result to be correct, since the difference is less than 2^63. But it's right on the limit. I think you can fix it by subtracting an extra 1 before the multiplication, and adding `2 + c - d` at the end. Then you get UB only in cases the questioner doesn't care about. I think. – Steve Jessop Dec 11 '12 at 10:02
Noting that if `a` is 0 and `b` is 2^63, then subtracting the extra 1 causes an overflow. But arguably that doesn't matter because the difference is not *less than* 2^63, albeit the correct result *is* representable in a 2's complement 64 bit signed integer. – Steve Jessop Dec 11 '12 at 10:31
We could do additional bit-twiddling to only subtract 1 / add 2 in the case where the top bit is set, avoiding the overflow and underflow. – JasonD Dec 11 '12 at 10:38
@JasonD A question about the language is just that, don't get worked up. I didn't try compiling and disassembling the expression I suggested, but it's possible a good compiler would notice that the two paths are identical and replace them with just one subtract operation. And a poor x86 compiler would use `cmove` not a true branch. But a half-dozen or more operations is definitely a loss of performance, and including any sort of bitwise operation on a signed number is definitely a loss of portability. – Potatoswatter Dec 11 '12 at 14:47
@Potatoswatter I'm not worked up :) I don't think anyone will mistake my answer as a better, or even sane, alternative to yours. However while my answer is clearly inefficient, I'm not doing any bitwise operations on signed numbers. All the bit operations are on the unsigned values, and the calculation is done using signed. – JasonD Dec 11 '12 at 15:17

Pascal Cuoq · Answer 3 · 2012-12-11T06:49:13.917

0

if the absolute value of the difference is greater than 2^63 than it will wrap (or be undefined) which is ok. But for cases where the absolute difference is less than 2^63 I want the result to be correct.

Then all three of the notations you suggest work, assuming a conventional architecture. The notable difference is that the third one sint64(a) - sint64(b) invokes undefined behavior when the difference is not representable, whereas the first two are guaranteed to wrap around (unsigned arithmetic overflow is guaranteed to wrap around and conversion from unsigned to signed is implementation-defined, whereas signed arithmetic overflow is undefined).

edited Dec 11 '12 at 06:49

answered Dec 11 '12 at 06:24

Pascal Cuoq

79,187
7
161
281

None of the three work. The first fails if the difference is negative (no matter the absolute value), the second is the same as the first, and the third fails if either operand is too large. – Potatoswatter Dec 11 '12 at 06:29
@Potatoswatter We must have understood the question differently. I take the question to be about computing the difference between `a` and `b`, not the absolute value of the difference. If this is the goal, then `sint64(a - b)` works even if the unsigned subtraction `a - b` wraps around. – Pascal Cuoq Dec 11 '12 at 06:31
That is the question. Given that the absolute value can be represented by a signed type (including negative values), the goal is to reliably compute it. – Potatoswatter Dec 11 '12 at 06:33
Wrapping around results in a number that cannot be represented by the signed type, and the integral typecast then causes undefined behavior. – Potatoswatter Dec 11 '12 at 06:34
“the integral typecast then causes undefined behavior” I don't think so. – Pascal Cuoq Dec 11 '12 at 06:35
“Wrapping around results in a number that cannot be represented by the signed type” Not necessarily. There is wrap-around for `a==2` and `b==3`, but the result is a number that implementation-definedly converts to `-1`. – Pascal Cuoq Dec 11 '12 at 06:38
Did you just define the implementation? ;v) — but you're partly right; the result is implementation-defined (meaning useless, in the context of this question as it's phrased). Undefined behavior does occur for floating-integral conversions that overflow the integral type, though. – Potatoswatter Dec 11 '12 at 06:38
@Potatoswatter “implementation-defined” is a technical term with specific meaning. I use it to make it clear that results might not hold for exotic architectures, though they do for the OP's computer with excellent probability. It is **not** the same as “undefined”, and by the way your statement that converting unsigned to signed is undefined is wrong. – Pascal Cuoq Dec 11 '12 at 06:42
1

The question is "Which of the them are guaranteed to work by the standard? (and why/how?)", it's not a practical issue. – Potatoswatter Dec 11 '12 at 06:44

Subtracting unsigned long longs with signed long long result?

3 Answers3