Explaining why -x -1
is correct in general (for integers)
Sometimes (example), people are surprised by the mathematical behaviour of the ~ operator. They might reason, for example, that rather than evaluating to -19
, the result of ~18
should be 13
(since bin(18)
gives '0b10010'
, inverting the bits would give '0b01101' which represents 13
- right?). Or perhaps they might expect 237
(treating the input as signed 8-bit quantity), or some other positive value corresponding to larger integer sizes (such as the machine word size).
Note, here, that the signed interpretation of the bits 11101101
(which, treated as unsigned, give 237
) is... -19
. The same will happen for larger numbers of bits. In fact, as long as we use at least 6 bits, and treating the result as signed, we get the same answer: -19.
The mathematical rule - negate, and then subtract one - holds for all inputs, as long as we use enough bits, and treat the result as signed.
And, this being Python, conceptually numbers use an arbitrary number of bits. The implementation will allocate more space automatically, according to what is necessary to represent the number. (For example, if the value would "fit" in one machine word, then only one is used; the data type abstracts the process of sign-extending the number out to infinity.) It also does not have any separate unsigned-integer type; integers simply are signed in Python. (After all, since we aren't in control of the amount of memory used anyway, what's the point in denying access to negative values?)
This breaks intuition for a lot of people coming from a C environment, in which it's arguably best practice to use only unsigned types for bit manipulation and then apply 2s-complement interpretation later (and only if appropriate; if a value is being treated as a group of "flags", then a signed interpretation is unlikely to make sense). Python's implementation of ~
, however, is consistent with its other design choices.
How to force unsigned behaviour
If we wanted to get 13
, 237
or anything else like that from inverting the bits of 18
, we would need some external mechanism to specify how many bits to invert. (Again, 18
conceptually has arbitrarily many leading 0s in its binary representation in an arbitrary number of bits; inverting them would result in something with leading 1s; and interpreting that in 2s complement would give a negative result.)
The simplest approach is to simply mask off those arbitrarily-many bits. To get 13
from inverting 18
, we want 5 bits, so we mask with 0b11111
, i.e., 31. More generally (and giving the same interface for the original behaviour):
def invert(value, bits=None):
result = ~value
return result if bits is None else (result & ((1 << bits) - 1))
Another way, per Andrew Jenkins' answer at the linked example question, is to XOR directly with the mask. Interestingly enough, we can use XOR to handle the default, arbitrary-precision case. We simply use an arbitrary-sized mask, i.e. an integer that conceptually has an arbitrary number of 1
bits in its binary representation - i.e., -1
. Thus:
def invert(value, bits=None):
return value ^ (-1 if bits is None else ((1 << bits) - 1))
However, using XOR like this will give strange results for a negative value
- because all those arbitrarily-many set bits "before" (in more-significant positions) the XOR mask weren't cleared:
>>> invert(-19, 5) # notice the result is equal to 18 - 32
-14