12

Consider the example:

>>> from sys import maxint
>>> type(maxint)
<type 'int'>
>>> print maxint
9223372036854775807
>>> type(maxint+2)
<type 'long'>
>>> print maxint+2
9223372036854775809
>>> type((maxint+2)+maxint)
<type 'long'>
>>> print ((maxint+2)+maxint)
18446744073709551616

Python will auto-promote from an int, which in this case is a 64-bit integer value (OS X, Python 2.6.1), to a Python long integer, which has arbitrary precision. Even though the types are not the same, they are similar, and Python allows the usual numeric operators to be used between them. Usually this is helpful, for example when using code that expects 64-bit values on a 32-bit machine.

However, arbitrary-precision operations are vastly slower than native int operations. For example:

>>> print maxint**maxint # execution so long it is essentially a crash

Is there a way to defeat or disallow the auto-promotion of a Python int to a Python long?

Karl Knechtel
dawg
  • 1
    `maxint**maxint` is a number with >>750 decimal digits; I hope you're not really surprised it takes a while. Also, what is supposed to happen when a number does not fit into 32 bits? – Jochen Ritzel Dec 06 '10 at 01:29
  • 1
    You're saying basic mathematical operations are making your application run hours longer than it otherwise would? That sounds like your error, not Python's – Falmarri Dec 06 '10 at 01:38
  • 1
    Also, what should happen instead of autopromotion? Segfault? Sounds like you should keep your numbers below sys.maxint... – Falmarri Dec 06 '10 at 01:39
  • 2
    Raising OverflowError is a perfectly reasonable alternative behavior to converting to a long, though there's no mechanism in the language to do this (e.g. there's probably no sane way to do it for just your code and not libraries, which means it'd break things). – Glenn Maynard Dec 06 '10 at 02:43
  • @Glenn Maynard: That is really the answer I was looking for. Is there a 'pythonic' way, which I as a newbie could not see or Google, to change the default behavior of an int? Numpy is probably closest, but that means lots of code to change to get there. – dawg Dec 06 '10 at 07:13
  • @THC4k: For `maxint**maxint` (a plausible typo, by the way, if you meant `int_a * int_b` but typed `int_a ** int_b`) there are two failure modes: effectively infinite execution, or `inf` as a truncated answer. I think `inf` is probably better, IMHO. Some sort of detectable signal would be a good thing. – dawg Dec 06 '10 at 07:21
  • @Falmarri: A 32 bit int or 64 bit int would roll over as it would in C or ASM. In Perl it would roll over to a float then to `inf` in a reasonable time. I guess choose your poison: infinite execution or truncated answer!! ;-)) – dawg Dec 06 '10 at 07:21
  • I guess I am also looking for consistency. If `c+d` fails when one operand is an int and the other a string, it should at least be detectable when an int is promoted to a long... – dawg Dec 06 '10 at 07:28
  • @drewk: You're thinking of this in C too much. Why should numbers be limited to 32 or 64 bits? They're not kept in registers. There is arbitrary precision math in python. Why shouldn't it just keep being promoted until it finds a solution? ie the `Decimal` module. – Falmarri Dec 06 '10 at 07:42
  • Why on earth would you want to limit it? Reasonably you want an answer. If you don't want the answer, don't do the calculation. :) – Lennart Regebro Dec 06 '10 at 07:46
  • @Falmarri: Could be I do think too much in C..... But it does seem like a fundamental question: should a failure mode be a really really long execution time (like days for `big_num ** big_num` in Python) or a detectable signal (C or C++ you can detect overflow...) or a truncated answer (`big_num ** big_num` == `inf` in Perl, PHP, Ruby...) – dawg Dec 06 '10 at 07:47
  • @Lennart Regebro: Two reasons: 1) if you type `big_num ** big_num` vs `big_num * big_num` your reward for that typo is infinite execution rather than an overflow signal or `inf` as a result. 2) I am curious!!! I want to know if I can do the default another way. Part of how I personally learn how things work. – dawg Dec 06 '10 at 07:55
  • 1) So don't do that. :) 2) OK, fair enough, curious is a good reason. – Lennart Regebro Dec 06 '10 at 10:11
  • I retagged this question because the behaviour is 2.x-specific - 3.x does not have separate `int` and `long` types. – Karl Knechtel Jul 27 '22 at 02:16

6 Answers

5

If you want arithmetic overflows to wrap within e.g. 32 bits, you could use e.g. numpy.uint32.

That gives you a warning when an overflow occurs.

>>> import numpy
>>> numpy.uint32(2**32-3) + numpy.uint32(5)
Warning: overflow encountered in ulong_scalars
2

I tested its speed though:

>\python26\python.exe -m timeit "2**16 + 2**2"
1000000 loops, best of 3: 0.118 usec per loop

>\python26\python.exe -m timeit "2**67 + 2**65"
1000000 loops, best of 3: 0.234 usec per loop

>\python26\python.exe -m timeit -s "import numpy; numpy.seterr('ignore')" "numpy.uint32(2)**numpy.uint32(67) + numpy.uint32(2)**numpy.uint32(65)"
10000 loops, best of 3: 34.7 usec per loop

It's not looking good for speed.
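
Part of that per-loop cost is the four `numpy.uint32(...)` constructor calls rather than the arithmetic itself. As a sketch only (I have not re-timed this), the constructions could be hoisted into timeit's setup so that just the powers and the addition are measured:

>\python26\python.exe -m timeit -s "import numpy; numpy.seterr('ignore'); a = numpy.uint32(2); e1 = numpy.uint32(67); e2 = numpy.uint32(65)" "a**e1 + a**e2"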

Craig McQueen
  • 2
    it will be much slower than usual integer arithmetic, as numpy's per-item handling overhead is quite significant – David Cournapeau Dec 06 '10 at 10:27
  • 3
    You called the constructor four times in each loop, which is going to be madly expensive. Instead, you should cache the `uint32` objects themselves. – nneonneo Oct 13 '12 at 15:41
5

So you want to throw out the One True Way and go retro on overflows. Silly you.

There is no good upside to the C / C++ / C# / Java style of overflow. It does not reliably raise an error condition. In C and C++, signed overflow is "undefined behavior" (only unsigned types are required to wrap modulo 2^n), and it is a known security risk. Why do you want this?

The Python method of seamlessly overflowing to a long is the better way. I believe this is the same behavior being adopted by Perl 6.

You can use the Decimal module to get finite, detectable overflows:

>>> from decimal import *
>>> from sys import maxint
>>> getcontext()
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999, capitals=1,
flags=[], traps=[DivisionByZero, Overflow, InvalidOperation])

>>> d=Decimal(maxint)
>>> d
Decimal('9223372036854775807')
>>> e=Decimal(maxint)
>>> f=d**e
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/decimal.py", line 2225, in __pow__
    ans = ans._fix(context)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/decimal.py", line 1589, in _fix
    return context._raise_error(Overflow, 'above Emax', self._sign)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/decimal.py", line 3680, in _raise_error
    raise error(explanation)
decimal.Overflow: above Emax

You can set your precision and boundary conditions with Decimal contexts, and the overflow is nearly immediate. You can set what you trap for. You can set your max and min. Really -- how does it get better than this? (I don't know about relative speed to be honest, but I suspect it is faster than numpy, though obviously slower than native ints...)
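
For example, a minimal sketch (the Emax value here is only illustrative, chosen to roughly match a signed 64-bit range) of tightening the context so that an oversized result trips the trap almost immediately:

>>> from decimal import getcontext, Decimal, Overflow
>>> ctx = getcontext()
>>> ctx.Emax = 19                  # results at 10**20 or above now count as overflow
>>> ctx.traps[Overflow] = True     # raise an exception instead of returning Infinity
>>> Decimal(2) ** Decimal(100)
Traceback (most recent call last):
  ...
decimal.Overflow: above Emax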

For your specific issue of image processing, this sounds like a natural application for some form of saturation arithmetic. You might also consider, if you are having overflows in 32-bit arithmetic, checking operands along the way in the obvious cases: pow, **, *. And you might consider overloaded operators that check for the conditions you don't want.
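
As an illustration only, here is a rough sketch of that overloaded-operator idea (the class name and clamping policy are made up for this example): arithmetic results are clamped to the native int range instead of being allowed to grow as longs. Intermediate values may still briefly be longs; the point is that saturation, not unbounded growth, is what propagates.

import sys

def _clamp(n):
    """Saturate n into the native int range."""
    return max(-sys.maxint - 1, min(sys.maxint, n))

class SatInt(object):
    """Integer wrapper whose + and * saturate instead of promoting to long."""
    def __init__(self, value):
        self.value = _clamp(value)
    def __add__(self, other):
        return SatInt(self.value + int(other))
    def __mul__(self, other):
        return SatInt(self.value * int(other))
    def __int__(self):
        return self.value
    def __repr__(self):
        return 'SatInt(%d)' % self.value

>>> SatInt(sys.maxint) * 2
SatInt(9223372036854775807)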

If Decimal, saturation, or overloaded operators don't work -- you can write an extension. Heaven help you if you want to throw out the Python way of overflow to go retro...

the wolf
2

You can force your values to return to normal ints if you include a `num = int(num)` occasionally in your algorithm. If the value is a long but fits in a native int, it will demote back down to an int. If the value doesn't fit in a native int, it will remain a long.
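
A quick interactive illustration of that demotion (Python 2.x, 64-bit build):

>>> from sys import maxint
>>> n = maxint + 2          # promoted to a long
>>> type(n)
<type 'long'>
>>> type(int(n - 4))        # fits again, so int() hands back a plain int
<type 'int'>
>>> type(int(n))            # still too big: int() leaves it as a long
<type 'long'>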

Ned Batchelder
1

I don't know if it would necessarily be faster, but you could use numpy arrays of one element instead of ints.

If the specific calculation you are concerned about is integer exponentiation, then there are some inferences we can draw:

import math
import sys

def smart_pow(mantissa, exponent,
              limit=int(math.ceil(math.log(sys.maxint) / math.log(2)))):
    # limit is roughly the bit width of a native int (63 on a 64-bit build)
    if mantissa in (0, 1):
        return mantissa                       # 0**n and 1**n can never overflow
    if exponent > limit:
        if mantissa == -1:
            return -1 if exponent & 1 else 1  # alternates between -1 and 1
        if mantissa > 1:
            return sys.maxint                 # saturate at the largest native int
        else:
            return (-1 - sys.maxint) if exponent & 1 else sys.maxint
    else:  # this *might* overflow, but at least it won't take long
        return mantissa ** exponent
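
For example (on a 64-bit build, where the default `limit` works out to 63):

>>> smart_pow(2, 10)
1024
>>> smart_pow(2, 1000)    # would be astronomically large, so it saturates
9223372036854775807
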
SingleNegationElimination
  • 1
    it will be much, much slower - numpy's speed comes from handling a lot of items of the same type all at once. The per-item overhead is quite significant, especially compared to one int. Also, numpy won't necessarily warn you if you overflow your integers. – David Cournapeau Dec 06 '10 at 03:58
  • 1
    I don't know of too many ways to reliably or portably get overflow information *after the fact*, even in C. If you want this, you probably need to write code that explicitly checks for calculations that will overflow, such as `if (MAX_INT - b) < a`, or access overflow status flags in assembly language. – SingleNegationElimination Dec 06 '10 at 05:00
  • Please see [Integer Security](http://ptgmedia.pearsoncmg.com/images/0321335724/samplechapter/seacord_ch05.pdf) for numerous ways to do this in C. – dawg Dec 06 '10 at 07:34
  • I must be missing something; I don't see any way in that paper that shows how to portably detect integer overflows after they have occurred. There are many ways (similar to the examples I've provided) it shows how to detect that an overflow *will* occur, and it also shows some x86 assembly that provides access to the hardware condition registers that do detect the error. Thank you for providing support for my answer. – SingleNegationElimination Dec 06 '10 at 08:27
  • @TokenMacGuy: I think you thought I was disagreeing with you. I was agreeing with the posted link. Rereading my comment, I guess that was not too clear! ;-) – dawg Dec 06 '10 at 15:32
1

Well, if you don't care about accuracy you could do all of your math ops modulo maxint.
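
For instance, a tiny sketch of wrapping a result back into range by hand (using `% maxint` as suggested; pick whatever modulus matches the wrap-around convention you want):

>>> from sys import maxint
>>> (maxint + 2) % maxint
2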

Craig McQueen
Tyler Eaves
1

Int vs. long is a historical legacy - in Python 3, every int is a "long". If your script's speed is limited by int computation, it is likely that you are doing it wrong.

To give you a proper answer, we need more information on what you are trying to do.

David Cournapeau
  • It is hard to say "exactly" what I am doing because about 75% of the code is cut and paste. I am mostly a Perl guy and learning Python as I go. I do know enough Python to see WHY it is randomly slowing down: these are 32-bit image signatures and 99.99% are within 2^32. The other 0.01% are unbelievably slow, which I have traced to a 32-bit overflow from an image signature. My first inclination was (Surprise!!!) to rewrite the offending code in C or Perl, but I thought I would give this notion a try.... – dawg Dec 06 '10 at 06:59
  • @drewk: Python implements its `int` using a C `long` I believe, which means in your Python code (on a 32-bit platform at least), anything `2**31` or higher becomes a Python `long`. Does that explain what you're seeing? – Craig McQueen Dec 06 '10 at 07:39
  • How have you traced this to 32 bit overflow? Upgrading an int to a long should take an unperceivable amount of time. – Falmarri Dec 06 '10 at 07:44
  • @Craig McQueen: I understand *why* I am seeing this. It is the result of an overflow in 32 bit Python. It does not happen in my 64 bit python. I can fix this particular problem: run this script in 64 bit and *fugetaboutit*. The issue is more of curiosity and future knowledge. In C and Perl I would know how to detect the issue. In Python I do not know how to detect it, control it or change it. I think I can do it with numpy but that seems less than optimal. – dawg Dec 06 '10 at 08:10
  • @Falmarri: try `$ python -m timeit '2**16 + 2**2'` vs `$ python -m timeit '2**67 + 2**65'`. On my machine that is a 10x difference in speed. – dawg Dec 06 '10 at 08:24
  • @drewk: then can you post something that is small but still shows a similar issue? – David Cournapeau Dec 06 '10 at 10:28
  • @drewk - *My first inclination was rewrite in C* - so do it ;) I'm not being snarky, this really is the de-facto recommendation from Python experts (one of which I don't claim to be, mind) when performance really matters, and is worth compromising flexibility for. Of course, I realise that this question is also just for the sake of learning something about Python, but writing a hybrid Python/C program is still something to learn... – detly Dec 06 '10 at 14:06
  • @drewk: But what are you suggesting should happen instead? You're complaining that integer upgrading is taking a long time. But python either does that, or it crashes. So you should write your code such that you don't cause integer overflows, or you should thank python that it still works through your mistakes – Falmarri Dec 06 '10 at 18:29
  • Also, what is that timeit test supposed to prove? `python -m timeit '2**16 + 2**2'`: 10000000 loops, best of 3: 0.0361 usec per loop; `python -m timeit '2**67 + 2**65'`: 10000000 loops, best of 3: 0.0741 usec per loop – Falmarri Dec 06 '10 at 18:32
  • You're basing judgement off of hundredths of microseconds? You need to profile your ENTIRE program. Show us which function is taking 10x longer, and then show us in that function that the only thing being done is integer multiplication – Falmarri Dec 06 '10 at 18:33
  • @Falmarri: I am not "blaming" Python for anything or "complaining" at all. I am very sorry if anything I said came across as "complaining" about it; I am not sure how that came about? :-)) This is mainly curiosity. I cannot post the relevant code because it is copyrighted. The entire program (which is not my code) takes 3 min / image if there is no int promotion and 45 min / image if there is a promotion to a long. In 64-bit mode there are other issues, but there is no slow-down. Since there are 10k images to process, that is a long time. – dawg Dec 06 '10 at 21:05
  • @Falmarri: RE *But what are you suggesting should happen instead?* I answered that to you in another comment: *A 32 bit int or 64 bit int would roll over as it would in C or ASM. In Perl it would roll over to a float then to inf in a reasonable time. I guess choose your poison: infinite execution or truncated answer!!* ;-)) – dawg Dec 06 '10 at 21:21
  • 1
    @Falmarri: `More is good... all is better.` I don't think it is a matter of either / or. I just wanted a failure mode that happened immediately rather than a failure mode that never happens. Both modes are failures in the case of a large overflow. It would be nice to be able to set or choose the desired behavior. The `Decimal` module has it (but it is kinda slow...). As a default, Python's behavior for ints / longs is fine and logical. As the only possible way -- I prefer choices. – dawg Dec 08 '10 at 19:51