2

From Python Numeric Types:

Integers have unlimited precision.

This test

#include <stdio.h>

int main(void) {
    printf("%zu bytes", sizeof(long long));
    return 0;
}

gives me 8 bytes or 64 bits under Linux.

How this is implemented in cpython (this was answered in the comment section)?

What happens when the integer exceeds long long in the implementation?

How big is the speed difference in arithmetics between pre-8-byte and post-8-byte integers?

  • Possible duplicate of [How does Python manage int and long?](http://stackoverflow.com/questions/2104884/how-does-python-manage-int-and-long) – dhke Aug 24 '16 at 14:58
  • @dhke it is, partially. But I wonder how oversized integers are stored, it should make the difference. –  Aug 24 '16 at 14:59
  • `%zd` is a wrong conversion specifier. `size_t` is unsigned! – too honest for this site Aug 24 '16 at 15:02
  • @light2yellow: [PEP 237](https://www.python.org/dev/peps/pep-0237/) has the details for CPython 2. In CPython 3, the union is replaced by a [struct with an extensible array](https://github.com/python/cpython/blob/0e7736c2e955724df2da09060e7aea9864050f80/Include/longintrepr.h#L89). As for performance, see e.g. [here](https://www.southampton.ac.uk/~fangohr/blog/performance-of-pythons-long-data-type.html). – dhke Aug 24 '16 at 15:04
  • 1
    How does an 8 bit CPU calculate 16 bit `int`? How 32 bit `int`? What did you find out yourself? What **specifically don't you understand? – too honest for this site Aug 24 '16 at 15:04
  • It's stored in base `2**15` or `2**30`, depending on the system and build; each "digit" is stored in a suitable unsigned C type (e.g., `unsigned short` for base `2**15`, `unsigned int` for base `2**30`, on a typical system). See [longintrepr.h](https://github.com/python/cpython/blob/master/Include/longintrepr.h) in the Python source for details of the storage format, and [longobject.c](https://github.com/python/cpython/blob/master/Objects/longobject.c) for the implementation of the arithmetic. – Mark Dickinson Aug 24 '16 at 15:05
  • @Olaf good point! –  Aug 24 '16 at 15:07
  • @MarkDickinson: 15 or 30 bits?? Very uncommon! – too honest for this site Aug 24 '16 at 15:08
  • @Olaf: Not that uncommon. It allows a bit of wiggle-room in the arithmetic operations, which is useful, especially if you don't want to rely on details of the C implementation or start coding in assembly language (as in GMP). For example, efficiently extracting the carry bit from a base 2**32 addition is tricky in standard portable C. – Mark Dickinson Aug 24 '16 at 15:10
  • @Olaf [`longintrepr.h`](https://github.com/python/cpython/blob/0e7736c2e955724df2da09060e7aea9864050f80/Include/longintrepr.h#L11) has quite some detail on why things are this way. In particular, you can always calculate the sum of two digits without overflow, which is quite handy. – dhke Aug 24 '16 at 15:11
  • @MarkDickinson thanks for the links. Is it okay that they link on diffs? –  Aug 24 '16 at 15:11
  • 1
    @MarkDickinson: Arithmetics on unsigned integers is not implementation-specific! You typically use the machine's word-size, which is 16 resp 32 bits (32 on architectures Python supports). But I think about an optimized Assembler-version. Don't see much good implementation such critical code in C actually. The core would be pretty small and for e.g. multiplication, a single spare bit doesn't change much. – too honest for this site Aug 24 '16 at 15:12
  • @light2yellow: Sorry, copy and paste errors on the links. They should be fixed now. – Mark Dickinson Aug 24 '16 at 15:12
  • @Olaf: See my comment about carry bits. – Mark Dickinson Aug 24 '16 at 15:13
  • @MarkDickinson: see my edit – too honest for this site Aug 24 '16 at 15:14
  • @MarkDickinson thank you again. Would you be so kind to write an answer and say a few words for me to check it as the right one? –  Aug 24 '16 at 15:15
  • The only question in the post is "why is it still on hold?" For clarity, suggest removing that and clearly asking your 2 questions. I'm voting to re-open anyway given the _implied_ questions. – chux - Reinstate Monica Aug 24 '16 at 19:35
  • @MarkDickinson why difficult? something like `sum = a + b; carry = sum < a;` with unsigned operands. bigint limbs should always be unsigned except for the most significant one – phuclv Aug 25 '16 at 06:21
  • @LưuVĩnhPhúc: Right, but the basic operation you need for multi-limb addition is not `a + b -> sum and carry`, but `a + b + carry -> sum and carry`, and now your overflow condition needs to be different depending on whether there was a carry going in or not, and you end up with a corresponding branch in the C code... It's just slightly more straightforward to deal with these issues with the extra bit in hand. @Olaf: no, an optimised assembler version would be a terrible idea for Python. The priority is having simple, *portable* code. This is not GMP; the goals are completely different. – Mark Dickinson Aug 25 '16 at 08:33

1 Answers1

-1

What if I have a bigger number in Python (not going into 8 bytes)?

Python will adapt and always store the correct number, no approximation. Even with very big INTEGER (but this is not true with other types)

How is it stored in the system, literally? Like a few smaller integers?

That's python implementation. You could find it from the source code here : svn.python.org/projects/python/trunk/Objects/longobject.c (thanks @samgak)

Does it hugely slow down the arithmetics on this number?

Yes like in other languages when the number becomes bigger than .e.g 2^32 on 32 bits systems the arithmetic becomes slower. How much is implementation dependent.

What does Python do, when it encounters such number?

Huge integers are stored in a different way and all arithmetic is adapted to fit.

Is there any difference in Python's 2 & 3 behavior except storing as long in Python 2 and appending L to string representation?

Python 2 and 3 should have the same high level behaviour.

Julien
  • 1,810
  • 1
  • 16
  • 34
  • > Thats python implementation, you cannot know how it is physically stored. What about Linux? I've edited the question. –  Aug 24 '16 at 15:01
  • Why cannot I know how this is implemented? It **is** implemented somehow in C and I wonder how precisely. –  Aug 24 '16 at 15:03
  • @light2yellow I do not (but other persons may) know how it is implemented. You will have to look into the source code of python to know how this is done. – Julien Aug 24 '16 at 15:07
  • Here's the source: https://svn.python.org/projects/python/trunk/Objects/longobject.c – samgak Aug 24 '16 at 15:10
  • 4
    Well, I don't want to insult you in any way, but everything you've written is just a general information known to me without this answer. So, if you `don't know` please don't answer with `I don't know`. Because this is not an answer. –  Aug 24 '16 at 15:21
  • @light2yellow Yeah this is the greatest non-answer answer I've seen on SO in a while. – Dan Fego Aug 24 '16 at 15:23
  • "Python 2 and 3 should have the same high level behaviour." - Even that is wrong. The rest wrong or "don't know". I agree with @light2yellow, this is not an answer. – too honest for this site Aug 24 '16 at 15:25
  • @olaf what is wrong about "Python 2 and 3 should have the same high level behaviour" ? Please clarify, we may learn something. – Julien Aug 25 '16 at 06:15
  • @DanFego: Yet it got one upvote (I always wonder how one upvotes - or downvotes - posts he apparently does not understand; some reflex?), so nets it pays for the poster. Hope others will change that. – too honest for this site Aug 25 '16 at 11:18
  • @Olaf: I upvoted because this answer does answer what was being asked. How and when Python reverts to using BigInteger code is dependent on the platform and the Python implementation. It does not give exact specifics, but explaining how BigIntegers work exactly would thoroughly explode the size of any answer. – Rudy Velthuis Aug 25 '16 at 20:57
  • @Olaf: the OP may not like the answer, but he can't expect to get more specifics, IMO. And contrary to the reviewer, I do think this answers the question. It does not request clarification, it gives clarification. And it does not critique either. – Rudy Velthuis Aug 25 '16 at 21:02
  • @RudyVelthuis: "but explaining how BigIntegers work exactly would thoroughly explode the size of any answer." - That's exactly the reason the question has been closed as too broad. The only part which looks like an answer is that integers grow with the size. Which is not quite exactly true for Python2, though. (Btw. I don't care of OP likes the answer, that is his own decision. I vote/comment according to my opinion.) – too honest for this site Aug 25 '16 at 21:07
  • I do think the answer answers what is being asked. That may not be what the OP wanted, but then the OP should ask what he really wants to know. He did ask these questions. If he knew it already, he should have said so in his question. Unlike the reviewer, I do think this is a legitimate answer, and certainly not a comment. It is also not a non-answer answer. – Rudy Velthuis Aug 25 '16 at 21:13
  • @Olaf: What is not right about Python 2? AFAIK, in Python 2, long integers also have "unlimited" precision, so they obviously also grow with the size. – Rudy Velthuis Aug 25 '16 at 21:24
  • @RudyVelthuis: YOu already mentioned the difference: They are a different type! (Although the transition is **mostly** seamless) – too honest for this site Aug 25 '16 at 21:30
  • @Olaf: I must be missing something, but I'll find out on my own, thanks anyway. Is this the difference between long integers and integers? – Rudy Velthuis Aug 25 '16 at 21:32