Memory optimization for `id`'s

Question

a = 1
b = 1
id(a) == id(b)    # True

Here Python optimizes memory consumption and reuses the same object for both variables.

c = 1.45
d = 1.45
id(c) == id(d)   # False - Why ?

For floats, Python doesn't apply this optimization. Why?

There are way [too many floats](https://stackoverflow.com/questions/17949796/how-many-unique-values-are-there-between-0-and-1-of-a-standard-float). — Selcuk, Mar 03 '20 at 04:09
@HeapOverflow Not that many between 0 and 1. Jokes aside, Python only does that optimisation for integers between -5 and 256, not _all_ (?) of them. — Selcuk, Mar 03 '20 at 04:12
It also appears to do it for strings, which I didn't know until now: `a = 'asdf'; b = 'asdf'; id(a) == id(b)` gives `Out[44]: True` — n8yoder, Mar 03 '20 at 04:13
It's not that there are "too many floats", but rather that there likely isn't an effective choice of *which* floats to cache, as there is with ints. Could you come up with a list of 262 exact float values that will consistently appear at a much higher frequency in most Python programs? Keep in mind caching has overhead, and the chosen values would have to be so frequent as to make up for this overhead on the creation of every float. — Hymns For Disco, Mar 03 '20 at 04:18
@n8yoder only string literals in source code, and some other special cases, but all of these are things one shouldn't rely on (or care about) — juanpa.arrivillaga, Mar 03 '20 at 04:20
@HymnsForDisco Let's start with `3.141592653589793` and `2.718281828459045`. — Selcuk, Mar 03 '20 at 04:24
@Selcuk: That's what `math.e` and `math.pi` are for. But it wouldn't be all that useful to check every `float` literal in the program, and every computed floating point value, just to see if they could collapse into the same object as `math.e` or `math.pi` to save a little memory. — ShadowRanger, Mar 03 '20 at 04:34

ShadowRanger · Accepted Answer · 2020-03-03T04:33:20.097

CPython (the reference interpreter), as an implementation detail, has a small int cache for ints between -5 and 256; each value is intended to be unique (not always true in practice, but most of the time it's true; you shouldn't rely on it though).

This makes simple tasks like iterating a bytes object much cheaper (since all the values can be pulled from the cache), and saves some memory for commonly used small int values. It's not dynamically sized though, so creating 257 twice will get different ids (not always, but in many cases; there are other constant caching operations applied during compilation that can collapse such values used as literals in close proximity).
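
A minimal sketch of both effects described above, assuming CPython (other interpreters may behave differently). The helper name `bytes_share_cached_ints` and the `<demo>` filename are made up for illustration. Every element of a `bytes` object lies in 0..255, so indexing or iterating it should hand back objects from the small int cache. Separately, a value such as 257 can still end up shared when it appears as a repeated literal within one compiled unit.

```python
def bytes_share_cached_ints(data: bytes) -> bool:
    """Iterate `data` twice in parallel and check that equal bytes are the same int object."""
    # Each iterator builds its ints independently; on CPython both draw from the cache.
    return all(x is y for x, y in zip(data, data))


all_bytes_shared = bytes_share_cached_ints(bytes(range(256)))

# Two identical literals compiled together: the compiler stores the constant once,
# so both names are expected to refer to one object even though 257 is not in the cache.
source = "a = 257\nb = 257\nshared = a is b"
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)
literals_shared = namespace["shared"]

print(all_bytes_shared, literals_shared)
```

Both results are implementation details, so treat them as observations rather than guarantees.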

No such cache exists for floats, since there are a nigh infinite number of float values, and few are likely to see reuse across broad swathes of the program.
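
The difference can be observed with a small sketch, assuming CPython. The helper `same_object` is a hypothetical name introduced here. Values are parsed from strings at run time so that the compiler cannot merge identical literals, which would otherwise hide the effect.

```python
def same_object(factory, text):
    """Build two values from the same text at run time and report whether they are one object."""
    first = factory(text)
    second = factory(text)
    return first is second


small_int_shared = same_object(int, "1")       # inside the small int cache range
large_int_shared = same_object(int, "100000")  # far outside the cache range
float_shared = same_object(float, "1.45")      # floats have no comparable cache

# On CPython this is expected to print: True False False
print(small_int_shared, large_int_shared, float_shared)
```

Code should still compare numbers with `==`; identity checks like these are only useful for inspecting the interpreter's behaviour.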

edited Mar 03 '20 at 04:33

answered Mar 03 '20 at 04:11

ShadowRanger

A bit of background I feel is missing: Python natively uses big ints. These have a class-esque representation internally, and this is why a cache is meaningful to talk about. (If it were "just" int64 or the like, caching wouldn't make sense.) – GManNickG Mar 03 '20 at 04:15
@GManNickG: Though that's giving the impression that something might not have a class-esque representation internally; literally everything in Python uses a "class-esque" representation internally (the few things that store in C-level types have to convert to Python level types when used individually, e.g. indexing `bytes` or `array.array` has to return an `int`). – ShadowRanger Mar 03 '20 at 04:17
@HeapOverflow: You can with extension modules that don't normalize their values and use private APIs to allocate raw `int`s and populate them manually. Aside from that, I'm pretty sure anything that lets you create an `int` with a value matching that from the small `int` cache that isn't pulled from the cache is essentially a mistake. – ShadowRanger Mar 03 '20 at 04:20
@ShadowRanger What do you mean with "mistake"? – Kelly Bundy Mar 03 '20 at 04:22
I mean for example `9**99 % 9**99 is not 0` – Kelly Bundy Mar 03 '20 at 04:24
@HeapOverflow: Basically every code path that ships with the interpreter should be going through an API that normalizes, such that any result that could come from the small `int` cache does (because the cache exists to avoid creating duplicate objects, and failing to use it wastes memory for no reason). Thus, if it doesn't fetch from the cache, it's a "mistake". And yeah, that result is a minor "mistake". It may have been decided it was worth not checking for certain cases even if it cost a little extra memory, but it is wasteful in that case. – ShadowRanger Mar 03 '20 at 04:26
@HeapOverflow But that's only because the result is a long. `int(9**99 % 9**99) is 0` should work fine. – Selcuk Mar 03 '20 at 04:27
@Selcuk That gives me `False`. – Kelly Bundy Mar 03 '20 at 04:29
@Selcuk: Nah, it works on Py3, where everything is a "long" (they got rid of the `int`/`long` dichotomy). `int()`-ifying it doesn't change anything; the mod path for large numbers just isn't normalizing the result. – ShadowRanger Mar 03 '20 at 04:29
@HeapOverflow Strange, it gives different results in Python 3 vs 2. – Selcuk Mar 03 '20 at 04:30
@Selcuk The bigger mystery is why are you still using Python 2? :-) – Kelly Bundy Mar 03 '20 at 04:30
@HeapOverflow Legacy codebase. But I tested it on Python 2 first mainly because I _thought_ that it was a Python 2 specific issue. – Selcuk Mar 03 '20 at 04:30
@Selcuk: That code on Python 2 involves an actual type conversion from `long` to `int`, which ends up normalizing in the process. On Python 3, it just says "it's already an `int`" and returns it unchanged. – ShadowRanger Mar 03 '20 at 04:31
@HeapOverflow: Amusingly, they used the `maybe_small_long` internal API for the large floor division path (so `9**99 // 9**99 is 1` is `True`); definitely seems like an oversight that the large remainder path (which is shared with large floor division) doesn't use it. – ShadowRanger Mar 03 '20 at 04:38
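
The remainder and floor division cases from these comments can be checked with a short sketch. The function name `identity_report` and its dictionary keys are made up for illustration, and the outcome depends on the interpreter and its version, so no expected output is given.

```python
def identity_report():
    """Check whether large-number arithmetic returns the cached small ints."""
    big = 9 ** 99
    zero, one = 0, 1       # small literals; on CPython these come from the small int cache
    remainder = big % big  # numerically 0
    quotient = big // big  # numerically 1
    return {
        "remainder_is_cached_zero": remainder is zero,
        "quotient_is_cached_one": quotient is one,
    }


print(identity_report())
```

A `False` entry means that code path returned a fresh object instead of the cached one, which is the "mistake" discussed above.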
