I made a binary search function and I'm curious what would happen if I used it on 4 billion numbers, but I get a MemoryError
every time I use it. Is there a way to store the list without this issue?
-
store it on disk in some file, don't load it all at once in memory, or install more RAM – Matiiss Jul 26 '22 at 18:57
-
1It seems like what would happen if you used 4 billion numbers on it is a `MemoryError` – Libra Jul 26 '22 at 18:58
-
5How much RAM does your computer have? `MemoryError` simply means your computer ran out of RAM. – Filip Müller Jul 26 '22 at 18:59
-
3What is the range of these numbers? If they're much smaller than the bit size of your computer would allow, you could save considerable space by storing them in something like an `array` (built-in module) or `numpy` (3rd-party module) array. – jasonharper Jul 26 '22 at 18:59
-
2What do you mean by *array*? – juanpa.arrivillaga Jul 26 '22 at 19:03
-
Anyway, on a modern machine with something like 64 gigabytes of RAM, you could easily store an array of 64bit numbers, which would cost about 32 gigs. – juanpa.arrivillaga Jul 26 '22 at 19:04
-
Hmm, *"binary **sort** function"*? What is that? – Kelly Bundy Jul 26 '22 at 19:11
-
Please provide enough code so others can better understand or reproduce the problem. – Community Jul 26 '22 at 19:24
1 Answers
Yes, but it's gonna be expensive.
Assuming the numbers are 1-4,000,000,000, each number would take up either 28 bytes or 32 bytes. Going with 32 bytes, storing 4,000,000,000 numbers would take ~128GB. The list itself also takes up memory, 8 bytes per object (source), so it would require around 192GB of memory altogether.
If you're wanting to emulate a list, however, things become a lot more reasonable.
If your numbers follow some formula, you can make a class that "pretends" to be a list by following the answers here. This will create a custom class that works like a list, (e.g. you can do foo[3]
on it and it will return a number) but doesn't take up all that memory because it isn't actually storing 4,000,000,000 numbers. In fact, this is pretty similar to how range()
works, and is the reason why doing range(1,4_000_000_000)
doesn't take up 192GB of ram.

- 136
- 2
- 13