
I am trying to work with a large file, roughly 50 GB. I am trying to iterate through the file using numpy memory mapping. I see that there is a limitation on the size of file that can be memory-mapped, which is 2GB on 32-bit systems. Here is the link: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.memmap.html

I would like to know whether there is a hard limit on the file size that numpy memory mapping can handle with good performance.
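For reference, here is a minimal sketch of what I am doing (the filename, dtype, and chunk size below are placeholders, not my actual data):

```python
import numpy as np

# Placeholder filename/dtype; my real file is ~50 GB of binary data.
data = np.memmap('big_file.dat', dtype=np.float64, mode='r')

# Iterate over the file in chunks rather than loading it all at once.
chunk = 1_000_000
total = 0.0
for start in range(0, data.shape[0], chunk):
    block = data[start:start + chunk]
    total += block.sum()   # stand-in for my real processing
print(total)
```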

Delta
  • Is that related to [this question](https://stackoverflow.com/questions/726471/how-big-can-a-memory-mapped-file-be)? – tadman Apr 10 '18 at 20:21
  • Current x86_64 processors have a 48-bit hard limit on address space size, but you'll exhaust the physical RAM needed to store the page-table entries long before hitting it. – Matteo Italia Apr 10 '18 at 20:29
  • Do you actually have more than 50GB of RAM? If so, the answer is almost definitely yes. If not, you're just hoping the OS swapping pages will be more efficient or simpler than windowing the mmap or the like (it won't be more efficient, but it might be efficient enough…), and the answer is probably yes, but try it and see. For full details, see my answer. – abarnert Apr 10 '18 at 20:40

1 Answer


You usually don't need to worry about the limit for 64-bit mmap, but I'll explain why.


First, 32-bit platforms can in theory address up to 2**32 bytes, or 4GB. But the OS reserves a chunk of that for itself. On Windows, this chunk is a whole 2GB by default (you can configure it to be lower, but some software may break because it assumes it's safe to use "signed pointers"), while on other platforms it's usually more like 512MB.

Similarly, 64-bit platforms can in theory address up to 2**64 bytes, or 16EB. Here, whether the OS reserves 512MB or 2GB isn't going to make a significant dent.
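Note that which case applies to you follows the interpreter's pointer size, not just the OS: a 32-bit Python on 64-bit Windows still gets the 32-bit limits. A quick standard-library check:

```python
import struct
import sys

# 'P' is the native C void-pointer format, so this gives the build's
# pointer width: 32 means the 2-4GB limits apply, 64 means they don't.
bits = struct.calcsize('P') * 8
print(f'{bits}-bit Python; sys.maxsize = {sys.maxsize:,}')
```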


However, your hardware may limit the usable address space to somewhere between 44 and 56 bits (most current systems are 48-bit), and 44 bits is only 16TB.

And your OS may limit things even further. IIRC, the earliest 64-bit Linux kernels only used 40 bits (because there was no hardware that could use more at the time), which is only 1TB.

Finally, on Windows, if you're using a "basic" or "starter" edition, it may limit things even further, to as low as 8GB for Windows 8 Home Basic Edition. This is the only limit here that might affect your 50GB file.


But, unlike in the later days of 32-bit systems, pretty much nobody in 2018 has more physical RAM than their OS can map all at once. Plenty of people run 32-bit Windows (or 32-bit Python on 64-bit Windows) on machines with more than 4GB of RAM, but it's nearly impossible to load a 64-bit system running a 40-bit-limited OS with more than 1TB of RAM.

So, however much RAM you have, you should be able to use most of it for mmap.


Occasionally, you want to mmap a file that won't actually fit into your RAM. You'll then be relying on the OS's page swapping, which will of course be less efficient than windowing smaller maps of the file, but may be efficient enough, and may be a lot simpler.
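If you do go the windowing route, here's a minimal sketch of the pattern using memmap's `offset` parameter (the filename, dtype, window size, and `process` function are assumptions for illustration, not anything from your question):

```python
import os
import numpy as np

filename = 'big_file.dat'                 # hypothetical 50 GB file
dtype = np.dtype(np.float64)
window = 100_000_000                      # elements per window, not bytes

total_elems = os.path.getsize(filename) // dtype.itemsize
for start in range(0, total_elems, window):
    count = min(window, total_elems - start)
    # Map only this slice of the file; numpy handles aligning the
    # requested offset to the OS allocation granularity internally.
    m = np.memmap(filename, dtype=dtype, mode='r',
                  offset=start * dtype.itemsize, shape=(count,))
    process(m)                            # hypothetical processing step
    del m                                 # release the map before the next window
```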

If you rely on one big map instead, it will probably work on your system, but there's really no way to say for sure without knowing a lot more than you've told us. And the easiest answer (as usual for Python) is EAFP: try it, and prepare to handle the exception if it fails (whether programmatically, or by just reading the stack trace and searching StackOverflow for a solution).
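In code, the EAFP approach might look like this sketch (in my experience a too-big map surfaces as an OSError, or an OverflowError on a 32-bit build, but treat the exact exception types as an assumption to verify on your platform):

```python
import numpy as np

try:
    # Attempt to map the entire file in one go.
    data = np.memmap('big_file.dat', dtype=np.float64, mode='r')
except (OSError, OverflowError) as exc:
    # The OS (or the 32-bit address space) refused the full map;
    # fall back to windowed maps as described above.
    print(f'Full map failed ({exc}); falling back to windowed maps.')
```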

abarnert