I have a Python 3 script that operates on numpy.memmap arrays. It writes an array to a newly created temporary file located in /tmp:
import numpy, tempfile

size = 2 ** 37 * 10                 # 10 * 2**37 eight-byte items, i.e. 10 TiB of data
tmp = tempfile.NamedTemporaryFile('w+')
array = numpy.memmap(tmp.name, dtype='i8', mode='w+', shape=size)
array[0] = 666
array[size - 1] = 777
del array                           # flush the changes and drop the first view
array2 = numpy.memmap(tmp.name, dtype='i8', mode='r+', shape=size)
print('File: {}. Array size: {}. First cell value: {}. Last cell value: {}'.\
      format(tmp.name, len(array2), array2[0], array2[size - 1]))
while True:
    pass                            # keep the process alive for inspection
The size of the HDD is only 250G. Nevertheless, the script somehow manages to generate a 10T file in /tmp, and the corresponding array still seems to be accessible. The output of the script is the following:
File: /tmp/tmptjfwy8nr. Array size: 1374389534720. First cell value: 666. Last cell value: 777
The file really exists and is displayed as being 10T large:
$ ls -l /tmp/tmptjfwy8nr
-rw------- 1 user user 10995116277760 Dec 1 15:50 /tmp/tmptjfwy8nr
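A quick sanity check with os.stat suggests the file is sparse: the apparent size is the full 10T, but only a small number of blocks are actually allocated on disk (as far as I know, st_blocks is counted in 512-byte units on Linux):

import os

st = os.stat('/tmp/tmptjfwy8nr')      # the file created by the run above
print('apparent size: {} bytes'.format(st.st_size))
print('allocated on disk: {} bytes'.format(st.st_blocks * 512))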
However, the total size of the filesystem that /tmp lives on is much smaller:
$ df -h /tmp
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 235G 5.3G 218G 3% /
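The same numbers can also be read from inside Python; a minimal sketch with shutil.disk_usage (assuming it reports the filesystem that /tmp is mounted on):

import shutil

usage = shutil.disk_usage('/tmp')
print('total: {:.1f}G, used: {:.1f}G, free: {:.1f}G'.format(
    usage.total / 2**30, usage.used / 2**30, usage.free / 2**30))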
The process also appears to be using 10T of virtual memory, which is not possible either. The output of the top command:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31622 user 20 0 10.000t 16592 4600 R 100.0 0.0 0:45.63 python3
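The same discrepancy is visible from inside the process itself; a Linux-only sketch (assuming /proc is available) that prints the virtual and resident sizes reported in /proc/self/status:

# Linux-only: VmSize is the virtual size, VmRSS the resident set size
with open('/proc/self/status') as f:
    for line in f:
        if line.startswith(('VmSize', 'VmRSS')):
            print(line.strip())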
As far as I understand, this means that when numpy.memmap is called, the space for the whole array is not actually allocated, so the displayed file size is bogus. This in turn means that when I start gradually filling the whole array with my data, at some point my program will crash or my data will be corrupted.
Indeed, if I introduce the following in my code:
for i in range(size):
    array[i] = i
I get the error after a while:
Bus error (core dumped)
Therefore, the question: how can I check at the very beginning whether there is really enough space for the data, and then actually reserve that space for the whole array?
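What I have in mind is something along these lines: check the free space first, then try to force real allocation with os.posix_fallocate, which should fail with OSError (ENOSPC) if the blocks cannot actually be reserved. This is only a sketch of the idea (Linux-specific, and I am not sure it is the right or most portable approach):

import os, shutil, tempfile
import numpy

size = 2 ** 37 * 10
nbytes = size * 8                       # 'i8' items are 8 bytes each

# refuse early if the filesystem clearly cannot hold the data
if shutil.disk_usage('/tmp').free < nbytes:
    raise RuntimeError('not enough free space in /tmp for the array')

tmp = tempfile.NamedTemporaryFile('w+', dir='/tmp')
# ask the filesystem to reserve the blocks up front; this should raise
# OSError (ENOSPC) if the space is not really available
os.posix_fallocate(tmp.fileno(), 0, nbytes)

array = numpy.memmap(tmp.name, dtype='i8', mode='r+', shape=size)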