Questions tagged [numpy-memmap]

An advanced numpy.memmap() utility for escaping RAM-size limits and reducing the final RAM footprint, at the reasonable cost of O/S-cached file I/O mediated via a small in-RAM proxy-view window into the whole array data.

Creates and handles a memory-map to an array stored in a binary file on disk.

Memory-mapped files arrange access to large arrays that do not fit in RAM through small proxy segments of an O/S-cached region of otherwise unmanageably large data files.

Leaving most of the data on disk, rather than reading the entire file into RAM, and working with it through a smart, moving, O/S-cached window view into the large on-disk file makes it possible to escape both O/S RAM limits and an adverse side effect of Python's memory management: its painful reluctance to release once-allocated memory blocks at any time before the Python program terminates.

numpy's memmaps are array-like objects.

This differs from Python's mmap module, which uses file-like objects.
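
A minimal sketch of the core API (the file name and shape here are illustrative):

    import numpy as np

    # Create a memory-mapped array backed by a binary file on disk.
    # mode='w+' creates (or truncates) the file; only the pages actually
    # touched are cached in RAM by the O/S.
    fp = np.memmap('data.bin', dtype='float32', mode='w+', shape=(10000, 162))

    fp[:100, :] = 1.0     # writes go through the mapping
    fp.flush()            # push dirty pages to disk
    del fp                # release the mapping

    # Re-open the same file read-only; nothing is loaded up front.
    ro = np.memmap('data.bin', dtype='float32', mode='r', shape=(10000, 162))
    print(ro[:5, :3])     # touching a slice pages in only that region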

101 questions
24 votes · 2 answers

Can memmap a pandas Series. What about a DataFrame?

It seems that I can memmap the underlying data for a pandas Series by creating an mmap'd ndarray and using it to initialize the Series. def assert_readonly(iloc): try: iloc[0] = 999 # Should be non-editable …
user48956
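
A sketch of the approach the question describes, assuming pandas honors copy=False for a 1-D ndarray (whether the buffer stays zero-copy is version-dependent, so verify before relying on it):

    import numpy as np
    import pandas as pd

    # Back an ndarray with a read-only memory map (file name illustrative).
    mm = np.memmap('series.bin', dtype='float64', mode='r', shape=(1000000,))

    # Hand it to a Series without copying; heuristic check that the
    # buffer is shared rather than duplicated.
    s = pd.Series(mm, copy=False)
    print(s.to_numpy().base is mm or s.to_numpy() is mm)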
7 votes · 2 answers

numpy memmap memory usage - want to iterate once

Let's say I have some big matrix saved on disk. Storing it all in memory is not really feasible, so I use memmap to access it: A = np.memmap(filename, dtype='float32', mode='r', shape=(3000000,162)) Now let's say I want to iterate over this matrix (not…
user2717954
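
For a single pass like this, reading the memmap in contiguous row blocks keeps the resident set small; a sketch (the block size is an arbitrary assumption):

    import numpy as np

    A = np.memmap('matrix.bin', dtype='float32', mode='r', shape=(3000000, 162))

    block = 10000                          # rows per chunk; tune to available RAM
    total = np.zeros(A.shape[1], dtype=np.float64)

    for start in range(0, A.shape[0], block):
        chunk = np.array(A[start:start + block])     # copy one block into RAM
        total += chunk.sum(axis=0, dtype=np.float64)
        # chunk goes out of scope, so the O/S may evict its pages

    print(total / A.shape[0])              # column means from one pass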
7 votes · 1 answer

numpy mean is larger than max for memmap

I have an array of timestamps, increasing for each row in the 2nd column of matrix X. I calculate the mean value of the timestamps and it's larger than the max value. I'm using a numpy memmap for storage. Why is this happening? >>>…
siamii
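
Assuming the timestamps are stored as float32, a common cause is that np.mean accumulates in the array's own dtype by default, so large values can lose precision; requesting a float64 accumulator is the usual fix:

    import numpy as np

    X = np.memmap('ts.bin', dtype='float32', mode='r', shape=(10000000, 2))

    print(X[:, 1].mean())                  # float32 accumulation may drift badly
    print(X[:, 1].mean(dtype=np.float64))  # accumulate in float64 instead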
6 votes · 1 answer

Do xarray or dask really support memory-mapping?

In my experimentation so far, I've tried: xr.open_dataset with chunks arg, and it loads the data into memory. Set up a NetCDF4DataStore, and call ds['field'].values and it loads the data into memory. Set up a ScipyDataStore with mmap='r', and…
5 votes · 0 answers

Efficient way of using numpy memmap when training neural network with pytorch

I'm training a neural network on a database of images. My images are of full HD (1920 x 1080) resolution, but for training, I use random crops of size 256x256. Since reading the full image and then cropping is not efficient, I'm using numpy memmap…
Nagabhushan S N
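
One workable pattern, sketched under assumptions (frames stored as a single uint8 array of shape (N, 1080, 1920, 3); names are illustrative): open the memmap lazily so each DataLoader worker gets its own handle, and copy out only the crop.

    import numpy as np
    import torch
    from torch.utils.data import Dataset, DataLoader

    class CropDataset(Dataset):
        def __init__(self, path, n, h=1080, w=1920, c=3, crop=256):
            self.path, self.shape, self.crop = path, (n, h, w, c), crop
            self.mm = None                 # opened lazily, once per worker

        def __len__(self):
            return self.shape[0]

        def __getitem__(self, idx):
            if self.mm is None:            # don't share one map across workers
                self.mm = np.memmap(self.path, dtype='uint8', mode='r',
                                    shape=self.shape)
            _, h, w, _ = self.shape
            y = np.random.randint(0, h - self.crop + 1)
            x = np.random.randint(0, w - self.crop + 1)
            patch = np.array(self.mm[idx, y:y + self.crop, x:x + self.crop])
            return torch.from_numpy(patch).permute(2, 0, 1).float() / 255.0

    loader = DataLoader(CropDataset('frames.bin', n=1000),
                        batch_size=16, num_workers=4)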
5 votes · 2 answers

How to read a large text file avoiding reading line-by-line :: Python

I have a large data file of shape (N, 4) which I am mapping line-by-line. My files are 10 GB; a simplistic implementation is given below. Though the following works, it takes a huge amount of time. I would like to implement this logic such that the text file…
nuki
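
One common answer is to pay the text-parsing cost once, stream the rows into a binary .npy, and memmap that on every later run; a sketch (names and dtype are assumptions):

    import numpy as np

    src, dst, cols = 'big.txt', 'big.npy', 4

    # Pass 1: count rows so the output can be preallocated on disk.
    with open(src) as f:
        n_rows = sum(1 for _ in f)

    # Pass 2: stream-parse into a memmapped .npy; RAM use stays flat.
    out = np.lib.format.open_memmap(dst, mode='w+', dtype='float64',
                                    shape=(n_rows, cols))
    with open(src) as f:
        for i, line in enumerate(f):
            out[i] = [float(v) for v in line.split()]
    out.flush()

    # Every later run skips text parsing entirely:
    A = np.load(dst, mmap_mode='r')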
5 votes · 0 answers

Caching a data frame in joblib

Joblib has functionality for sharing Numpy arrays across processes by automatically memmapping the array. However, this makes use of Numpy-specific facilities. Pandas does use Numpy under the hood, but unless your columns all have the same data type,…
shadowtalker
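
For context, joblib's auto-memmapping applies to plain numpy arrays passed into workers (the threshold below is deliberately tiny for illustration); a DataFrame would first have to be decomposed into such arrays:

    import numpy as np
    from joblib import Parallel, delayed

    big = np.random.rand(2000000)

    def probe(a):
        # Inside the worker the argument arrives as a read-only np.memmap.
        return type(a).__name__, float(a.mean())

    # Arrays larger than max_nbytes are dumped to a temp file and
    # memmapped into each worker instead of being pickled wholesale.
    out = Parallel(n_jobs=2, max_nbytes='1M', mmap_mode='r')(
        delayed(probe)(big) for _ in range(2))
    print(out)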
4 votes · 1 answer

Is it possible to save boolean numpy arrays on disk as 1 bit per element with memmap support?

Is it possible to save numpy arrays on disk in boolean format where it takes only 1 bit per element? This answer suggests to use packbits and unpackbits, however from the documentation, it seems that this may not support memory mapping. Is there a…
Nagabhushan S N
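
numpy has no 1-bit dtype, but the packed uint8 bytes produced by np.packbits can themselves be memmapped, and only the byte range covering a wanted slice needs unpacking; a sketch:

    import numpy as np

    bits = np.random.rand(1000000) > 0.5       # booleans to store

    np.packbits(bits).tofile('bits.bin')       # 8 booleans per byte on disk

    mm = np.memmap('bits.bin', dtype=np.uint8, mode='r')
    start, stop = 1000, 2000                   # element range wanted
    lo, hi = start // 8, (stop + 7) // 8       # enclosing byte range
    window = np.unpackbits(np.array(mm[lo:hi]))[start - lo * 8:stop - lo * 8]
    assert np.array_equal(window, bits[start:stop].astype(np.uint8))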
4 votes · 3 answers

numpy.memmap reports not enough memory while plenty is available

During a typical call to numpy.memmap() on a 64-bit Windows machine, Python raises the following error: OSError: [WinError 8] Not enough memory resources are available to process this command. A different Windows machine raises the same error with a…
auzn
4 votes · 0 answers

Numpy Memmap WinError8

My first StackOverflow message after 6 years of great experience using this site. Thank you all for all the great help you have offered to me and to others. This problem, however, baffles me completely and I would like to ask for assistance…
4 votes · 1 answer

Numpy Memmap Ctypes Access

I'm trying to use a very large numpy array using numpy memmap, accessing each element as a ctypes Structure. class My_Structure(Structure): _fields_ = [('field1', c_uint32, 3), ('field2', c_uint32, 2), ('field3',…
sheridp
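
One pattern, sketched under assumptions (the record layout is hypothetical and the file must already hold whole records): map the file as raw bytes and overlay the Structure on one record's bytes with from_buffer, which requires a writable mapping.

    import ctypes
    import numpy as np

    class MyStructure(ctypes.Structure):       # hypothetical bit-field layout
        _fields_ = [('field1', ctypes.c_uint32, 3),
                    ('field2', ctypes.c_uint32, 2),
                    ('field3', ctypes.c_uint32, 27)]

    rec = ctypes.sizeof(MyStructure)           # 4 bytes for this layout
    mm = np.memmap('records.bin', dtype=np.uint8, mode='r+')

    def record(i):
        # Overlay the Structure on record i in place; field assignments
        # write straight through to the mapped file.
        return MyStructure.from_buffer(mm, i * rec)

    r = record(0)
    r.field2 = 3
    mm.flush()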
4 votes · 1 answer

I can't remove a file created by memmap

I can't remove a file created by the numpy.memmap function. class MyClass: def __init__(self): self.fp = np.memmap(filename, dtype='float32', mode='w+', shape=flushed_chunk_shape) ... def __del__(self): del self.fp os.remove(filename) When I…
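
For reference, the mapping must be fully released before the file can be unlinked (Windows refuses to delete a file that is still mapped); a sketch of one explicit teardown order — note that _mmap is a private numpy attribute, so this is a pragmatic workaround rather than a documented API:

    import os
    import numpy as np

    filename = 'scratch.bin'                   # illustrative name
    fp = np.memmap(filename, dtype='float32', mode='w+', shape=(1000, 4))
    fp[:] = 1.0

    fp.flush()             # push dirty pages to disk
    fp._mmap.close()       # close the OS-level mapping (private attribute)
    del fp                 # drop the last reference before unlinking
    os.remove(filename)    # now succeeds, even on Windows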
4 votes · 2 answers

Packing a boolean array needs to go through int (numpy 1.8.2)

I'm looking for a more compact way to store booleans. numpy internally needs 8 bits to store one boolean, but np.packbits allows packing them, which is pretty cool. The problem is that to pack a 32e6-byte array of booleans into a 4e6-byte array we need…
user3313834
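
If packing forces a large temporary (as the question reports for numpy 1.8.2), packing slice by slice into a preallocated output bounds the intermediate; a sketch with sizes chosen to divide evenly:

    import numpy as np

    bools = np.random.rand(32000000) > 0.5              # 32e6 one-byte booleans
    packed = np.empty(bools.size // 8, dtype=np.uint8)  # 4e6 packed bytes

    step = 8000000                             # bools per slice; multiple of 8
    for i in range(0, bools.size, step):
        packed[i // 8:(i + step) // 8] = np.packbits(bools[i:i + step])

    assert np.array_equal(np.unpackbits(packed), bools.astype(np.uint8))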
3 votes · 0 answers

Does an ndarray have a buffer which is mmap?

How to tell if an ndarray has a buffer which is mmap? I want to tell apart x and y. import numpy as np import mmap with open("f.dat", "wb+") as f: f.seek(np.dtype(float).itemsize - 1, 0) f.write(b'\0') f.seek(0, 0) mm =…
slitvinov
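
A best-effort heuristic (not bulletproof, since the .base chain depends on how the array was built): check for np.memmap itself, then walk arr.base looking for an mmap.mmap object:

    import mmap
    import numpy as np

    def is_mmap_backed(arr):
        # Walk the base chain; frombuffer(mm) arrays keep the mmap as .base.
        if isinstance(arr, np.memmap):
            return True
        base = arr.base
        while base is not None:
            if isinstance(base, (mmap.mmap, np.memmap)):
                return True
            base = getattr(base, 'base', None)
        return False

    x = np.memmap('f.dat', dtype=float, mode='w+', shape=(1,))
    y = np.zeros(1)
    print(is_mmap_backed(x), is_mmap_backed(x[:]), is_mmap_backed(y))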
3 votes · 0 answers

Numpy memmap throttles with Pytorch Dataloader when available RAM is less than file size

I'm working on a dataset that is too big to fit into RAM. The solution I'm trying currently is to use numpy memmap to load one sample/row at a time using Dataloader. The solution looks something like this: class MMDataset(torch.utils.data.Dataset): …
Kevin