I am reading an x,y,z point file (LAS) into Python and have run into memory errors. I am interpolating unknown points between known points for a project I am working on. I began working with small files (< 5,000,000 points) and was able to read/write to a numpy array and Python lists with no problem. I have now received more data to work with (> 50,000,000 points), and my code fails with a MemoryError.
What are some options for handling such large amounts of data? I do not have to load all of the data into memory at once, but I will need to look at neighboring points using a scipy kd-tree. I am using 32-bit Python 2.7 on a 64-bit Windows XP OS.
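For context, here is a rough footprint estimate for the array, using the same record layout as in the code posted below (assuming my byte counts are right):

import numpy as np

# 3 * 4 (f4) + 2 (u2) + 1 (u1) + 8 (datetime64[us]) = 23 bytes per point
dt = np.dtype([('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
               ('i', 'u2'), ('c', 'u1'), ('t', 'datetime64[us]')])
print dt.itemsize                    # 23 bytes per point
print 50000000 * dt.itemsize / 1e9   # ~1.15 GB for 50M points

If that is right, the array alone approaches the ~2 GB address-space limit of a 32-bit process, and numpy needs it as one contiguous block, which I suspect is why the allocation fails.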
Thanks in advance.
EDIT: Code is posted below. I took out code for long calculations and variable definitions.
from liblas import file
import numpy as np

f = file.File(las_file, mode='r')
num_points = len(f)

dt = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('i', 'u2'), ('c', 'u1'), ('t', 'datetime64[us]')]
xyzict = np.empty(shape=(num_points,), dtype=dt)

counter = 0
for p in f:
    newrow = (p.x, p.y, p.z, p.intensity, p.classification, p.time)
    xyzict[counter] = newrow
    counter += 1

dropoutList = []
counter = 0
for i in np.nditer(xyzict):
    # code to define P1x, P1y, P1z, P1t
    if counter != 0:
        # code to calculate n, tDiff, and seconds
        if n > 1 and n < scanN:
            # code to find v and vD
            for d in range(1, int(n - 1)):
                # code to interpolate x, y, z for points between P0 and P1
                # and append the tuple of x, y, and z to dropoutList
                dropoutList.append(vD)
    # code to set x, y, z, t for next iteration
    counter += 1
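One direction I am considering is a disk-backed array via numpy.memmap, so the full dataset never has to sit in RAM at once. A minimal, untested sketch reusing the record layout above (the scratch file name 'points.dat' is a placeholder of my choosing):

import numpy as np
from liblas import file

dt = np.dtype([('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
               ('i', 'u2'), ('c', 'u1'), ('t', 'datetime64[us]')])

f = file.File(las_file, mode='r')
num_points = len(f)

# mode='w+' creates the scratch file on disk; the OS pages records in
# and out, so physical memory only holds the pages currently in use
xyzict = np.memmap('points.dat', dtype=dt, mode='w+', shape=(num_points,))

counter = 0
for p in f:
    xyzict[counter] = (p.x, p.y, p.z, p.intensity, p.classification, p.time)
    counter += 1
xyzict.flush()

Two caveats I am aware of: a memmap is still limited by the 32-bit address space, so a mapping over ~1 GB may fail anyway (64-bit Python would remove that ceiling), and as far as I can tell scipy's kd-tree copies the coordinates into memory when the tree is built, so I may still have to build trees on spatial chunks rather than on the whole cloud.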