
I've been trying to append the points of a pointcloud read with laspy to the points of another pointcloud, basically merging two pointclouds. When merging multiple pointclouds, I have been appending all points to the same np.ndarray in order to save it back to a laspy file. As soon as the combined size of all pointclouds I want to merge exceeds about 350 MB, I get a MemoryError.

I've tried using a different method of writing the pointcloud file, so that I don't have to read all points into memory at once, but that failed, as laspy is really weird when it comes to writing pointcloud files. Here are a few things I figured out:

  • laspy.File.points has the following format:
array([((24315,  12245, 12080, 0, 24, 0, 0, 0, 202, 23205, 24735, 21930),),
       ...,
       ((15155, -23292, -6913, 0, 56, 0, 0, 0, 343, 36975, 37230, 37485),)],
      dtype=[('point', [('X', '<i4'), ('Y', '<i4'), ('Z', '<i4'), ('intensity', '<u2'), ('flag_byte', 'u1'), ('raw_classification', 'u1'), ('scan_angle_rank', 'i1'), ('user_data', 'u1'), ('pt_src_id', '<u2'), ('red', '<u2'), ('green', '<u2'), ('blue', '<u2')])])
  • The variable type of laspy.File.points is numpy.ndarray
  • The shape of laspy.File.points is (<numberOfRows>,) => one-dimensional array, even though it has 12 values per row(?)
  • The rows have the type numpy.void
  • In order to write a laspy.File you need to create a new File in write mode, copy the header from an existing file, and set File.points to a numpy array of exactly the type described above. After setting the points once, you cannot set them again, meaning the final row count needs to be known when setting the points (a minimal write sketch follows this list).
  • You can change the rows' values by using laspy.File.set_x(<arrayOfXValues>) (and similar), which needs to be the same length as laspy.File.points
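
To illustrate the write workflow from the last two bullets, here is a minimal sketch, assuming the laspy 1.x API used in this question (file names are placeholders): open an existing file, copy its header into a new file opened in write mode, and set the points exactly once.

    from laspy.file import File

    inFile = File("input.las", mode='r')      # placeholder path
    points = inFile.points                    # structured np.ndarray, shape (n,)
    print(points.dtype)                       # the compound dtype listed above
    print(points.dtype.itemsize)              # bytes per row/record

    # The new file needs an existing header; points can only be set once,
    # so the array must already have its final row count here.
    outFile = File("copy.las", mode='w', header=inFile.header)
    outFile.points = points
    outFile.close()
    inFile.close()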

Now my PC has 16 GB of RAM, of which about 10 GB are free when I start the merging. Using psutil I get my used and available memory, and I never go below 9 GB of free memory. Using psutil.Process(os.getpid()).memory_info().rss I get the memory used by this process, which never exceeds 650 MB.

When merging, I read the first file, then iterate over the other files, read them one by one, and call numpy.append(combinedPoints, otherPointcloudPoints) to stack all points together. This, however, throws a MemoryError when the conditions listed above are true.

Here is the code that merges multiple pointclouds into one new pointcloud (this all happens in a class PointCloudFileIO; self.file is an instance of laspy.File). util.inMB converts a size in bytes to megabytes.

    def mergePointClouds(self, listPaths, newPath):
        realSize = util.inMB(psutil.Process(os.getpid()).memory_info().rss)
        print("Process Memory used at start: {:.2f}MB".format(realSize))
        print("Available memory at start: {:.2f}MB".format(util.inMB(psutil.virtual_memory().available)))

        pointsOwn = self.file.points
        firstOtherReader = PointCloudFileIO(listPaths[0])
        pointsCombined = np.append(pointsOwn, firstOtherReader.file.points)

        realSize = util.inMB(psutil.Process(os.getpid()).memory_info().rss)
        print("Process Memory used after first merge: {:.2f}MB".format(realSize))
        print("Available memory after first merge: {:.2f}MB".format(util.inMB(psutil.virtual_memory().available)))

        for i in range(1, len(listPaths)):
            otherReader = PointCloudFileIO(listPaths[i])
            otherPoints = otherReader.file.points

            pointsCombined = np.append(pointsCombined, otherPoints)

            realSize = util.inMB(psutil.Process(os.getpid()).memory_info().rss)
            print("Process Memory used in loop: {:.2f}MB".format(realSize))
            print("Available memory in loop: {:.2f}MB | Used: {:.2f}MB | Percent: {}%".format(util.inMB(psutil.virtual_memory().available), util.inMB(psutil.virtual_memory().used), psutil.virtual_memory().percent))

        outFile = File(newPath, mode='w', header=self.file.header)
        outFile.points = pointsCombined
        outFile.close()

For almost all of my use-cases this works perfectly fine: it merges all provided pointclouds into a new pointcloud in a new file. However, when the resulting pointcloud is a little too large, I get a MemoryError despite having far more memory available than needed.

Here is the log from when I start the program with these pointclouds (download .laz files); you'll need to decompress the .laz files with laszip before they are usable with laspy (when using Windows, at least):

    Process Memory used at start: 21.18MB
    Available memory at start: 9793.35MB | Used: 6549.50MB | Percent: 40.1%
    Process Memory used after first merge: 381.63MB
    Available memory after first merge: 9497.64MB | Used: 6845.20MB | Percent: 41.9%
    Process Memory used in loop: 559.52MB
    Available memory in loop: 9309.36MB | Used: 7033.48MB | Percent: 43.0%
    Process Memory used in loop: 637.05MB
    Available memory in loop: 9301.00MB | Used: 7041.85MB | Percent: 43.1%
    Traceback (most recent call last):
      File "optimization_test.py", line 7, in <module>
        f1.mergePointClouds(paths, "someShiet.las")
      File "C:\Users\viddie\Desktop\git\GeoLeo\geoleo\pointcloud.py", line 175, in mergePointClouds
        pointsCombined = np.append(pointsCombined, otherPoints)
      File "C:\Users\viddie\AppData\Local\Programs\Python\Python36-32\lib\site-packages\numpy\lib\function_base.py", line 5166, in append
        return concatenate((arr, values), axis=axis)
    MemoryError

If anyone knows a cause for this, any help is appreciated.

viddie
  • Note that the error is in a `concatenate` call. `np.append` is just a cover function that takes just 2 arrays. `concatenate` accepts a whole list of arrays. In any case, it creates a whole new array with each call. It does not modify the base in-place as a list append does. Usually we recommend collecting the arrays in a list, and doing one `concatenate` at the end. Usually it's faster. I can't say whether it will avoid the MemoryError - for big data that happens sooner or later. – hpaulj Jun 19 '19 at 03:02
  • With that `dtype`, you have a `structured array`, 1d with multiple fields. `itemsize` gives the size, in bytes, for each record (roughly 4*12). And as you found out, when you concatenate these arrays, they all have to have the same compound `dtype`. – hpaulj Jun 19 '19 at 03:08
  • There are a couple of memory map ways of storing arrays on the disk, but I let others address that. – hpaulj Jun 19 '19 at 03:08
  • It seems from the `Python36-32` in your path, that you're using a 32bit version of Python. In this case, your free memory is irrelevant, as Windows only allocates much less than that to the 32-bit application. (see [this post](https://stackoverflow.com/questions/18282867/python-32-bit-memory-limits-on-64bit-windows) ) – eaglesear Apr 07 '20 at 16:17
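
A minimal sketch of what hpaulj and eaglesear describe above, assuming the laspy 1.x API used in the question and placeholder file names: check whether the interpreter is a 32-bit build, and merge by collecting every file's points in a list and calling `np.concatenate` once instead of growing the array with `np.append` inside the loop.

    import sys
    import numpy as np
    from laspy.file import File

    # On a 32-bit build, sys.maxsize is 2**31 - 1 and the process is capped at
    # roughly 2 GB of address space, no matter how much RAM is installed.
    print("64-bit Python" if sys.maxsize > 2**32 else "32-bit Python")

    paths = ["cloud1.las", "cloud2.las", "cloud3.las"]   # placeholder paths
    readers = [File(p, mode='r') for p in paths]

    # Collect all point arrays first, then concatenate once at the end;
    # every np.append/np.concatenate call allocates a brand-new array.
    pointsCombined = np.concatenate([r.points for r in readers])

    outFile = File("merged.las", mode='w', header=readers[0].header)
    outFile.points = pointsCombined
    outFile.close()
    for r in readers:
        r.close()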

1 Answer


Just in case the operation actually doesn't fit in memory, you can dedicate some of your hard drive to operate as memory.

On Windows, you can do this by increasing the size of the page file; on Ubuntu, you could set up swap space.

Perhaps start with that until you can figure out how to reduce the memory consumption. Or at least this could help you troubleshoot by ensuring that you really do have enough memory.
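
As a quick sanity check, psutil (which the question already uses) can report how much swap or page-file space is currently configured; a minimal sketch with an illustrative inMB helper:

    import psutil

    def inMB(sizeBytes):
        return sizeBytes / 2**20

    swap = psutil.swap_memory()
    vm = psutil.virtual_memory()
    print("Swap total: {:.2f}MB | free: {:.2f}MB".format(inMB(swap.total), inMB(swap.free)))
    print("RAM total: {:.2f}MB | available: {:.2f}MB".format(inMB(vm.total), inMB(vm.available)))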

Chris Farr