I've been trying to append the points of a pointcloud read with laspy to the points of another pointcloud, essentially merging the two. When merging multiple pointclouds I have been appending all points to the same np.ndarray in order to save the result back to a laspy file. As soon as the combined size of the pointclouds I want to merge exceeds about 350 MB, I get a MemoryError.

I've tried a different method of writing the pointcloud file so that I don't have to hold all points in memory at once, but that failed, because laspy is quite peculiar when it comes to writing pointcloud files. Here are a few things I figured out:
`laspy.File.points` has the following format:

```
array([((24315, 12245, 12080, 0, 24, 0, 0, 0, 202, 23205, 24735, 21930),),
       ...,
       ((15155, -23292, -6913, 0, 56, 0, 0, 0, 343, 36975, 37230, 37485),)],
      dtype=[('point', [('X', '<i4'), ('Y', '<i4'), ('Z', '<i4'), ('intensity', '<u2'), ('flag_byte', 'u1'), ('raw_classification', 'u1'), ('scan_angle_rank', 'i1'), ('user_data', 'u1'), ('pt_src_id', '<u2'), ('red', '<u2'), ('green', '<u2'), ('blue', '<u2')])])
```
- The variable type of `laspy.File.points` is `numpy.ndarray`.
- The shape of `laspy.File.points` is `(<numberOfRows>,)` => a one-dimensional array, even though it has 12 values per row(?).
- The rows have the type `numpy.void`.
- In order to write a `laspy.File` you need to create a new file in write mode, copy the header from an existing file, and set `File.points` to a numpy array of exactly the type described above. After setting the points once, you cannot set them again, meaning the final row count needs to be known when setting the points.
- You can change the rows' values by using `laspy.File.set_x(<arrayOfXValues>)` (and similar), which needs to be the same length as `laspy.File.points`.
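For illustration, the one-dimensional shape and the `numpy.void` rows can be reproduced without laspy; the sketch below uses a shortened, hypothetical version of the dtype above (not the full 12-field laspy record):

```python
import numpy as np

# Shortened stand-in for the full laspy point dtype shown above (illustrative only)
point_dtype = np.dtype([('point', [('X', '<i4'), ('Y', '<i4'),
                                   ('Z', '<i4'), ('intensity', '<u2')])])

points = np.array([((24315, 12245, 12080, 0),),
                   ((15155, -23292, -6913, 0),)], dtype=point_dtype)

print(points.shape)          # (2,) -- one-dimensional: each row is ONE structured element
print(type(points[0]))       # <class 'numpy.void'>
print(points['point']['X'])  # field X across all rows
```

So the "12 values per row" live inside a single structured element per row, which is why the array is one-dimensional.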
Now, my PC has 16 GB of RAM, of which about 10 GB are free when I start the merging. Using `psutil` I track used and available memory, and free memory never drops below 9 GB. Using `psutil.Process(os.getpid()).memory_info().rss` I get the memory used by this process, which never exceeds 650 MB.

When merging, I read the first file, then iterate over the other files, reading them one by one and calling `numpy.append(combinedPoints, otherPointcloudPoints)` to stack all points together. This, however, throws a MemoryError, even though the conditions listed above hold.
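One detail worth noting about that loop: `numpy.append` never appends in place. Each call allocates a brand-new array and copies both inputs into it, so at the moment of the call the old combined array and the new one exist simultaneously. A standalone sketch, again with a shortened hypothetical point dtype:

```python
import numpy as np

# Shortened stand-in for laspy's point dtype (illustrative only)
point_dtype = np.dtype([('X', '<i4'), ('Y', '<i4'), ('Z', '<i4')])

a = np.array([(1, 2, 3), (4, 5, 6)], dtype=point_dtype)
b = np.array([(7, 8, 9)], dtype=point_dtype)

merged = np.append(a, b)  # allocates a NEW array; a and b are copied into it
print(merged.shape)       # (3,)
print(merged is a)        # False -- the original array is untouched
```

Because of this copy, growing the result with `np.append` in a loop needs roughly twice the current combined size in free contiguous memory on every iteration; collecting the per-file arrays in a list and calling `np.concatenate` once would avoid the repeated copies.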
Here is the code to merge multiple pointclouds into one new pointcloud (this all happens in a class `PointCloudFileIO`; `self.file` is an instance of `laspy.File`). `util.inMB` converts a size in bytes to megabytes.
```python
def mergePointClouds(self, listPaths, newPath):
    realSize = util.inMB(psutil.Process(os.getpid()).memory_info().rss)
    print("Process Memory used at start: {:.2f}MB".format(realSize))
    print("Available memory at start: {:.2f}MB".format(util.inMB(psutil.virtual_memory().available)))

    pointsOwn = self.file.points
    firstOtherReader = PointCloudFileIO(listPaths[0])
    pointsCombined = np.append(pointsOwn, firstOtherReader.file.points)

    realSize = util.inMB(psutil.Process(os.getpid()).memory_info().rss)
    print("Process Memory used after first merge: {:.2f}MB".format(realSize))
    print("Available memory after first merge: {:.2f}MB".format(util.inMB(psutil.virtual_memory().available)))

    for i in range(1, len(listPaths)):
        otherReader = PointCloudFileIO(listPaths[i])
        otherPoints = otherReader.file.points
        pointsCombined = np.append(pointsCombined, otherPoints)

        realSize = util.inMB(psutil.Process(os.getpid()).memory_info().rss)
        print("Process Memory used in loop: {:.2f}MB".format(realSize))
        print("Available memory in loop: {:.2f}MB | Used: {:.2f}MB | Percent: {}%".format(util.inMB(psutil.virtual_memory().available), util.inMB(psutil.virtual_memory().used), psutil.virtual_memory().percent))

    outFile = File(newPath, mode='w', header=self.file.header)
    outFile.points = pointsCombined
    outFile.close()
```
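For completeness, `util.inMB` is just a byte-to-megabyte conversion along these lines (a sketch; the exact divisor in my helper may differ):

```python
def inMB(sizeInBytes):
    """Convert a byte count to megabytes (binary: 1 MB = 1024 * 1024 bytes)."""
    return sizeInBytes / (1024 * 1024)

print("{:.2f}MB".format(inMB(1048576)))  # 1.00MB
```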
For almost all of my use-cases this works perfectly fine: it merges all provided pointclouds into a new pointcloud in a new file. However, when the resulting pointcloud is a little too large, I get a MemoryError despite having far more memory available than needed.
Here is the log for when I start the program with these pointclouds (download .laz files); you'll need to unzip the .laz files with laszip before they are usable with laspy (at least on Windows):
```
Process Memory used at start: 21.18MB
Available memory at start: 9793.35MB | Used: 6549.50MB | Percent: 40.1%
Process Memory used after first merge: 381.63MB
Available memory after first merge: 9497.64MB | Used: 6845.20MB | Percent: 41.9%
Process Memory used in loop: 559.52MB
Available memory in loop: 9309.36MB | Used: 7033.48MB | Percent: 43.0%
Process Memory used in loop: 637.05MB
Available memory in loop: 9301.00MB | Used: 7041.85MB | Percent: 43.1%
Traceback (most recent call last):
  File "optimization_test.py", line 7, in <module>
    f1.mergePointClouds(paths, "someShiet.las")
  File "C:\Users\viddie\Desktop\git\GeoLeo\geoleo\pointcloud.py", line 175, in mergePointClouds
    pointsCombined = np.append(pointsCombined, otherPoints)
  File "C:\Users\viddie\AppData\Local\Programs\Python\Python36-32\lib\site-packages\numpy\lib\function_base.py", line 5166, in append
    return concatenate((arr, values), axis=axis)
MemoryError
```
If anyone knows the cause of this, any help is appreciated.