3

I have a set of points (x and y) and i wish to know the maximum and Minimum values of X and Y (the Bounding Box). I coded these lines where i read all points with list comprehension and after i use max and min on X and Y. In the end i delete the points.

This solution is not memory efficency because i need to read all points

points = [(p.x,p.y) for p in lasfile.File(inFile,None,'r')] # read in list comprehension
X_Max = max(zip(*points)[0])
X_Min = min(zip(*points)[0])
Y_Max = max(zip(*points)[1])
Y_Min = min(zip(*points)[1])
del points

I am asking a suggetion to avoid this step (store all points in the memory). Thanks in advance Gianni

Gianni Spear
  • 7,033
  • 22
  • 82
  • 131

2 Answers2

5
X_Max = float('-inf')
X_Min = float('+inf')
Y_Max = float('-inf')
Y_Min = float('+inf')

for p in lasfile.File(inFile,None,'r'):
    X_Max = max(X_Max, p.x)
    X_Min = min(X_Min, p.x)
    Y_Max = max(Y_Max, p.y)
    Y_Min = min(Y_Min, p.y)

This way you only loop once over your file, as well as avoiding having more than one point in memory at a time.

EDIT File() is providing a iterator, which is only reading one line at a time from the file and supplying it to the loop variable p, as and when required.

In your question you used square brackets around your initial points assignment. This is a list comprehension which, as the name suggests, creates a list - so all points are held in memory from that point on. If you used parentheses instead like this:

points = ((p.x,p.y) for p in lasfile.File(inFile,None,'r'))

X_Max = float('-inf')
X_Min = float('+inf')
Y_Max = float('-inf')
Y_Min = float('+inf')

for p in points:
    X_Max = max(X_Max, p.x)
    X_Min = min(X_Min, p.x)
    Y_Max = max(Y_Max, p.y)
    Y_Min = min(Y_Min, p.y)

...then Python doesn't create a list, but a generator/iterator - which would return one point at a time until the file is exhausted. This would avoid having all points in memory at the same time - but can only be iterated over once.

For the sake of simplicity though, I've dropped the creation of an additional iterator in favour of just using the lasfile.File() one directly.

Steve Mayne
  • 22,285
  • 4
  • 49
  • 49
  • Points are still being stored in memory here. – jrtc27 Oct 18 '12 at 11:12
  • Thanks Steve, but my idea is try to not store points (more than 2 million) in the memory – Gianni Spear Oct 18 '12 at 11:14
  • 2
    It's a generator, so it should only have one point at a time in memory. Note the parentheses rather than square brackets around the first line. – Steve Mayne Oct 18 '12 at 11:14
  • Why the `points` generator and not just `for p in lasfile.File(inFile,None,'r'): X_Max = max(X_Max, p.x) ...`? – sloth Oct 18 '12 at 11:21
  • @Mr.Steak Good point. Tidiness, perhaps? Tenuous, I know. I'll drop it. – Steve Mayne Oct 18 '12 at 11:22
  • You should note that it is the `File` class that's providing the generator (assuming the OP uses libLAS, which I think he does). – sloth Oct 18 '12 at 12:39
  • @Mr.Steak I've added some more explanation text – Steve Mayne Oct 18 '12 at 13:15
  • @SteveMayne: thanks for your help. With these large data set (> 2 million of points) i have always problem in Python. recently I wrote these posts in stackoverflow: http://stackoverflow.com/questions/12923935/python-improve-memory-efficiency-of-a-script and this http://stackoverflow.com/questions/12883237/python-improve-the-efficency-of-my-script-using-multiprocessing-module-tips-an to explain these problems – Gianni Spear Oct 18 '12 at 14:21
  • @SteveMayne i will test tonight for sure, and you will get feedback on SO. Today is/was a meeting (full) day and now I am on PC to preparing the data of tonight. Anyway, please could i ask to write your solution using points = ((p.x,p.y) for p in lasfile.File(inFile,None,'r'))? Thanks in advance Gianni – Gianni Spear Oct 18 '12 at 15:00
  • @SteveMayne: with your solution i have X_Max as 313372.98 (what i need). Some people suggest to use list comprehension instead of for when you have more a 1 million of point (sppeding performance). I will try to find a solution you your code and points = ((p.x,p.y) for p in lasfile.File(inFile,None,'r')) – Gianni Spear Oct 18 '12 at 16:10
  • 1
    @Gianni I've filled in the gaps in the generator version of my answer. The list comprehension won't save you anything over using a generator in the example above - and it will cost you masses of RAM. To clarify - The list comprehension is the bit in square brackets on the first line of your code - not the max() calls, passing in the full list. – Steve Mayne Oct 19 '12 at 08:24
  • @SteveMayne thanks again Steve, you are really clear and expert – Gianni Spear Oct 19 '12 at 10:27
  • No problem @Gianni - please don't forget to 'accept' this answer if it's the one that fixed it for you :-) – Steve Mayne Oct 19 '12 at 11:34
  • @SteveMayne: done :). If you have time could you give a look a this http://stackoverflow.com/questions/12923935/python-improve-memory-efficiency-of-a-script. I think a answer can be really quoted by the comunity that use Shapefile module http://code.google.com/p/pyshp/ – Gianni Spear Oct 19 '12 at 11:41
3

You could use a generator expression for points and use the key argument for max and min:

from itertools import tee
points = ((p.x,p.y) for p in lasfile.File(inFile,None,'r'))
points = tee(points, 4)

X_Max = max(points[0], key=lambda x:x[0])[0]
X_Min = min(points[1], key=lambda x:x[0])[0]
Y_Max = max(points[2], key=lambda x:x[1])[1]
Y_Min = min(points[3], key=lambda x:x[1])[1]

Update:

I added a call to itertools.tee to duplicate the original generator.

As noted in the comments, this solution has the downside that you have to (unnecessaryly) iterate over your file 4 times. Calculating the maxes and mins on every iteration, as @SteveMayne does, saves you from this.

Community
  • 1
  • 1
halex
  • 16,253
  • 5
  • 58
  • 67
  • Thanks halex, but my idea is try to not store points (more than 2 million) in the memory. PS: i didn't downvote (I never do) – Gianni Spear Oct 18 '12 at 11:16
  • 2
    The points should not be stored in memory because of the generator expression made with `(...)` instead of `[...]`. The points are fetched one at a time. – halex Oct 18 '12 at 11:18
  • 2
    Are you sure this will work? After the first `max` call, the generator `points` will be exhausted, so the following call to `min` will result in an exception. – sloth Oct 18 '12 at 11:20
  • 2
    @Mr.Steak that's what I was thinking too. Even if it worked, you'd have to iterate over the file 4 times to do something you could do with a single loop. – Steve Mayne Oct 18 '12 at 11:21
  • @halex: with your solution i have X_Max as "(313372.98, 6956949.2700000005)" instead of one value (the X max) – Gianni Spear Oct 18 '12 at 16:06
  • 1
    @Gianni Sorry. I now fixed it. The problem was that `max` and `min` returned the whole point of x and y value as a tuple of 2 elements. So I added another access for the first element for the x values and the second for y values. Thanks for being so attentive :). – halex Oct 18 '12 at 16:31
  • @halex: it;s a pleasure test and see if there are some bugs. Great job you and Steve Mayne – Gianni Spear Oct 18 '12 at 16:39
  • @halex both solution work without memory problem. Maybe your solution is a little more fast. It's strange, Python is slow in X_Max = max(points[0], key=lambda x:x[0])[0], and the rest is quite fast – Gianni Spear Oct 18 '12 at 16:45