0

this question might be phrased way more convoluted than it is, since I am rather sure the solution is rather generic.

The Situation is as follows. We are given a numpy array of (n, 3), where n is the number of points specified in 3D coordinates (x,y,z). I now want to produce a slice of this array, that contains all the points the fall in a certain area. For example whose x-value is between -50 and 50 and the z-value below 10.

I could certainly iterate over the whole array and check the conditions for every point, however, I do suspect some numpy magic to exist that makes this operation a whole lot faster. Maybe you can help me to come up with an idea.

Thank you! :)

for x,y,z in points:
    if x >= x_lower and x <= x_upper and y >= y_lower and y <= y_upper and z.... and so on:
        #keep this point
pcsso
  • 121
  • 2
  • 9
  • 1
    You'll want to store your data in a structure designed for these types of query, not a regular list, matrix, or array. Read up, for example, on [quadtrees](https://en.wikipedia.org/wiki/Quadtree). – chepner Jan 15 '20 at 16:07
  • Question: will you do this one per dataset, or many times. If the former than simply walking the array with a vectorised operation (as has been suggested) is fastest. Otherwise you'll want to think about data structures like r- and oct-trees which have construction costs but fast queries. – Richard Jan 15 '20 at 16:29
  • Does this answer your question? [Numpy: find elements within range](https://stackoverflow.com/questions/13869173/numpy-find-elements-within-range) – Nathan Jan 15 '20 at 16:30

1 Answers1

1

You can use np.where to analyse multiple comparisons like so:

import numpy as np

a = np.array([[0, 0, 0],
              [1, 2, 3],
              [6, 7, 7],
              [9, 0, 0],
              [0, 9, 0],
              [0, 0, 9],
              [-10, 0, 0]])

print(a[(a[:, 0] < 8 ) & (a[:, 0] >= 0) & (a[:, 1] < 8) & (a[:, 1] >= 0) & (a[:, 2] < 8) & (a[:, 2] >= 0)])
>>>[[0 0 0]
    [1 2 3]
    [6 7 7]]

The way this works if that if any of the statements:

  1. (a[:, 0] < 8 )
  2. (a[:, 0] >= 0)
  3. (a[:, 1] < 8)
  4. (a[:, 1] >= 0)
  5. (a[:, 2] < 8)
  6. (a[:, 2] >= 0))

is False it will return a False, if any of them are False, the bitwise comparer returns False for the entire statement (for more, see 'and' (boolean) vs '&' (bitwise) - Why difference in behavior with lists vs numpy arrays?).

This should be plenty fast, for an array of length 10000000, it takes me 0.43 seconds to run.

Nathan
  • 3,558
  • 1
  • 18
  • 38
  • A simple Boolean mask would be much faster. This is not a good place to use `where` – Mad Physicist Jan 15 '20 at 16:22
  • @MadPhysicist sure, but as I pointed out, this takes less about half a second, I doubt it matters very much – Nathan Jan 15 '20 at 16:24
  • @MadPhysicist I've included the other option as well. – Nathan Jan 15 '20 at 16:28
  • It matters if you're doing it a lot – Mad Physicist Jan 15 '20 at 16:32
  • Consider using `&` rather than `*` – Mad Physicist Jan 15 '20 at 16:32
  • thank you for the reply @Nathan! I suppose this is already an improvement over the iteration option and possibly the a[:,i] syntax is just what I had in mind – pcsso Jan 15 '20 at 16:33
  • @MadPhysicist thanks for the suggestion, I've changed it – Nathan Jan 15 '20 at 16:38
  • So here's another idea: make an `N, 3, C` empty array, with `C` being the number of comparisons you need. Then call [`np.less`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.less.html), `less_equal`, `greater`, `greater_equal` with the `out` parameter set to the appropriate slice of the empty array. Then call `np.all(arr, axis=(1, 2))` on it. That will short-circuit the computation, and might be faster. You can also play with which to put `C` into to get a speedup. – Mad Physicist Jan 15 '20 at 19:24