
I am investigating whether storing points in a numpy array helps me search for points, and I have several questions about it.

I have a Point class that represents a 3-dimensional point.

class Point( object ):
  def __init__( self, x, y, z ):
    self.x = x
    self.y = y
    self.z = z

  def __repr__( self ):
    return "<Point (%r, %r, %r)>" % ( self.x, self.y, self.z )

I build a list of Point objects. Notice that the coordinates (1, 2, 3) deliberately occur twice; that is what I am going to search for.

>>> points = [Point(1, 2, 3), Point(4, 5, 6), Point(1, 2, 3), Point(7, 8, 9)]

I store the Point objects in a numpy array.

>>> import numpy
>>> npoints = numpy.array( points )
>>> npoints
array([<Point (1, 2, 3)>, <Point (4, 5, 6)>, <Point (1, 2, 3)>,
   <Point (7, 8, 9)>], dtype=object)

I search for all points with coordinates (1, 2, 3) in the following manner.

>>> numpy.where( npoints == Point(1, 2, 3) )
(array([], dtype=int64),)

The result is empty, so that does not seem to be the correct way to do it. Is numpy.where the right tool? Is there another way to express the condition for numpy.where that would succeed?

The next thing I try is to store just the coordinates of the points in a numpy array.

>>> npoints = numpy.array( [(p.x, p.y, p.z) for p in points ])
>>> npoints
array([[1, 2, 3],
      [4, 5, 6],
      [1, 2, 3],
      [7, 8, 9]])

I search for all points with coordinates (1,2,3) in the following manner.

>>> numpy.where( npoints == [1,2,3] )
(array([0, 0, 0, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))

The result is, at least, something I can deal with. The array of row indexes in the first return value, array([0, 0, 0, 2, 2, 2]), does indeed tell me that the coordinates I am searching for are in rows 0 and 2 of npoints. I could get away with doing something like the following.

>>> rows, cols = numpy.where( npoints == [1,2,3] )
>>> rows
array([0, 0, 0, 2, 2, 2])
>>> cols
array([0, 1, 2, 0, 1, 2])
>>> foundRows = set( rows )
>>> foundRows
set([0, 2])
>>> for r in foundRows:
...   pass  # Do something with npoints[r]

However, I feel that I am not really using numpy.where appropriately, and that I am just getting lucky in this particular situation.

What is the appropriate way to find all occurrences of an n-dimensional point (i.e., a row with particular values) in a numpy array?

Preserving the order of the array is essential.

Mike Finch
  • See if [this](https://stackoverflow.com/questions/38674027/) helps. – Divakar Sep 07 '17 at 16:42
  • Don't use `numpy` arrays of custom objects. The problem is that `==` is implemented by *identity* for custom objects by default. Regardless, the biggest issue is that you are essentially creating an inefficient Python `list` when you make a `dtype=object` array. – juanpa.arrivillaga Sep 07 '17 at 16:48
  • Also consider the `Point` class from [`shapely`](https://pypi.python.org/pypi/Shapely). – Brad Solomon Sep 07 '17 at 18:08
  • In the [post](https://stackoverflow.com/questions/38674027/find-the-row-indexes-of-several-values-in-a-numpy-array) suggested by @Divakar, approach #1 works. I do not yet grok the syntax well enough to know **why** it works. That will keep me busy for a while. – Mike Finch Sep 07 '17 at 18:58

1 Answer

You can define the “rich comparison” method object.__eq__(self, other) in your Point class so that == compares Point objects by their coordinates instead of by identity:

class Point( object ):
  def __init__( self, x, y, z ):
    self.x = x
    self.y = y
    self.z = z

  def __repr__( self ):
    return "<Point (%r, %r, %r)>" % ( self.x, self.y, self.z )

  def __eq__( self, other ):
    # Two points are equal when all three coordinates match.
    if not isinstance( other, Point ):
      return NotImplemented
    return self.x == other.x and self.y == other.y and self.z == other.z

import numpy
points = [Point(1, 2, 3), Point(4, 5, 6), Point(1, 2, 3), Point(7, 8, 9)]
npoints = numpy.array( points )
# The object array applies Point.__eq__ element by element.
found = numpy.where(npoints == Point(1, 2, 3))
print(found) # => (array([0, 2]),)
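
For the coordinate-array variant from the question, a row-wise match might look like the sketch below. It assumes the coordinates are held in a plain 2-D integer array, and the names coords, target, row_mask and found_rows are illustrative only. The idea is to require that every column of a row matches, rather than collecting the rows where any single column matches.

```python
import numpy as np

# Coordinates from the question, one point per row.
coords = np.array([(1, 2, 3), (4, 5, 6), (1, 2, 3), (7, 8, 9)])
target = np.array([1, 2, 3])

# Broadcasting compares each row with target element by element;
# all(axis=1) keeps only the rows where every coordinate matched.
row_mask = (coords == target).all(axis=1)

# flatnonzero returns the matching row indices in ascending order,
# so the original order of the array is preserved.
found_rows = np.flatnonzero(row_mask)
print(found_rows)  # [0 2]
```

This differs from taking set(rows) on the output of numpy.where: a row such as (1, 5, 3) matches in two columns and would show up in rows, whereas the all(axis=1) mask rejects it.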
DjaouadNM
  • Adding an equality rich comparison method to the Point class is a good idea. That worked. – Mike Finch Sep 07 '17 at 18:37
  • I should type faster! :) Adding an equality rich comparison method to the Point class works, and is a good example in general. However, in my situation, the Point class I am actually using is in a third-party module which I am not able to modify. Also, as @juanpa.arrivillaga brings up, letting the item data type of a numpy array be an object defeats the efficiency of using a numpy array. I suspected that. But I might continue with this arrangement anyway. Speed is not my primary concern, and it could be more convenient than using the list `index` method. – Mike Finch Sep 07 '17 at 18:45