Many functions, such as in1d and setdiff1d, are designed for 1-D arrays. One workaround for applying them to N-dimensional arrays is to make NumPy treat each row (or a higher-dimensional subarray) as a single value.

One approach I found is in Joe Kington's answer to "Get intersecting rows across two 2D numpy arrays".

The following code is taken from that answer. The task was to find the rows common to two arrays A and B with a 1-D set function such as in1d (the code below uses intersect1d).

import numpy as np
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])

nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)],
       'formats':ncols * [A.dtype]}

C = np.intersect1d(A.view(dtype), B.view(dtype))

# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)

I would appreciate help with any of the following three questions. First, I do not understand the mechanism behind this method. Could you explain it?

Second, are there other ways to make NumPy treat a subarray as one object?

One more open question: does Joe's approach have any drawbacks? That is, can treating a row as a single value cause problems? Sorry that this question is pretty broad.

Tai
  • To extend your query one step further, look at `isin` in numpy 1.13. For example, np.isin(a.view(dtype), b.view(dtype)) yields array([[ True], [False], [ True]], dtype=bool). Without the dtype you get an element-by-element test: np.isin(a, b) yields array([[ True, True], [False, False], [ True, True]], dtype=bool) – NaN Jan 10 '18 at 02:12
  • @NaN Cool. What's your `a`, `b` and `dtype`? – Tai Jan 10 '18 at 02:25
  • Do you understand structured arrays? Joe's answer is a generalization of Ram Kumar Karn's. `A.view(dtype)` is a 1d array whose elements contain all the bytes of one row of `A`. `intersect1d` apparently doesn't care how big the elements are as long as they can be compared. – hpaulj Jan 10 '18 at 04:58
  • @hpaulj I tried to read some more early on tonight. I think I have a feeling about why it works. – Tai Jan 10 '18 at 05:01
  • Another approach is to use broadcasting to compare the arrays element by element (creating a higher dimensional boolean array), and then use some combination of `any` and `all` to reduce one or more of the axes. https://stackoverflow.com/questions/38674027/find-the-row-indexes-of-several-values-in-a-numpy-array – hpaulj Jan 10 '18 at 05:10
  • @hpaulj Thanks for offering the method! I was actually asking about other ways to do Joe's trick, but this is also helpful. I think structured arrays might be the only way to achieve this, though I am not sure. Thank you for helping to clarify my question. – Tai Jan 10 '18 at 05:13
  • @Tai, I just used lower-case for your A and B and the dtype was therefore the same as yours. Sorry for the confusion – NaN Jan 10 '18 at 07:45
  • @NaN Got you. No problems. Got you :P – Tai Jan 10 '18 at 14:12

1 Answer

Here is what I have learned so far. The technique Joe used relies on structured arrays, which let users define what a single element of an array contains.

Take a look at the first example in the documentation and its description:

x = np.array([(1,2.,'Hello'), (2,3.,"World")],
             dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')])

Here we have created a one-dimensional array of length 2. Each element of this array is a structure that contains three items, a 32-bit integer, a 32-bit float, and a string of length 10 or less.

Without the dtype argument, however, we would get an ordinary 2-by-3 array instead.
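A minimal sketch of that difference, using the same records as the documentation example (the names `records` and `y` are mine, not from the documentation):

```python
import numpy as np

records = [(1, 2.0, "Hello"), (2, 3.0, "World")]

# With a structured dtype, each tuple becomes ONE element of a 1-D array.
x = np.array(records, dtype=[("foo", "i4"), ("bar", "f4"), ("baz", "S10")])
print(x.shape)       # (2,)
print(x["foo"])      # the integer field of every element

# Without a dtype, NumPy coerces every item to a common type (strings here)
# and each tuple becomes a row of a plain 2-D array.
y = np.array(records)
print(y.shape)       # (2, 3)
print(y.dtype.kind)  # 'U', i.e. unicode strings
```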

With a suitably defined dtype, then, we can make NumPy treat a whole row (or a higher-dimensional subarray) as a single element.
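To make this concrete for the arrays in the question, here is a small sketch that inspects what the structured view looks like before handing it to intersect1d. The names `row_dtype`, `A_rows`, and `common` are mine, and I am assuming A and B are C-contiguous with the same dtype:

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

ncols = A.shape[1]
# One field per column, so one structured element spans the bytes of one row.
row_dtype = np.dtype({"names": [f"f{i}" for i in range(ncols)],
                      "formats": ncols * [A.dtype]})

A_rows = A.view(row_dtype)
print(A_rows.shape)                                        # (3, 1): one element per row
print(A_rows.dtype.itemsize == ncols * A.dtype.itemsize)   # True

# intersect1d now compares whole rows, because each row is a single element.
common = np.intersect1d(A_rows, B.view(row_dtype))
common = common.view(A.dtype).reshape(-1, ncols)
print(common)
```

No data is copied by `view`; only the interpretation of the underlying bytes changes.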


Another trick Joe showed is that we do not need to build a new NumPy array for this. We can use the view method (see ndarray.view) to change how NumPy interprets the existing data. The Notes section of the ndarray.view documentation is worth reading before relying on this method; I cannot guarantee that there are no side effects. The paragraph below is taken from those notes and seems to call for caution.

For a.view(some_dtype), if some_dtype has a different number of bytes per entry than the previous dtype (for example, converting a regular array to a structured array), then the behavior of the view cannot be predicted just from the superficial appearance of a (shown by print(a)). It also depends on exactly how a is stored in memory. Therefore if a is C-ordered versus fortran-ordered, versus defined as a slice or transpose, etc., the view may give different results.
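One way to sidestep the layout issue, as far as I can tell, is to force a C-contiguous copy before taking the view. The helper below is only a sketch; `rows_as_records` is a name I made up, not a NumPy function, and it assumes a 2-D input:

```python
import numpy as np

def rows_as_records(a):
    """Hypothetical helper: view each row of a 2-D array as one structured element."""
    a = np.ascontiguousarray(a)  # copies only if `a` is not already C-contiguous
    ncols = a.shape[1]
    row_dtype = np.dtype({"names": [f"f{i}" for i in range(ncols)],
                          "formats": ncols * [a.dtype]})
    return a.view(row_dtype).ravel()

A = np.array([[1, 4], [2, 5], [3, 6]])
F = np.asfortranarray(A)                 # same values, column-major memory
T = np.array([[1, 2, 3], [4, 5, 6]]).T   # transposed view, prints like A

# Neither F nor T is C-contiguous, even though both print exactly like A.
print(F.flags["C_CONTIGUOUS"], T.flags["C_CONTIGUOUS"])

# After normalising the layout, all three give the same row elements.
print(rows_as_records(A).tolist())
print(rows_as_records(F).tolist())
print(rows_as_records(T).tolist())
```

The cost is a copy whenever the input is not already C-contiguous, so the "no new array" benefit of `view` only holds for arrays that are stored row by row to begin with.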

Other references

https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html

Tai