1

I have two large data files, one with two columns and one with three columns. I want to select all the rows from the three-column file whose first two columns match a row in the two-column file. My idea was to compare the NumPy arrays.

Let's say I have:

a = np.array([[1, 2, 3], [3, 4, 5],  [1, 4, 6]])

b = np.array([[1, 2], [3, 4]])

and the result should look like this:

[[1, 2, 3], [3, 4, 5]]

Any advice on that?

EDIT: So in the end this works. Not very handy but it works.

for ii in range(a.shape[0]):
    u, v, w = a[ii, :]
    for jj in range(b.shape[0]):
        # Print the full row of a when its first two columns match row jj of b
        if u == b[jj, 0] and v == b[jj, 1]:
            print([u, v, w])
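
For comparison, the double loop can be expressed with NumPy broadcasting. This is only a minimal sketch, assuming the intermediate boolean array of shape (len(a), len(b), 2) fits in memory, which may not hold for very large files. The names pairwise, mask and result are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [1, 4, 6]])
b = np.array([[1, 2], [3, 4]])

# Compare the first two columns of every row of a with every row of b.
# Broadcasting gives a boolean array of shape (len(a), len(b), 2).
pairwise = a[:, None, :2] == b[None, :, :]

# A row of a is kept if both columns agree with at least one row of b.
mask = pairwise.all(axis=2).any(axis=1)

result = a[mask]
print(result)
```

Unlike the loop, this keeps each matching row of a once, even if b contains the same pair several times.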
Tonechas
Ernie

3 Answers

2

The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems efficiently, without using any python loops:

import numpy_indexed as npi
a[npi.contains(b, a[:, :2])]
Eelco Hoogendoorn
0

If you would rather not use another library and want to do this with NumPy only, you can do something similar to what is suggested here and here, namely use np.in1d (see docs), which returns a boolean mask indicating whether each element of one 1D array exists in another 1D array. As the name indicates, this function only works on 1D arrays, but you can use a structured array view (via ndarray.view) to trick NumPy into treating each row as a single element of a 1D array. One caveat: you need a contiguous copy of the first two columns of a, since view does not work on a non-contiguous slice. If that is not too big of an issue for you, something along the lines of:

a_cp = a[:, :2].copy()
a[np.in1d(a_cp.view((np.void, a_cp.dtype.itemsize*a_cp.shape[1])).ravel(),
          b.view((np.void, b.dtype.itemsize*b.shape[1])).ravel())]

might work for you.

This uses the boolean mask directly to index a and return the matching rows.
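
A self-contained version of the same idea might look like the sketch below. The helper names rows_as_void and select_matching_rows are hypothetical, and it uses np.isin, which the NumPy docs recommend over np.in1d for new code. It assumes b can be cast to the dtype of a without changing its values, so that the row byte patterns are comparable.

```python
import numpy as np

def rows_as_void(arr):
    """View each row of a 2-D array as one opaque np.void element."""
    # A row view needs contiguous memory, so slices are copied here.
    arr = np.ascontiguousarray(arr)
    row_dtype = np.dtype((np.void, arr.dtype.itemsize * arr.shape[1]))
    return arr.view(row_dtype).ravel()

def select_matching_rows(a, b):
    """Return the rows of a whose leading columns equal some row of b."""
    keys = rows_as_void(a[:, :b.shape[1]])
    # Assumption: casting b to a's dtype is lossless, so bytes line up.
    wanted = rows_as_void(b.astype(a.dtype, copy=False))
    return a[np.isin(keys, wanted)]

a = np.array([[1, 2, 3], [3, 4, 5], [1, 4, 6]])
b = np.array([[1, 2], [3, 4]])

result = select_matching_rows(a, b)
print(result)
```

Because rows are compared byte for byte, this is best suited to integer data; floating-point values such as 0.0 and -0.0 compare equal numerically but have different bytes.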

jotasi
-1

Check this, @Ernie. It may help you to get to the solution. ;D

http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.in1d.html

pceccon