effectively comparing two arrays of different size

Question

I have two arrays (data and final), and I would like to compare them and return (as out) the rows of data that are not in final.

data:

x        y      z
10.2    15.2    25.2
15.2    17.2    40.2
12.2    13.2    5.2
14.2    14.2    34.2
12.2    12.2    56.2
13.2    17.2    32.2
11.2    13.2    21.2

final:

x        y      z
15.2    17.2    40.2
14.2    14.2    34.2
12.2    12.2    56.2

out:

x        y      z
10.2    15.2    25.2
12.2    13.2    5.2
13.2    17.2    32.2
11.2    13.2    21.2

This is what I have done:

out = [np.column_stack(data[k]) for k in range(len(data)) if data[k] not in final]
out = np.vstack(out)

Problem

I have to compute out about 10,000 times (the example above is just one of those cases), so speed is my main concern.

Is there an efficient way to perform this?

@Divakar, np.ravel_multi_index works with integers. What if my data are of type float64? - user2554925, Feb 09 '17 at 11:55

score 1 · Accepted Answer · answered Feb 09 '17 at 12:42

Here's one approach -

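import numpy as np

def remrows(a, b): # remove rows from a based on b
    ab = np.vstack((a, b)) # stack a on top of b
    sidx = np.lexsort(ab.T) # stable row sort, so equal rows become adjacent
    ab_sorted = ab[sidx]
    # positions where a sorted row equals the one after it (a row of a also found in b)
    idx = np.flatnonzero((ab_sorted[1:] == ab_sorted[:-1]).all(1))
    return np.delete(a, sidx[idx], axis=0)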

If you want to allow some tolerance when comparing those floating-point values, you might want to use np.isclose() instead of == when computing idx.
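A minimal sketch of that tolerance-aware variant, assuming the rows of a are distinct and that nearly equal rows still land next to each other after sorting (this can fail if a third row falls between them in sort order). The name remrows_close and the rtol/atol parameters are illustrative and not part of the original answer. Unlike with ==, a near match may sort the row of b ahead of the row of a, so the sketch looks at both members of each matching pair.

```python
import numpy as np

def remrows_close(a, b, rtol=1e-5, atol=1e-8):
    """Remove rows of a that approximately match a row of b (hypothetical helper)."""
    n = len(a)
    ab = np.vstack((a, b))                 # rows 0..n-1 come from a, the rest from b
    sidx = np.lexsort(ab.T)                # stable sort; last column is the primary key
    ab_sorted = ab[sidx]
    # True where a sorted row is close to the row that follows it
    close = np.isclose(ab_sorted[1:], ab_sorted[:-1], rtol=rtol, atol=atol).all(axis=1)
    pair = np.flatnonzero(close)
    first, second = sidx[pair], sidx[pair + 1]
    # keep only pairs made of one row from a and one row from b
    mixed = (first < n) != (second < n)
    # within each such pair, pick whichever index refers to a
    drop = np.where(first < n, first, second)[mixed]
    return np.delete(a, drop, axis=0)

a = np.array([[10.2, 15.2, 25.2],
              [15.2, 17.2, 40.2],
              [12.2, 13.2, 5.2]])
b = a[[1]] + 1e-9                          # same row, perturbed below the tolerance
result = remrows_close(a, b)
print(result)
```

With exact == the perturbed row would not match and nothing would be removed; here the second row of a is dropped and the other two are kept in their original order.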

Sample run -

In [222]: a = np.random.randint(111,999,(10,3)).astype(float)/10.0

In [223]: a
Out[223]: 
array([[ 51.3,  66.3,  58.8],
       [ 24.3,  40.6,  37.8],
       [ 94.7,  28.8,  69.3],
       [ 21.8,  48.3,  57.5],
       [ 87.1,  81.9,  27.9],
       [ 14.2,  36.4,  22.2],
       [ 56.7,  58.7,  16.2],
       [ 66.2,  99.1,  62.5],
       [ 75.1,  27.8,  34.4],
       [ 59.7,  73.8,  22.3]])

In [224]: b = a[[1,3,5]]

In [225]: remrows(a, b)
Out[225]: 
array([[ 51.3,  66.3,  58.8],
       [ 94.7,  28.8,  69.3],
       [ 87.1,  81.9,  27.9],
       [ 56.7,  58.7,  16.2],
       [ 66.2,  99.1,  62.5],
       [ 75.1,  27.8,  34.4],
       [ 59.7,  73.8,  22.3]])
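
As a further check, the same function can be run on the arrays from the question. This sketch restates remrows using np.vstack (equivalent to np.row_stack) so that it is self-contained; the values are copied from the data and final tables in the question.

```python
import numpy as np

def remrows(a, b):
    # remove rows from a that also appear (exactly) in b
    ab = np.vstack((a, b))
    sidx = np.lexsort(ab.T)
    ab_sorted = ab[sidx]
    idx = np.flatnonzero((ab_sorted[1:] == ab_sorted[:-1]).all(1))
    return np.delete(a, sidx[idx], axis=0)

data = np.array([[10.2, 15.2, 25.2],
                 [15.2, 17.2, 40.2],
                 [12.2, 13.2, 5.2],
                 [14.2, 14.2, 34.2],
                 [12.2, 12.2, 56.2],
                 [13.2, 17.2, 32.2],
                 [11.2, 13.2, 21.2]])

final = np.array([[15.2, 17.2, 40.2],
                  [14.2, 14.2, 34.2],
                  [12.2, 12.2, 56.2]])

out = remrows(data, final)
print(out)
```

The four remaining rows keep their original order from data, matching the out table in the question.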
