2

I want to get the rows of B where:

  1. If A[:,0] is equal to either B[:,0] or B[:,2], then A[:,1] has to be equal to B[:,1] or B[:,3] respectively
  2. A[:,0] is equal to neither B[:,0] nor B[:,2]

For example, with A and B as below, the expected result is R:

A=np.array([[101,  1],
            [103,  3]])

B=np.array([[100,1,101,1],
            [100,1,102,1],
            [100,1,103,3],
            [100,2,101,2],
            [100,2,103,2],
            [101,1,100,3],
            [101,1,103,2],
            [101,4,100,4],
            [101,4,103,4],
            [104,5,102,3]])

R=np.array([[100,1,101,1],
            [100,1,102,1],
            [100,1,103,3],
            [101,1,100,3],
            [104,5,102,3]])

I tried the solution from "Implementation of numpy in1d for 2D arrays?", but I get an error because I cannot use view on a slice of an array.

Thanks for the help!

user3357979
  • So, do `case 1` and `case 2` both have to be TRUE for selection of rows from `B`? Also, within `case 1`, does the `"then"` imply that both `"A[:,0] is equal to either B[:,0] or B[:,2]"` AND `"A[:,1] has to be equal to B[:,1] or B[:,3]"` have to be true for selection of rows from `B`? – Divakar Sep 25 '15 at 06:48
  • @Divakar, case `1` is a *conditional statement*, whereas case `2` is simply a *statement*. Wherever case `2` is true, case `1` has no effect (the `if` in `1` is false, so the conditional is upheld). The first half of case `1` alone isn't a filter for selection (it doesn't matter whether `A[:,0]` matches one of `B[:, even]`, it only matters that _when there is a match_, then `A[:, 1]` matches the respective `B[:, odd]`). Therefore, logically case `2` can be ignored, and you get the same rows. Maybe @user3357979 can confirm? – askewchan Sep 25 '15 at 15:33
  • @askewchan that is correct – user3357979 Sep 25 '15 at 21:49
  • @user3357979, got your comment and noticed a mistake in my code; it works for your example but there was a corner case that it missed where matches in the two pairs of `A` could be crossed. Please see my updated answer. If `[103,1,101,3]` is in `B`, it should be excluded from the output (the pair `[101,3]` violates the implication), but my code accepted it because it combined the `A` pairs before checking the implication (which depends on _which_ `A` pair matches). – askewchan Sep 26 '15 at 20:37

2 Answers

3

I would start by simplifying your rules. Ignoring the shapes for now, let us consider A and B to be lists of pairs. Then your requirement is that if the left partner of a pair in B matches one of the left partners in A, then the right partners must also match.

This is the definition of material implication, written as left match → right match. The nice part is that

(x → y) is true exactly when either (x is false) or (y is true)

and the right-hand side is quite easy to code. For you, the left match is x = A[..., 0] == B[..., 0] and the right match is y = A[..., 1] == B[..., 1]. So to check that x → y, you just check not(x) or y, which can be written as ~x | y.
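As a quick sanity check of that identity, here is a minimal sketch that enumerates the four truth-value combinations with NumPy boolean arrays. The names `x`, `y`, and `implies` are only illustrative and do not appear in the function below.

```python
import numpy as np

# All four combinations of (x, y)
x = np.array([False, False, True, True])
y = np.array([False, True, False, True])

# x -> y written as (not x) or y
implies = ~x | y

# Only the combination x=True, y=False violates the implication
print(implies.tolist())
```

Note that `~` and `|` are element-wise on boolean arrays, which is what lets the same expression be applied to a whole array of matches at once.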

To deal with the shape, reshape so that the left and right partners lie along one axis (the last axis), then use broadcasting to check for matches with either pair in A, then check that the conditional is met for all pairs in each row of B. Altogether, it looks like this (see below for a detailed explanation):

def implicate(A, B):
    # axes: (i, apair, bpair, partner)
    a = A[None, :, None, :]
    b = B.reshape(-1, 1, 2, 2)
    m = a == b
    m = ~m[...,0] | m[...,1] # require the implication rule along last axis
    m = m.all((1,2))         # both pairs in each A and B must comply (axes 1,2)
    return m, B[m]           # probably want to return only one of these
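A minimal usage sketch, assuming the `A`, `B`, and `R` arrays from the question. Here `implicate` is restated so that it returns only the mask, and the `crossed` row is the corner case mentioned in the comments on the question; it is not part of the original `B`.

```python
import numpy as np

def implicate(A, B):
    # Same logic as above, returning only the boolean row mask
    a = A[None, :, None, :]
    b = B.reshape(-1, 1, 2, 2)
    m = a == b
    m = ~m[..., 0] | m[..., 1]
    return m.all((1, 2))

A = np.array([[101, 1],
              [103, 3]])
B = np.array([[100, 1, 101, 1], [100, 1, 102, 1], [100, 1, 103, 3],
              [100, 2, 101, 2], [100, 2, 103, 2], [101, 1, 100, 3],
              [101, 1, 103, 2], [101, 4, 100, 4], [101, 4, 103, 4],
              [104, 5, 102, 3]])
R = np.array([[100, 1, 101, 1], [100, 1, 102, 1], [100, 1, 103, 3],
              [101, 1, 100, 3], [104, 5, 102, 3]])

mask = implicate(A, B)
print(np.array_equal(B[mask], R))  # the selected rows match the expected R

# Corner case: the left partners match A, but with the right partners crossed
crossed = np.array([[103, 1, 101, 3]])
print(implicate(A, crossed))  # this row is rejected
```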

Here's how it applies to your system.

  1. To get around the shapes, use broadcasting so that every pair in A is compared with every pair in each row of B; later steps check that the rule above holds for all of those pairs.

    a = A[None, :, None, :] # or A.reshape(1, A.shape[0], 1, A.shape[1]) to add two broadcasting axes
    b = B.reshape(-1, 1, 2, 2) # this is B.reshape(10, 1, 2, 2) without needing to know 10
    

    This gives both a and b four dimensions: (i, a_pair, b_pair, partner). That is, you slice along the first axis to move along i (rows in B), the second to choose which (of two) pairs in A, the third to do the same for B, and the last to select which of the two partners in each pair. To generalize this (if you don't know the shape of either in advance), you could use:

    a = A[None, :, None, :] # or A.reshape(1, -1, 1, 2)
    b = B.reshape(len(B), 1, -1, 2)
    

    Where the -1s would allow any number of pairs in A and B. The 2s assume we are discussing pairs.

  2. Now we can just get an array of matches with:

    m = a == b
    

    which has shape (10, 2, 2, 2), again standing for (i, a_pair, b_pair, partner).

  3. Next we apply the requirement for the material implication, as above. To make it easier to read, we first split the left partners from the right partners, then check that the conditional holds. Here we have

    left  = m[...,0]
    right = m[...,1]
    m = ~left | right
    

    which eliminates the last axis (partner), leaving (i, a_pair, b_pair).

  4. Finally, we want to make sure the rule applies for all pairs in each row of B, which is given by the b_pair axis (2).

    m = m.all(2)
    

    and of course the rule must also hold for all pairs in A (the a_pair axis is 1):

    m = m.all(1)
    

Putting it all together, and combining the two all calls in the last step, you get the function above.
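The intermediate shapes in the steps above can be checked directly. This is a minimal sketch assuming the `A` and `B` from the question; `ok`, `one_call`, and `two_calls` are illustrative names. It also checks that the single `all((1, 2))` call agrees with the two separate calls, which is a possible workaround if your NumPy version rejects a tuple of axes.

```python
import numpy as np

A = np.array([[101, 1],
              [103, 3]])
B = np.array([[100, 1, 101, 1], [100, 1, 102, 1], [100, 1, 103, 3],
              [100, 2, 101, 2], [100, 2, 103, 2], [101, 1, 100, 3],
              [101, 1, 103, 2], [101, 4, 100, 4], [101, 4, 103, 4],
              [104, 5, 102, 3]])

# Step 1: add broadcasting axes -> (i, a_pair, b_pair, partner)
a = A[None, :, None, :]
b = B.reshape(len(B), 1, -1, 2)
print(a.shape, b.shape)  # (1, 2, 1, 2) (10, 1, 2, 2)

# Step 2: element-wise matches, broadcast to the full shape
m = a == b
print(m.shape)  # (10, 2, 2, 2)

# Step 3: the implication removes the partner axis
ok = ~m[..., 0] | m[..., 1]
print(ok.shape)  # (10, 2, 2)

# Step 4: reduce over b_pair (axis 2) and a_pair (axis 1)
one_call = ok.all((1, 2))
two_calls = ok.all(2).all(1)  # reduce axis 2 first, then axis 1
print(np.array_equal(one_call, two_calls))  # True
```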

askewchan
  • Thank you, this worked perfectly after your edit! One issue I had was with `m = m.all((1,2))`. It kept throwing a `TypeError: an integer is required`. I just split it into `m = m.all(2)` and `m = m.all(1)`. – user3357979 Sep 28 '15 at 11:59
  • Maybe that capability was introduced recently. I tested with numpy 1.9.2. If the shapes for those two axes are only two (as in the example), I would recommend not doing the two calls to `all` if speed is an issue, and instead using something like `m = m[...,0] & m[...,1]`, which is equivalent to `m.all(-1)` (the `-1` and `...` both give the last axis). Also be sure you do it in the order you have (2 then 1), since each call removes an axis and will change the numbering of the rest. – askewchan Sep 28 '15 at 14:22
0

If I understood the question correctly, you can use np.in1d -

# Mask for A[:,0] is equal to either B[:,0] or B[:,2]
mask1 = (np.in1d(B[:,::2],A[:,0]).reshape(-1,2)).any(1)

# Mask for A[:,1] has to be equal to B[:,1] or B[:,3]
mask2 = (np.in1d(B[:,1::2],A[:,1]).reshape(-1,2)).any(1)

# Mask for A[:,0] is not equal to either B[i,0] and B[i,2]
mask3 = ~(np.in1d(B[:,::2],A[:,0]).reshape(-1,2)).any(1)

# Finally combine all masks as per requirements
out = B[(mask1 & mask2) | mask3]

Sample run -

In [361]: A
Out[361]: 
array([[101,   1],
       [103,   3]])

In [362]: B
Out[362]: 
array([[100,   1, 101,   1],
       [100,   1, 102,   1],
       [100,   1, 103,   3],
       [100,   2, 101,   2],
       [100,   2, 103,   2],
       [101,   1, 100,   3],
       [101,   1, 103,   2],
       [101,   4, 100,   4],
       [101,   4, 103,   4],
       [104,   5, 102,   3]])

In [363]: out
Out[363]: 
array([[100,   1, 101,   1],
       [100,   1, 102,   1],
       [100,   1, 103,   3],
       [101,   1, 100,   3],
       [101,   1, 103,   2],
       [104,   5, 102,   3]])
Divakar