Remove values in 1d array contained in another array

Question

I have a project in which I have to remove activities from one array and store the result in another array.

For example:

select_act = [2]
q_active = [2, 3]

The code I have so far looks like this:

for ele in select_act:
    new_q_active = numpy.delete(q_active, numpy.where(ele))
print(new_q_active)

Output: new_q_active = [3]

The objective is to delete elements in q_active if they're already in select_act. The code above works for the given example. But when, say, all activities in q_active are already in select_act:

q_active = [2, 3]
select_act = [2, 3]

The output I get is the same as before, whereas it should be:

new_q_active = []

Any suggestion why I keep getting that? Any help would be appreciated! Thank you!

It is worth asking whether you actually care about the ordering. If not, then `set(q_active) - set(select_act)` would be an efficient way to achieve this. (It can be converted back to a list using `list()` but the ordering will be undefined.) — alani, Jun 09 '20 at 11:45

yatu · Accepted Answer · 2020-06-09T13:06:58.750

With duplicates

In general, removing while iterating is not a good idea, since you can easily skip values. One way to do this is to define a boolean mask from the result of np.isin and use it to index q_active. With this method you keep every instance of a duplicated value that is not in select_act:

import numpy as np

select_act = np.array([2])
q_active = np.array([2, 3, 4, 2, 3])

# True where the element of q_active is NOT in select_act
m = np.isin(q_active, select_act, invert=True)
# array([False,  True,  True, False,  True])
q_active[m]
# array([3, 4, 3])
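As for the loop in the question: numpy.where(ele) is evaluated on the value itself rather than on a comparison against q_active, and every iteration starts again from the original q_active and overwrites new_q_active, so removals never accumulate. A minimal sketch of the mask approach applied to both cases from the question follows; `remove_selected` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def remove_selected(q_active, select_act):
    """Return the elements of q_active that do not appear in select_act."""
    q_active = np.asarray(q_active)
    # Boolean mask: True for every element that is NOT in select_act
    keep = np.isin(q_active, select_act, invert=True)
    return q_active[keep]

print(remove_selected([2, 3], [2]))     # [3]
print(remove_selected([2, 3], [2, 3]))  # []
```

Because the mask is built in a single call, no loop over select_act is needed.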

Without duplicates

It might also be worth mentioning np.setdiff1d, which is a good option when there are no duplicates and order is not important (it returns the sorted, unique values):

select_act = np.array([2])
q_active = np.array([4, 2, 3])

np.setdiff1d(q_active, select_act)
# array([3, 4])
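To make the difference between the two approaches concrete, here is a small sketch on an input that contains a duplicate. The sample values are made up for illustration.

```python
import numpy as np

q_active = np.array([4, 2, 3, 4])  # note the repeated 4
select_act = np.array([2])

# setdiff1d returns sorted, unique values
diff = np.setdiff1d(q_active, select_act)
# the isin mask keeps the original order and the duplicates
masked = q_active[np.isin(q_active, select_act, invert=True)]

print(diff)    # [3 4]
print(masked)  # [4 3 4]
```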

Timing comparison of the methods (only relevant when duplicates do not need to be kept; otherwise the mask approach is required):

q_active = np.random.randint(1,20_000,10_000)
select_act = np.random.randint(1,20_000,5_000)

%%timeit
m = np.isin(q_active, select_act, invert=True)
q_active[m]
# 1.01 ms ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
# np.in1d is the older function, deprecated since NumPy 2.0 in favor of np.isin
m = np.in1d(q_active, select_act, invert=True)
q_active[m]
# 1.01 ms ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.setdiff1d(q_active, select_act)
# 808 µs ± 7.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
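The %timeit magics above only work in IPython or Jupyter. Outside of those, a rough equivalent can be sketched with the standard-library timeit module. The seed, the function names and the repeat count below are arbitrary assumptions, and the numbers will differ from the ones quoted above depending on machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
q_active = rng.integers(1, 20_000, 10_000)
select_act = rng.integers(1, 20_000, 5_000)

def with_isin():
    # keeps order and duplicates
    return q_active[np.isin(q_active, select_act, invert=True)]

def with_setdiff1d():
    # returns sorted, unique values
    return np.setdiff1d(q_active, select_act)

runs = 100
for fn in (with_isin, with_setdiff1d):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.0f} µs per call")
```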

score 1 · Answer 2 · answered Jun 09 '20 at 11:17

Depending on whether you need to use numpy (which seems overkill for this task), you can achieve this with a set difference. Note that this drops duplicates and does not preserve the original order:

new_q_active = list(set(q_active).difference(set(select_act)))

Alternatively, a list comprehension would also do the trick, and it keeps both the order and any duplicates:

new_q_active = [x for x in q_active if x not in select_act]
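For longer lists, the `x not in select_act` test scans select_act once per element. One possible refinement, sketched here under the assumption that the values are hashable, is to build a set first. `remove_selected_items` is a hypothetical name used only for this example.

```python
def remove_selected_items(q_active, select_act):
    """Keep order and duplicates of q_active, dropping anything in select_act."""
    excluded = set(select_act)  # constant-time membership checks on average
    return [x for x in q_active if x not in excluded]

print(remove_selected_items([2, 3, 4, 2, 3], [2]))  # [3, 4, 3]
print(remove_selected_items([2, 3], [2, 3]))        # []
```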
