2

I have a project in which I have to remove activities from one array and store the result in another array.

For example:

select_act = [2]
q_active = [2, 3]

The code I have so far looks like this:

for ele in select_act:
    new_q_active = numpy.delete(q_active, numpy.where(ele))
print(new_q_active)

Output: new_q_active = [3]

The objective is to delete elements from q_active if they're already in select_act. The code above works for the given example. But suppose all activities in q_active are already in select_act:

q_active = [2, 3]
select_act = [2, 3]

The output I get is the same as before ([3]), whereas it should be:

new_q_active = []

Any suggestion why I keep getting that? Any help would be appreciated! Thank you!

yatu
Acee
  • It is worth asking whether you actually care about the ordering. If not, then `set(q_active) - set(select_act)` would be an efficient way to achieve this. (It can be converted back to a list using `list()` but the ordering will be undefined.) – alani Jun 09 '20 at 11:45

2 Answers

2

With duplicates

In general, removing while iterating is not a good idea, since you can easily skip values. One way to do this is to build a boolean mask from the result of np.isin and use it to index q_active. This method keeps all instances of duplicate values:

select_act = np.array([2])
q_active = np.array([2, 3, 4, 2, 3])

m = np.isin(q_active, select_act, invert=True)
# array([False,  True,  True, False,  True])
q_active[m]
# array([3, 4, 3])
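
As for why the loop in the question misbehaves: it reassigns new_q_active on every iteration from the unchanged q_active, so only the last iteration's result survives, and numpy.where(ele) is given the value itself rather than a comparison such as q_active == ele, so it never looks ele up in q_active. A minimal sketch of the mask approach applied to both cases from the question follows; the helper name remove_selected is hypothetical, chosen only for illustration.

```python
import numpy as np

def remove_selected(q_active, select_act):
    """Return the elements of q_active that do not appear in select_act.

    Hypothetical helper (not from the original answer); it wraps the
    np.isin mask shown above.
    """
    q_active = np.asarray(q_active)
    # True where the element is NOT in select_act
    keep = np.isin(q_active, select_act, invert=True)
    return q_active[keep]

# First case from the question: only 2 is selected
partial = remove_selected([2, 3], [2])
# Second case from the question: everything is selected
full = remove_selected([2, 3], [2, 3])

print(partial)
print(full)
```

Because the mask is computed once for the whole array, no element is deleted while iterating, and the fully overlapping case yields an empty array.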

Without duplicates

It is also worth mentioning np.setdiff1d, which is a good option when there are no duplicates and order is not important. It returns the sorted, unique values of q_active that are not in select_act:

select_act = np.array([2])
q_active = np.array([4, 2, 3])

np.setdiff1d(q_active, select_act)
# array([3, 4])
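
To make the difference between the two approaches concrete, here is a small sketch that runs both on the same input containing duplicates. The input values are made up for illustration.

```python
import numpy as np

# Illustrative input with repeated values (not from the original answer)
q_active = np.array([4, 2, 3, 4, 3])
select_act = np.array([2])

# Boolean mask: keeps original order and duplicates
masked = q_active[np.isin(q_active, select_act, invert=True)]

# setdiff1d: returns sorted, unique values
unique_sorted = np.setdiff1d(q_active, select_act)

print(masked)
print(unique_sorted)
```

If the order of activities or repeated entries matter, the mask is the safer choice.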

Timing comparison of the two methods (only relevant when duplicates need not be kept; otherwise the np.isin mask is required):

q_active = np.random.randint(1,20_000,10_000)
select_act = np.random.randint(1,20_000,5_000)

%%timeit
m = np.isin(q_active, select_act, invert=True)
q_active[m]
# 1.01 ms ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
m = np.in1d(q_active, select_act, invert=True)
q_active[m]
# 1.01 ms ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.setdiff1d(q_active, select_act)
# 808 µs ± 7.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
yatu
1

Depending on whether you need to use numpy (which seems overkill for this task), you can achieve this with a set difference (note that the order of the result is not guaranteed):

new_q_active = list(set(q_active).difference(set(select_act)))

Alternatively, a list comprehension would also do the trick:

new_q_active = [x for x in q_active if x not in select_act]
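
A possible variant of the list comprehension, sketched here on the assumption that the activities are hashable (for example integers): converting select_act to a set first keeps the original order and duplicates while avoiding a linear scan of select_act for every element. The name selected and the sample values are introduced only for this example.

```python
# Illustrative input with repeated values (not from the original answer)
q_active = [4, 2, 3, 2, 3]
select_act = [2]

# Set membership tests are O(1) on average
selected = set(select_act)

# Order and duplicates of q_active are preserved
new_q_active = [x for x in q_active if x not in selected]

print(new_q_active)
```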

mdmjsh