Best way to remove rows that have more of a certain value than a given amount (numpy)

Question

I created an array with d columns whose rows are all the combinations (with repetition) of the values in set(p). Next, I want to delete every row in which some number occurs more often than it does in p. I can do it with a for loop; here is my code:

import numpy as np
import itertools as it

from collections import Counter

p = [0,0,0,1,1,1,1,2,2,2,2,2,3,3,3,4,4,4,4,4,5,5,5]
assert (len(set(p)) != 1)
cnt = Counter(p)
pos = np.array(list(it.product(set(p), repeat=d)))
ds = []
for i in range(len(pos)):
    for j, k in cnt.items():
        if len(np.where(pos[i] == j)[0]) > k:
            ds.append(i)
pos = np.delete(pos, ds, axis=0)

I am looking for a faster way to do this. Thank you!

score 0 · Accepted Answer · answered Jun 20 '18 at 19:57

For non-negative integers, we can make use of bincount. We use a 2D bincount to get the per-row counts of each value in pos, compare them against the counts of each value in p, and use the resulting mask to select the valid rows of pos, like so:

# https://stackoverflow.com/a/46256361/ @Divakar
def bincount2D_vectorized(a):    
    N = a.max()+1
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)

# Keep only the rows in which no value occurs more often than it does in p
pos_out = pos[~(bincount2D_vectorized(pos)>np.bincount(p)).any(1)]
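
To see why the row offsets give per-row counts, here is a minimal sketch on a tiny array. The array `demo` is a made-up example, not data from the question. Each row is shifted by a multiple of N so that the rows occupy disjoint bin ranges, and a single flat bincount can then be reshaped into one row of counts per input row.

```python
import numpy as np

def bincount2D_vectorized(a):
    # Same helper as above: offset each row so its values land in separate bins.
    N = a.max() + 1
    a_offs = a + np.arange(a.shape[0])[:, None] * N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0] * N).reshape(-1, N)

# Hypothetical input: 2 rows, values in {0, 1, 2}
demo = np.array([[0, 0, 2],
                 [1, 2, 2]])

counts = bincount2D_vectorized(demo)
# Row 0 contains two 0s and one 2; row 1 contains one 1 and two 2s.
print(counts)
# [[2 0 1]
#  [0 1 2]]
```

Column j of `counts` holds how often the value j appears in each row, so comparing it against `np.bincount(p)` broadcasts the allowed counts across all rows.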

Divakar

Thank you for your solution! I am still working on understanding it. Only thing is, you have the opposite output as I expected. In your code, pos_out are the rows from pos that I expect to be deleted. What do you recommend to edit to make this happen? – James Carter Jun 22 '18 at 15:35
@JamesCarter Invert the mask : `pos[~(bincount2D_vectorized(pos)>np.bincount(p)).any(1)]`. – Divakar Jun 22 '18 at 15:37
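
As a sanity check, the inverted mask can be compared against the loop from the question on a small input. This is only a sketch: the values of `p` and `d` below are assumptions chosen to keep the example small, and the loop is condensed into a list comprehension with the same logic.

```python
import itertools as it
from collections import Counter

import numpy as np

def bincount2D_vectorized(a):
    N = a.max() + 1
    a_offs = a + np.arange(a.shape[0])[:, None] * N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0] * N).reshape(-1, N)

p = [0, 0, 1, 2, 2]  # assumed small example, not the p from the question
d = 3                # assumed number of columns
cnt = Counter(p)
pos = np.array(list(it.product(sorted(set(p)), repeat=d)))

# Loop-based reference: collect rows where some value exceeds its count in p.
drop = [i for i, row in enumerate(pos)
        if any(np.count_nonzero(row == v) > k for v, k in cnt.items())]
expected = np.delete(pos, drop, axis=0)

# Vectorized version with the inverted mask.
keep = ~(bincount2D_vectorized(pos) > np.bincount(p)).any(1)
result = pos[keep]

print(np.array_equal(result, expected))
```

Note that the comparison relies on `pos.max()` being equal to `max(p)`, which holds here because `pos` is built from every value in `set(p)`.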
