Major vote by column?

Question

I have a 20x20 2D array. For every column, I want the most frequently occurring value (excluding zeros), i.e. the value that wins the majority vote.

I can do that for a single column like this:

np.unique(p[:,0][p[:,0] != 0], return_counts=True)
# (array([ 3, 21], dtype=int16), array([1, 3]))

nums, cnts = np.unique(p[:,0][p[:,0] != 0], return_counts=True)
nums[cnts.argmax()]
# 21

For completeness, the single-column method above can be extended to a loop-based solution for 2D arrays:

import numpy as np
# p is the 2D input array
output = []
for i in range(p.shape[1]):
    col = p[:, i]
    nums, cnts = np.unique(col[col != 0], return_counts=True)
    output.append(nums[cnts.argmax()])  # most frequent non-zero value in column i

How do I do that for all columns without using a for loop?

Please follow the posting guidelines in the help documentation, as suggested when you created this account. [On topic](https://stackoverflow.com/help/on-topic), [how to ask](https://stackoverflow.com/help/how-to-ask), and ... [the perfect question](https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/) apply here. StackOverflow is not a design, coding, research, or tutorial resource. However, if you follow whatever resources you find on line, make an honest solution attempt, and run into a problem, you'd have a good example to post. — Prune, Sep 25 '19 at 16:49
If you search in your browser for "numpy vector usage", you'll find references that can explain this much better than we can manage here. — Prune, Sep 25 '19 at 16:49

Divakar · Answer 1 · 2019-09-25T18:18:05.843

We can use bincount2D_vectorized to get binned counts per column, with one bin per integer value. Then slice from the second bin onwards (the first bin counts the zeros), take the argmax, and add 1 to compensate for the slicing. That gives the desired output.
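The helper itself is not reproduced in this answer. As a rough sketch only, assuming the input holds non-negative integers, a function with that name could offset each row into its own block of bins and run a single np.bincount. The body below is an assumption about how such a helper might look, not necessarily the exact code from the linked Q&A, and the small array p is made up for illustration.

```python
import numpy as np

def bincount2D_vectorized(a):
    """Per-row bincount of a 2D array of non-negative integers (sketch)."""
    n_bins = int(a.max()) + 1
    # Shift every row into its own block of bin ids so that one flat
    # bincount call counts all rows at once.
    offsets = np.arange(a.shape[0])[:, None] * n_bins
    flat = np.bincount((a + offsets).ravel(), minlength=a.shape[0] * n_bins)
    return flat.reshape(-1, n_bins)

# Small made-up input: 4 rows, 3 columns
p = np.array([[4, 0, 3],
              [4, 2, 3],
              [1, 2, 0],
              [0, 0, 3]])

counts = bincount2D_vectorized(p.T)            # one row of counts per column of p
majority = counts[:, 1:].argmax(axis=1) + 1    # drop the zero bin, then undo the shift
print(majority)                                # [4 2 3]
```

Note that a column containing only zeros has no votes at all. In that case argmax returns 0 and the result is 1, so such columns would need separate handling.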

Here is the solution as a sample run:

In [116]: p # input array
Out[116]: 
array([[4, 3, 4, 1, 1, 0, 2, 0],
       [4, 0, 0, 0, 0, 0, 4, 0],
       [3, 1, 3, 4, 3, 1, 4, 3],
       [4, 4, 3, 3, 1, 1, 3, 2],
       [3, 0, 3, 0, 4, 4, 4, 0],
       [3, 0, 0, 3, 2, 0, 1, 4],
       [4, 0, 3, 1, 3, 3, 2, 0],
       [3, 3, 0, 0, 2, 1, 3, 1],
       [2, 4, 0, 0, 2, 3, 4, 2],
       [0, 2, 4, 2, 0, 2, 2, 4]])

In [117]: bincount2D_vectorized(p.T)[:,1:].argmax(1)+1
Out[117]: array([3, 3, 3, 1, 2, 1, 4, 2])

The transpose is needed because bincount2D_vectorized computes bincounts per row. So for the alternative problem of getting the majority vote per row, simply skip the transpose.

Also, feel free to explore other options in that linked Q&A to get 2D-bincounts.
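As one illustration of a different way to get the per-column counts (this is my own sketch and may or may not appear in the linked Q&A), the counts can also be built by broadcasting a comparison against every candidate value. It assumes small positive integers, since the intermediate boolean array has shape (columns, rows, number of values). The array p is made up for illustration.

```python
import numpy as np

# Small made-up input: 4 rows, 3 columns
p = np.array([[4, 0, 3],
              [4, 2, 3],
              [1, 2, 0],
              [0, 0, 3]])

values = np.arange(1, p.max() + 1)     # candidate values, zero excluded
# counts[j, k] = number of entries in column j equal to values[k]
counts = (p.T[:, :, None] == values).sum(axis=1)
majority = values[counts.argmax(axis=1)]
print(majority)                        # [4 2 3]
```

For a 20x20 array the extra memory is negligible. For large arrays or a wide value range, a bincount-based approach is likely the better fit.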
