
I have a numpy matrix A, for example

3 4 3 4
2 1 3 5 
3 2 1 1
1 1 1 1

I want to count occurrences PER ROW, knowing that the value in each cell comes from a given array of "possible votes", for example [0, 1, 2, 3, 4, 5, 6].

In this case I would like an output with the same number of rows and one column for every possible "vote", something like:

(0) (1) (2) (3) (4) (5) (6)
 0   0   0   2   2   0   0
 0   1   1   1   0   1   0
 0   2   1   1   0   0   0
 0   4   0   0   0   0   0
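To pin down the expected output, here is a minimal reference sketch, assuming the example matrix above and the vote array [0, 1, 2, 3, 4, 5, 6]. It compares every cell against every possible vote by broadcasting and sums over the columns. It works even if the votes are not consecutive integers, but it allocates a temporary of shape (rows, cols, n_votes), so it is a correctness check rather than a tuned solution.

```python
import numpy as np

A = np.array([[3, 4, 3, 4],
              [2, 1, 3, 5],
              [3, 2, 1, 1],
              [1, 1, 1, 1]])
votes = np.array([0, 1, 2, 3, 4, 5, 6])

# A[:, :, None] has shape (4, 4, 1); comparing it with votes (shape (7,))
# broadcasts to a boolean array of shape (4, 4, 7).
# Summing over axis 1 (the columns of A) gives one count per (row, vote).
counts = (A[:, :, None] == votes).sum(axis=1)
print(counts)
```

The printed array matches the table above, row for row.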

How can this be accomplished with numpy? Runtime is important.

This smells like `np.bincount`, but I can't figure out how to generalize it efficiently to a 2-D array (one bincount per row).
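One common way to get a per-row `np.bincount` without a Python loop is to shift each row into its own disjoint range of bin indices, call `np.bincount` once on the flattened array, and reshape. The sketch below assumes every entry of `A` is an integer in `[0, n_votes)`, which holds when the possible votes are 0..6; `row_bincount` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def row_bincount(A, n_votes):
    """Count occurrences of 0..n_votes-1 in each row of a 2-D integer array.

    Assumes every entry of A is a non-negative integer smaller than n_votes.
    """
    n_rows = A.shape[0]
    # Row i is shifted into the range [i * n_votes, (i + 1) * n_votes),
    # so counts from different rows never land in the same bin.
    offsets = np.arange(n_rows)[:, None] * n_votes
    flat = (A + offsets).ravel()
    # A single bincount over the whole array; minlength guarantees that
    # trailing votes with zero count still get their own bin.
    counts = np.bincount(flat, minlength=n_rows * n_votes)
    # Fold the flat histogram back into one row of n_votes bins per input row.
    return counts.reshape(n_rows, n_votes)

A = np.array([[3, 4, 3, 4],
              [2, 1, 3, 5],
              [3, 2, 1, 1],
              [1, 1, 1, 1]])
counts = row_bincount(A, 7)
print(counts)
```

The only temporaries are `A + offsets` (same size as `A`) and the flat count vector (rows × n_votes), so memory stays proportional to the input and the output rather than to their product.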

Gulzar
  • have you tried `np.histogram`? – prhmma Nov 30 '19 at 10:31
  • https://stackoverflow.com/questions/40018125/binning-of-data-along-one-axis-in-numpy suggests that, but it seems inefficient, since this is actually a for loop, isn't it? – Gulzar Nov 30 '19 at 10:38
  • I think np.histogram always flattens the data... you could pair it up with np.apply_along_axis (see https://stackoverflow.com/questions/40018125/binning-of-data-along-one-axis-in-numpy). But I think even then you are not getting a lot of performance gain over just looping over the rows. – Chris Nov 30 '19 at 10:42
  • Another hint if you are looking for efficiency: make sure that the numpy array you are using has the right layout (C-order vs. Fortran-order) to optimize for quick access of rows/columns. – Chris Nov 30 '19 at 10:46
  • `np.sum(np.eye(7)[A], axis=1)` – warped Nov 30 '19 at 10:53
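The two approaches suggested in the comments can be checked against each other with a short sketch, assuming the votes are exactly the integers 0..6. The one-hot trick indexes rows of an identity matrix (an integer dtype is used here instead of the default float, which is a choice of this sketch). The `np.histogram` variant runs once per row through `np.apply_along_axis`, which is still a loop over rows internally.

```python
import numpy as np

A = np.array([[3, 4, 3, 4],
              [2, 1, 3, 5],
              [3, 2, 1, 1],
              [1, 1, 1, 1]])
n_votes = 7  # assumption: possible votes are the integers 0..6

# One-hot trick: row v of the identity matrix is the indicator vector for vote v,
# so np.eye(n_votes)[A] has shape (rows, cols, n_votes); summing over the
# column axis counts each vote per row.
one_hot = np.sum(np.eye(n_votes, dtype=np.int64)[A], axis=1)

# Per-row histogram: integer bin edges 0, 1, ..., n_votes place each vote v
# in the bin [v, v + 1). apply_along_axis calls the function once per row.
hist = np.apply_along_axis(
    lambda row: np.histogram(row, bins=np.arange(n_votes + 1))[0], 1, A)

print(one_hot)
print(np.array_equal(one_hot, hist))
```

Both produce the same table. The one-hot version avoids the Python-level row loop but materializes a (rows, cols, n_votes) array, so for many columns or many possible votes it can become memory-bound. It is worth timing on realistic sizes.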
