0

This should be a standard question, but I am not able to find the answer :(

I have a NumPy ndarray with n samples (rows) and p variables (columns). I would like to count how many times each variable is nonzero.

I would use a function like

sum([1 for i in column if i!=0])

but how can I apply this function to all the columns of my matrix?

Donbeo
  • Just a tip - `sum` supports generator expressions. So, you can just do this: `sum(1 for i in column if i!=0)`. Actually, if you only have integers, you can do this `sum(1 for i in column if i)`, since `0` evaluates to `False`. –  Nov 28 '13 at 15:06

2 Answers

2

From this post: How to apply numpy.linalg.norm to each row of a matrix?

If the operation supports an axis parameter, use it; that is usually faster.

Otherwise, np.apply_along_axis can help.

The function to apply here is numpy.count_nonzero.

So here is the simple answer:

import numpy as np

arr = np.eye(3)
# Apply count_nonzero to each 1-D slice along axis 0, i.e. to each column
np.apply_along_axis(np.count_nonzero, 0, arr)
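Since the advice above is to prefer an axis parameter when one exists, here is a minimal sketch of that route. It assumes NumPy 1.12 or later, where np.count_nonzero accepts an axis argument; the variable name counts is only for illustration.

```python
import numpy as np

arr = np.eye(3)

# Assumption: NumPy >= 1.12, where count_nonzero takes an axis argument.
# axis=0 counts the nonzero entries in each column.
counts = np.count_nonzero(arr, axis=0)
print(counts)  # [1 1 1]
```

On a recent NumPy this should give the same counts as the np.apply_along_axis call, without calling a Python-level function once per column.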
gongzhitaao
  • This is good for my problem. Does it also work with a lambda function? – Donbeo Nov 28 '13 at 17:51
  • @Donbeo I think so, any function could be used. Maybe have a try. – gongzhitaao Nov 28 '13 at 19:11
  • Some functions, such as `np.nanquantile()`, are much faster when called through `np.apply_along_axis()` than when the axis parameter is passed to `np.nanquantile()` directly. Other functions may vary. If speed is an issue, it may pay to profile and compare. – ChaimG Apr 05 '22 at 04:56
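On the lambda question in the comments above, a small sketch suggests it does work: np.apply_along_axis accepts any callable that takes a 1-D array. The example below reuses the counting expression from the question; the array and the name counts are made up for illustration.

```python
import numpy as np

arr = np.array([[0, 1, 1, 0],
                [1, 1, 0, 0]])

# The lambda receives one column at a time as a 1-D array
# and returns the number of nonzero entries in it.
counts = np.apply_along_axis(lambda col: sum(1 for i in col if i != 0), 0, arr)
print(counts)  # [1 2 1 0]
```

This is convenient, but the lambda runs in pure Python for every column, so it is likely to be slower than a vectorized call on large arrays.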
1

You can use np.sum over a boolean array created from comparing your original array to zero, using the axis keyword argument to indicate whether you want to count over rows or columns. In your case:

>>> import numpy as np
>>> a = np.array([[0, 1, 1, 0],[1, 1, 0, 0]])
>>> a
array([[0, 1, 1, 0],
       [1, 1, 0, 0]])
>>> np.sum(a != 0, axis=0)
array([1, 2, 1, 0])
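To make the role of the axis keyword concrete, here is a short sketch on the same array that counts both ways. The names per_column and per_row are illustrative only.

```python
import numpy as np

a = np.array([[0, 1, 1, 0],
              [1, 1, 0, 0]])

# a != 0 is a boolean array; summing it counts the True entries.
per_column = np.sum(a != 0, axis=0)  # collapse the rows: one count per column
per_row = np.sum(a != 0, axis=1)     # collapse the columns: one count per row

print(per_column)  # [1 2 1 0]
print(per_row)     # [2 2]
```

With samples in rows and variables in columns, axis=0 is the one that gives a count per variable.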
Jaime