
What I want to do

I would like to count, for each column, the number of rows that satisfy a condition. The counts can differ from column to column.

import numpy as np
import pandas as pd

## Sample DataFrame
data = [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]]
index = ['i1', 'i2', 'i3', 'i4']
columns = ['c1', 'c2']
df = pd.DataFrame(data, index=index, columns=columns)
print(df)

## Output
#      c1   c2
# i1  1.0  2.0
# i2  0.0  3.0
# i3  NaN  NaN
# i4  1.0 -1.0

## Question 1: Count non-NaN values
## Expected result
# [3, 3]

## Question 2: Count non-zero numerical values
## Expected result
# [2, 3]

Note: Data types of results are not important. They can be list, pandas.Series, pandas.DataFrame etc. (I can convert data types anyway.)

What I have checked

## For Question 1
print(df[df['c1'].apply(lambda x: not pd.isna(x))].count())

## For Question 2
print(df[df['c1'] != 0].count())

Obviously, these two print calls only cover column c1. It is easy to check one column at a time, but I would like to know if there is a way to calculate the counts for all columns at once.

Environment

Python 3.10.5
pandas 1.4.3

dmjy
  • `df.notna().sum(axis=0)` - `notna()` gives dataframe with `True/False` and `sum()` treats `True` as `1` and `False` as `0` – furas Aug 22 '22 at 11:37

2 Answers

You do not need to iterate over your data with apply. You can get both results in a vectorized fashion:

print(df.notna().sum().to_list()) # [3, 3]
print((df.ne(0) & df.notna()).sum().to_list()) # [2, 3]

Note that I have assumed that "Question 2: Count non-zero values" also excludes NaN values; otherwise you would get [3, 4].
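
As a minimal sketch of why this works: any boolean mask with the same shape as df can be summed column by column, because sum() treats True as 1 and False as 0. The helper name count_where below is hypothetical, introduced only for illustration; the DataFrame is the sample from the question.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question
df = pd.DataFrame(
    [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]],
    index=["i1", "i2", "i3", "i4"],
    columns=["c1", "c2"],
)


def count_where(mask: pd.DataFrame) -> list:
    """Hypothetical helper: per-column count of True cells in a boolean mask."""
    # sum() adds up each column, counting True as 1 and False as 0
    return mask.sum().to_list()


non_nan = count_where(df.notna())              # Question 1
non_zero = count_where(df.ne(0) & df.notna())  # Question 2, NaN excluded
non_zero_with_nan = count_where(df.ne(0))      # NaN != 0 is True, so NaN is counted

print(non_nan)            # [3, 3]
print(non_zero)           # [2, 3]
print(non_zero_with_nan)  # [3, 4]
```

The same pattern should carry over to other conditions, for example df.gt(0) for strictly positive values, as long as the mask is combined with df.notna() whenever NaN comparisons could evaluate to True.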

ko3
  • You beat me to it :) I'm going to delete my answer as it's almost identical, except that I did `df.fillna(0).ne(0).sum()` for the second one. – fsimonjetz Aug 22 '22 at 11:47
  • @fsimonjetz, yeah, your answer would work well too. Btw, I wouldn't say I beat you, maybe I just saw the post earlier than you :) – ko3 Aug 22 '22 at 11:50
  • @ko3 Thank you for your answer. I changed Question 2 to "non-zero numerical values". – dmjy Aug 22 '22 at 12:11

You were close, I think! To answer your first question:

>>> df.apply(lambda x: x.notna().sum(), axis=0)
c1    3
c2    3
dtype: int64

Change to axis=1 to apply this operation to each row instead.

For your second question, this comes from an already answered question on SO:

>>> df.astype(bool).sum(axis=0)
c1    3
c2    4
dtype: int64

In the same way, you can change axis to 1 if you want the counts per row.
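
One caveat with astype(bool), sketched below on the question's sample data: NaN converts to True, which is why c2 comes out as 4 rather than 3. Masking with notna() is one possible way to leave NaN out; the variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question
df = pd.DataFrame(
    [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]],
    index=["i1", "i2", "i3", "i4"],
    columns=["c1", "c2"],
)

truthy = df.astype(bool)            # 0 -> False, everything else (NaN included) -> True
with_nan = truthy.sum(axis=0)       # NaN cells are counted as non-zero
masked = truthy & df.notna()        # drop the NaN cells from the mask
without_nan = masked.sum(axis=0)    # per-column counts, NaN excluded
per_row = masked.sum(axis=1)        # same counts, but per row

print(with_nan.to_list())     # [3, 4]
print(without_nan.to_list())  # [2, 3]
print(per_row.to_list())      # [2, 1, 0, 2]
```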

Hope it helps!

bvittrant