Counting the number of pandas.DataFrame rows for each column

Question

What I want to do

I would like to count the rows that satisfy a condition, separately for each column. Each column may give a different count.

import numpy as np
import pandas as pd

## Sample DataFrame
data = [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]]
index = ['i1', 'i2', 'i3', 'i4']
columns = ['c1', 'c2']
df = pd.DataFrame(data, index=index, columns=columns)
print(df)

## Output
#      c1   c2
# i1  1.0  2.0
# i2  0.0  3.0
# i3  NaN  NaN
# i4  1.0 -1.0

## Question 1: Count non-NaN values
## Expected result
# [3, 3]

## Question 2: Count non-zero numerical values
## Expected result
# [2, 3]

Note: Data types of results are not important. They can be list, pandas.Series, pandas.DataFrame etc. (I can convert data types anyway.)

What I have checked

## For Question 1
print(df[df['c1'].apply(lambda x: not pd.isna(x))].count())

## For Question 2
print(df[df['c1'] != 0].count())

Obviously, these two print calls only cover column c1. It's easy to check the columns one by one, but I would like to know if there is a way to calculate the counts for all columns at once.

Environment

Python 3.10.5
pandas 1.4.3

`df.notna().sum(axis=0)` - `notna()` gives dataframe with `True/False` and `sum()` treats `True` as `1` and `False` as `0` — furas, Aug 22 '22 at 11:37

score 2 · Accepted Answer · answered Aug 22 '22 at 11:41

You do not need to iterate over your data using apply. You can get your results in a vectorized fashion:

print(df.notna().sum().to_list()) # [3, 3]
print((df.ne(0) & df.notna()).sum().to_list()) # [2, 3]

Note that I have assumed that "Question 2: Count non-zero values" also excludes NaN values; otherwise you would get [3, 4].
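As a minimal sketch of why this works, using the sample DataFrame from the question: `notna()` and `ne(0)` each return a boolean DataFrame, and `sum()` counts the `True` values per column. The variable names `non_nan` and `non_zero` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question
df = pd.DataFrame(
    [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]],
    index=['i1', 'i2', 'i3', 'i4'],
    columns=['c1', 'c2'],
)

# Boolean masks: True where the condition holds
non_nan = df.notna()
non_zero = df.ne(0)  # NaN != 0 evaluates to True, so NaN passes this test

print(non_nan.sum().to_list())               # [3, 3]
print(non_zero.sum().to_list())              # [3, 4]
print((non_zero & non_nan).sum().to_list())  # [2, 3]
```

Combining the two masks with `&` is what removes the NaN row from the non-zero count.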

answered Aug 22 '22 at 11:41

ko3


You beat me to it :) I'm going to delete my answer as it's almost identical, except that I did `df.fillna(0).ne(0).sum()` for the second one. – fsimonjetz Aug 22 '22 at 11:47

@fsimonjetz, yeah, your answer would work well too. Btw, I wouldn't say I beat you, maybe I just saw the post earlier than you :) – ko3 Aug 22 '22 at 11:50
@ko3 Thank you for your answer. I changed Question 2 to "non-zero numerical values". – dmjy Aug 22 '22 at 12:11
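The `fillna(0)` variant mentioned in the comments can be checked against the accepted answer with a short sketch. It assumes the sample DataFrame from the question; `via_mask` and `via_fillna` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question
df = pd.DataFrame(
    [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]],
    index=['i1', 'i2', 'i3', 'i4'],
    columns=['c1', 'c2'],
)

# Accepted answer: non-zero AND non-NaN
via_mask = (df.ne(0) & df.notna()).sum()

# Comment variant: turn NaN into 0 first, then count non-zero values
via_fillna = df.fillna(0).ne(0).sum()

print(via_fillna.to_list())         # [2, 3]
print(via_mask.equals(via_fillna))  # True
```

Both forms agree on this data because replacing NaN with 0 makes those cells fail the non-zero test.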

bvittrant · Answer 2 · 2022-08-22T11:45:24.477

You were close, I think! To answer your first question:

>>> df.apply(lambda x: x.notna().sum(), axis=0)
c1    3
c2    3
dtype: int64

Change to axis=1 to apply this operation to each row instead.

To answer your second question, this comes from an already answered question on SO:

>>> df.astype(bool).sum(axis=0)
c1    3
c2    4
dtype: int64

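In the same way, you can change axis to 1 if you want per-row counts. Note that NaN is cast to True here, so these counts include the NaN row; the question's expected result of [2, 3] excludes it.

Hope it helps!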
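A short sketch of how the `astype(bool)` approach treats NaN, assuming the sample DataFrame from the question. The name `as_bool` is illustrative, and combining it with `notna()` is one possible way to drop the NaN cells, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question
df = pd.DataFrame(
    [[1, 2], [0, 3], [np.nan, np.nan], [1, -1]],
    index=['i1', 'i2', 'i3', 'i4'],
    columns=['c1', 'c2'],
)

# Casting floats to bool: 0.0 becomes False, everything else (NaN too) becomes True
as_bool = df.astype(bool)
print(as_bool.sum(axis=0).to_list())                 # [3, 4]

# Masking out NaN cells gives the count the question expects
print((as_bool & df.notna()).sum(axis=0).to_list())  # [2, 3]

# axis=1 counts per row instead of per column
print(df.notna().sum(axis=1).to_list())              # [2, 2, 0, 2]
```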
