You can use `df.eq` with `df.sum` here. I suggest using the `index_col` parameter in `pd.read_csv` to set the index while reading the csv itself.
import pandas as pd
from io import StringIO
text = '''1526 0 1 2 1 0
782 0 1 1 1 2
7653 1 1 1 0 0
87bt 1 0 1 2 2'''
df = pd.read_csv(StringIO(text), sep=r'\s+', header=None, index_col=0)  # `index_col=0` sets the 1st column as index; `sep=r'\s+'` handles the whitespace-delimited data
df.eq(1).sum(axis=1)
0
1526    2
782     3
7653    3
87bt    2
dtype: int64
You can use `np.count_nonzero` if performance is an issue; it's significantly faster than `df.eq(...).sum(...)` (timeit results are in the answer linked in the comment below).
import numpy as np

np.count_nonzero(df.to_numpy()==1, axis=1)
# array([2, 3, 3, 2], dtype=int64)
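
# To keep the original row labels, wrap the result in a Series: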
# pd.Series(np.count_nonzero(df.to_numpy()==1, axis=1), index=df.index)
# This is almost 3X faster than `df.eq(...).sum(...)`
# For more details refer to https://stackoverflow.com/a/63103435/12416453
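
If you want to check the speed difference on your own data, a quick timeit comparison along these lines should work (a minimal sketch using the df built above; actual numbers depend on frame size and hardware, the detailed benchmarks are in the linked answer):

import timeit

# Time both approaches on the same DataFrame, 1000 runs each
t_pandas = timeit.timeit(lambda: df.eq(1).sum(axis=1), number=1000)
t_numpy  = timeit.timeit(lambda: np.count_nonzero(df.to_numpy() == 1, axis=1), number=1000)
print(f'pandas: {t_pandas:.4f}s  numpy: {t_numpy:.4f}s')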
`axis=1` means "over the column axis"; pandas would also accept:
df.eq(1).sum(axis='columns')
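
As a quick sanity check (using the df built above), both spellings give the same result:

df.eq(1).sum(axis='columns').equals(df.eq(1).sum(axis=1))
# True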