
I am trying to count the number of True/False values in a data frame like this:

import pandas as pd

df = pd.DataFrame({'a': [True, False, True],
                  'b': [True, True, True],
                  'c': [False, False, True]})
count_cols = ['a', 'b', 'c']
df['count'] = df[df[count_cols] == True].count(axis=1)


This works fine on this example. But when I run it on my actual df (shape (25168, 303)), I get `ValueError: cannot reindex from a duplicate axis`.

I understood from "What does `ValueError: cannot reindex from a duplicate axis` mean?" that this usually occurs when there are duplicate values in the index. I have tried both df.reindex() and df[~df.index.duplicated()], but I still get the same error message.

rpanai
Maverick

1 Answer


Select the columns by list and count the True values with sum, since each True is treated as 1 and each False as 0:

df['count'] = df[count_cols].sum(axis=1)
print(df)
       a     b      c  count
0   True  True  False      2
1  False  True  False      1
2   True  True   True      3
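Since the question asks about both True and False values, here is a minimal sketch of counting each per row on the same example frame. The names `true_count` and `false_count` are only illustrative, and the sketch assumes every column in count_cols is strictly boolean with no missing values:

```python
import pandas as pd

df = pd.DataFrame({'a': [True, False, True],
                   'b': [True, True, True],
                   'c': [False, False, True]})
count_cols = ['a', 'b', 'c']

# Summing booleans row-wise counts the True values (True == 1, False == 0).
true_count = df[count_cols].sum(axis=1)

# Inverting the boolean frame first counts the False values instead.
false_count = (~df[count_cols]).sum(axis=1)

print(true_count.tolist())
print(false_count.tolist())
```

For each row the two counts add up to len(count_cols), which is a quick sanity check on real data.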

EDIT: To avoid the error, one possible solution is to convert the values to a NumPy array first:

import numpy as np

df['count'] = np.sum(df[count_cols].values, axis=1)
print(df)
       a     b      c  count
0   True  True  False      2
1  False  True  False      1
2   True  True   True      3
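This error usually points to duplicate labels. With 303 columns, repeated column names are one possible cause, but that is only a guess because the real frame is not shown. The sketch below uses a hypothetical frame `dup` with a repeated column name to show how both axes can be checked, and that a NumPy array sums by position regardless of labels:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the column name 'a' appearing twice.
dup = pd.DataFrame([[True, False, True],
                    [False, False, True]],
                   columns=['a', 'a', 'b'])

# Check both axes for duplicate labels.
print(dup.index.is_unique)
print(dup.columns.is_unique)

# List the column names that occur more than once.
dup_names = dup.columns[dup.columns.duplicated()].tolist()
print(dup_names)

# The NumPy array carries no labels, so the row-wise sum is purely positional.
counts = np.sum(dup.to_numpy(), axis=1)
print(counts.tolist())
```

If `columns.is_unique` turns out to be False on the real data, renaming or dropping the repeated columns may be a cleaner fix than working around them.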
jezrael