When applying all() in a pandas dataframe it returns True although some values are false

Question

I have the following dataframe :

import pandas as pd

numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10]}
df = pd.DataFrame(numbers,columns=['set_of_numbers'])

df['equal_or_lower_than_4?'] = df['set_of_numbers'].apply(lambda x: 'True' if x <= 4 else 'False')

print (df)

  set_of_numbers equal_or_lower_than_4?
0               1                   True
1               2                   True
2               3                   True
3               4                   True
4               5                  False
5               6                  False
6               7                  False
7               8                  False
8               9                  False
9              10                  False

When I apply the all() function to the last column, it returns True although some values are False:

all(df['equal_or_lower_than_4?'])

#Out[29]:

#True

In your own words, when you write `'True'`, what do you think the `'` symbols do? What happens when you try `bool('False')` at the interpreter prompt? Can you explain this behaviour? — Karl Knechtel, Apr 27 '21 at 21:49
also try to avoid lambdas - there are native methods you can use, good [mcve] though! — Umar.H, Apr 27 '21 at 21:50

Andreas · Answer 1 · 2021-04-27T22:01:11.770

You can simplify the code by using the vectorized comparison df['set_of_numbers'] <= 4, which produces real booleans instead of strings:

import pandas as pd

numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10]}
df = pd.DataFrame(numbers,columns=['set_of_numbers'])

df['equal_or_lower_than_4?'] = df['set_of_numbers'] <= 4


df['equal_or_lower_than_4?']
Out[69]: 
0     True
1     True
2     True
3     True
4    False
5    False
6    False
7    False
8    False
9    False
Name: equal_or_lower_than_4?, dtype: bool

Why does all(df['equal_or_lower_than_4?']) return True in the question? The built-in all() is described as:

The all() function returns True if all items in an iterable are true, otherwise it returns False.

If the iterable object is empty, the all() function also returns True.

all() iterates over the elements of the pandas.Series and tests the truth value of each one. In the question, the lambda stores the strings 'True' and 'False', not booleans, and every non-empty string is truthy, so bool('False') is True and all() returns True. Once the column holds real booleans, as in the code above, all(df['equal_or_lower_than_4?']) returns False. You can also use np.all() or the Series method df['equal_or_lower_than_4?'].all():

import numpy as np
print(np.all(df['equal_or_lower_than_4?']))
#False
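
To make the difference concrete, here is a minimal sketch comparing the string column from the question with a boolean column. The variable names as_strings, as_bools and converted are only illustrative and do not appear in the original code.

```python
import pandas as pd

df = pd.DataFrame({'set_of_numbers': list(range(1, 11))})

# Column of strings, built as in the question: every value is a non-empty str.
as_strings = df['set_of_numbers'].apply(lambda x: 'True' if x <= 4 else 'False')

# Column of real booleans, built with the vectorized comparison.
as_bools = df['set_of_numbers'] <= 4

print(bool('False'))    # True: any non-empty string is truthy
print(all(as_strings))  # True: every element is a non-empty string
print(all(as_bools))    # False: some elements are False
print(as_bools.all())   # False: the pandas equivalent

# If a string column already exists, compare it to 'True' to get booleans.
converted = as_strings == 'True'
print(converted.all())  # False
```

If the column has to stay as text for display purposes, comparing it against 'True' as in the last step is one way to get booleans back before calling all().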
