Consider the data set:
(a very small dataset derived from this Kaggle dataset, which is available under the CC BY-NC-SA 4.0 license.)
In the code below, I build the filters boolean list using the values of a different column each time. Then I apply that list to the DataFrame as a boolean index.
I get the correct set of results each time, even though I never mention the column name while indexing!
How is pandas applying the boolean indexing to the correct column each time?
I know that the filters boolean list carries no meta information about which column of the DataFrame was used to construct it.
So I am totally perplexed as to how this is happening!
import pandas as pd
df = pd.read_csv("ted_small.csv")
#1: Let's try to filter by "comments" > 500 first
filter_by = df["comments"]
filters = []
for i in filter_by:
    if i > 500:
        filters.append(True)
    else:
        filters.append(False)
print(f"Filters list is: {filters}")
df[filters]
It correctly outputs only those rows with comments > 500:
Then I change my list to be constructed based on the values of "duration".
import pandas as pd
df = pd.read_csv("ted_small.csv")
#2: Let's try to filter by "duration" > 1000 now
filter_by = df["duration"]
filters = []
for i in filter_by:
    if i > 1000:
        filters.append(True)
    else:
        filters.append(False)
df[filters]
It correctly outputs only those rows with duration > 1000!
How is this magic happening?
If I were to do something like this:
df[df['comments'] > 500]
I do understand why I would get the correct result. It is because there is some meta information on what column the filter was derived from, as is seen using the output of:
df['comments'] > 500
0 True
1 False
2 False
3 False
4 True
5 True
6 True
7 False
8 True
9 True
Name: comments, dtype: bool
(Note the reference to "comments" in the output above)
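For what it's worth, that name appears to be only a label on the Series. Here is a small sketch with a made-up DataFrame (the values are hypothetical, not taken from ted_small.csv) which suggests pandas does not consult the name when filtering: a boolean Series is matched to rows by its index labels, so renaming the mask changes nothing.

```python
import pandas as pd

# Hypothetical stand-in for ted_small.csv; values are made up for illustration.
df = pd.DataFrame({
    "comments": [600, 100, 700, 50],
    "duration": [900, 1200, 1500, 800],
})

# The comparison returns a boolean Series that inherits the column's name.
mask = df["comments"] > 500
print(mask.name)

# Give the mask an unrelated name; the index labels 0..3 stay the same.
renamed = mask.rename("anything_else")

# Both masks select the same rows, so the name plays no part in the filtering.
same = df[mask].equals(df[renamed])
print(same)
```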
EDIT:
Thanks to the discussion in the comments section, I understand it now! Once the filters boolean list has been created, the column that was used to create it no longer matters. Rows whose position has True in the list are simply returned.
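A minimal sketch of that conclusion, using a small made-up DataFrame in place of ted_small.csv (the values are hypothetical): the list is applied purely by position, so a list typed out by hand selects the same rows as one computed from a column.

```python
import pandas as pd

# Hypothetical stand-in for ted_small.csv; values are made up for illustration.
df = pd.DataFrame({
    "comments": [600, 100, 700, 50],
    "duration": [900, 1200, 1500, 800],
})

# Build the mask from "comments", like the loop above but as a list comprehension.
filters = [c > 500 for c in df["comments"]]

# The mask is just a list of booleans, one per row, applied by position.
by_list = df[filters]

# The same booleans written by hand give exactly the same rows.
by_hand = df[[True, False, True, False]]

print(by_list.equals(by_hand))
```

The list has to contain one boolean per row of the DataFrame; nothing in it records which column it came from.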