Why do these two Python Pandas queries produce different outputs?

Question

I'm new to Python and was confused as to why the following pandas queries are producing two different outputs. I specifically don't understand why the first query has already filtered out the False values. I also don't understand why it's in a different structure.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

dr = cars['drives_right']

sel = [cars[dr]]

print(sel)

[     cars_per_cap        country  drives_right
US            809  United States          True
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True]

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

dr = cars['drives_right']

sel = [[dr]]

print(sel)

[[US      True
AUS    False
JPN    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool]]

Why would they produce the same thing? Why do you wrap them in `[…]`? — mozway, May 23 '23 at 09:40
Please provide enough code so others can better understand or reproduce the problem. — Community, May 23 '23 at 09:43
In the first example you are applying a boolean mask to the DataFrame, which filters out the rows that have a False value in drives_right; see https://stackoverflow.com/questions/38802675/create-bool-mask-from-filter-results-in-pandas — Learning is a mess, May 23 '23 at 09:46

FlobiusKane · Accepted Answer · 2023-05-23T09:48:01.777

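The difference in output and structure comes from how you're constructing "sel" in each query. In the first query, cars[dr] uses the boolean Series dr as a mask, so only the rows where it is True are kept; the surrounding [...] then wraps that filtered DataFrame in a one-element list. In the second query, [[dr]] does no indexing at all: it is just a nested list containing the unfiltered Series, which is why the False values are still present. Dropping the extra brackets gives you the filtered data directly: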
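To see the two structures side by side without needing the CSV, here is a minimal sketch that rebuilds the data inline. The rows marked True are taken from the question's printed output; the cars_per_cap and country values for AUS, JPN and IN are not shown in the question, so the ones below are placeholders.

```python
import pandas as pd

# Inline stand-in for cars.csv. The True rows come from the question's output;
# the values for AUS, JPN and IN are placeholders (the question does not show them).
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 0, 0, 0, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

dr = cars["drives_right"]  # boolean Series, one value per row

first = [cars[dr]]  # cars[dr] filters rows; the outer [] wraps the result in a list
second = [[dr]]     # no indexing happens here: just a list inside a list

print(type(first[0]).__name__)      # DataFrame
print(list(first[0].index))         # ['US', 'RU', 'MOR', 'EG']
print(type(second[0][0]).__name__)  # Series
print(second[0][0] is dr)           # True: the same, unfiltered Series
```

The square brackets only filter when they follow an object, as in cars[dr]. On their own they are ordinary Python list literals.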

import pandas as pd

cars = pd.read_csv('cars.csv', index_col=0)

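# Boolean-mask filter: keep only the rows where drives_right is True (a DataFrame)
sel = cars[cars['drives_right']]
print(sel)

# Same mask via .loc, keeping only the drives_right column, as a plain Python list
sel = cars.loc[cars['drives_right'], 'drives_right'].tolist()
print(sel)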

In this code, the first query filters the cars DataFrame to select only the rows where 'drives_right' is True. It then prints the resulting filtered DataFrame.

The second query uses the loc indexer to select the values of 'drives_right' where 'drives_right' is True. It converts the selected values to a list using .tolist(). Finally, it prints the list of True values from the 'drives_right' column.

Make sure to replace 'cars.csv' with the actual file path to your CSV file containing the car data.
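As a quick check of the .loc and .tolist() behaviour described above, here is a small sketch on a hypothetical three-row frame. The values are made up for illustration and are not the real cars.csv.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration (not the real cars.csv).
cars = pd.DataFrame(
    {"cars_per_cap": [809, 588, 45], "drives_right": [True, False, True]},
    index=["US", "JPN", "EG"],
)
mask = cars["drives_right"]

by_brackets = cars[mask]  # boolean mask passed to []
by_loc = cars.loc[mask]   # same mask passed to .loc
print(by_brackets.equals(by_loc))  # True

# Row mask plus a column label selects a Series; .tolist() turns it into a list.
values = cars.loc[mask, "drives_right"].tolist()
print(values)  # [True, True]
```

For row filtering the two spellings give the same result. The .loc form is handy when you also want to pick columns in the same step.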

Thanks, I appreciate the explanation. That cleared it up. — malcolm.waters, May 26 '23 at 23:21
