I'm new to Python and am confused about why the following two pandas queries produce different outputs. Specifically, I don't understand why the first query has already filtered out the False values, or why its result has a different structure.

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

dr = cars['drives_right']

sel = [cars[dr]]

print(sel)

[     cars_per_cap        country  drives_right
US            809  United States          True
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True]

import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

dr = cars['drives_right']

sel = [[dr]]

print(sel)

[[US      True
AUS    False
JPN    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool]]

1 Answer

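The difference comes from how sel is built in each query. In the first, cars[dr] is boolean indexing: pandas keeps only the rows where dr is True and returns a new DataFrame, and the outer brackets then wrap that DataFrame in a one-element list. In the second, nothing is being indexed, so [[dr]] is just a nested Python list literal holding the original, unfiltered Series. That is why the False values are still there and why the printed structure is different.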
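To make the distinction concrete, here is a minimal sketch that does not need cars.csv. It assumes a stand-in DataFrame holding only the drives_right column, with the index labels and values copied from the output in the question; the names filtered, sel_first and sel_second are just illustrative.

```python
import pandas as pd

# Stand-in for cars.csv (assumption): only the drives_right column,
# with labels and values taken from the question's printed Series.
cars = pd.DataFrame(
    {"drives_right": [True, False, False, False, True, True, True]},
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)
dr = cars["drives_right"]

# cars[dr] is boolean indexing: pandas keeps the rows where dr is True.
filtered = cars[dr]
print(type(filtered).__name__, list(filtered.index))
# DataFrame ['US', 'RU', 'MOR', 'EG']

# [cars[dr]] only wraps that filtered DataFrame in a one-element Python list.
sel_first = [cars[dr]]
print(type(sel_first).__name__, len(sel_first))
# list 1

# [[dr]] is a plain nested list literal: no indexing happens, so nothing is filtered.
sel_second = [[dr]]
print(sel_second[0][0] is dr, len(sel_second[0][0]))
# True 7
```

The square brackets only filter when they come directly after an object being indexed (cars[...]). On their own they just build a list.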

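import pandas as pd

cars = pd.read_csv('cars.csv', index_col=0)

# Boolean indexing: keep only the rows where drives_right is True
sel = cars[cars['drives_right']]
print(sel)

# Same row filter, but keep only the drives_right column and convert it to a list
sel = cars.loc[cars['drives_right'], 'drives_right'].tolist()
print(sel)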

In this code, the first query filters the cars DataFrame to select only the rows where 'drives_right' is True. It then prints the resulting filtered DataFrame.

The second query uses the loc indexer with the same boolean mask for the rows and the label 'drives_right' for the column, so it returns a Series containing only the selected values. It then converts that Series to a plain Python list with .tolist() and prints it. Because the mask and the selected column are the same, every value in the list is True.
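As a rough illustration of the .loc selection, the sketch below reuses the same assumed stand-in DataFrame (only the drives_right column, with values copied from the question). It also shows one possible variation, reading the index labels of the filtered rows, which may be more informative than a list of True values; the names mask, right, values and codes are hypothetical.

```python
import pandas as pd

# Stand-in for cars.csv (assumption): values copied from the question's output.
cars = pd.DataFrame(
    {"drives_right": [True, False, False, False, True, True, True]},
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)
mask = cars["drives_right"]

# .loc[rows, column]: rows chosen by the boolean mask, column chosen by label.
right = cars.loc[mask, "drives_right"]

# The result is a Series; .tolist() turns it into a plain Python list.
values = right.tolist()
print(values)
# [True, True, True, True]

# The index labels of the filtered rows identify which countries matched.
codes = right.index.tolist()
print(codes)
# ['US', 'RU', 'MOR', 'EG']
```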

Make sure to replace 'cars.csv' with the actual file path to your CSV file containing the car data.