Pandas .filter() method with lambda function

Question

I'm trying to understand the .filter() method in Pandas. I'm not sure why the code below doesn't work:

# Load data
from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Set arbitrary index (is this needed?) and try filtering:
indexed_df = df.copy().set_index('sepal width (cm)')
test = indexed_df.filter(lambda x: x['petal length (cm)'] > 1.4)

I get:

TypeError: 'function' object is not iterable

I appreciate there are simpler ways to do this (e.g. Boolean indexing), but for learning purposes I'm trying to understand why `filter` fails here when it works after a `groupby`, as shown below:

This works:

filtered_df = df.groupby('petal width (cm)').filter(lambda x: x['sepal width (cm)'].sum() > 50)

The documentation you link to lists four arguments: `items`, `like`, `regex` and `axis`. None of them (if you read the documentation) accepts a function or lambda expression. — Willem Van Onsem, Jan 17 '18 at 15:44
`filter` is for selecting columns based on partial matches and regex matches on the column names. — cs95, Jan 17 '18 at 15:44
Thank you Willem (and others). I can happily do this via Boolean indexing; the sole reason I asked is that it was an example from a DataCamp course, albeit using `groupby` and then `filter` with a `lambda` function. This part is still unclear to me because it works with a `groupby`. I will edit the question to make this explicit. — User123456789, Jan 17 '18 at 16:05
To be clear, this is not an exact duplicate of a Boolean indexing question, it's about why `filter` works with a `groupby` and not without. — User123456789, Jan 17 '18 at 16:21
@maw501 `DataFrame.filter` and `groupby.filter` are very different methods. Yes it is unfortunate that they have the same name but that's the only thing in common. You shouldn't compare them. — ayhan, Jan 18 '18 at 21:14
Goodness. I hadn't realised there was a `groupby.filter` - thanks! Maybe make that the answer? Thank you again. — User123456789, Jan 18 '18 at 21:16
NOT A DUPLICATE... Is there a way to filter a DataFrame using a lambda? — Alex R, Nov 25 '20 at 04:45
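To make the distinction drawn in the comments concrete, here is a small sketch on a hand-made frame with iris-style column names (the values are made up, and sklearn is not needed). `DataFrame.filter` selects labels via `items`, `like` or `regex`; a lambda passed positionally lands in `items` and raises `TypeError` (the exact message depends on the pandas version). `groupby(...).filter` instead calls the function once per group and keeps the rows of the groups for which it returns True. The threshold `6.3` is an arbitrary illustrative value.

```python
import pandas as pd

# Small made-up frame with iris-style column names (values are illustrative only)
df = pd.DataFrame({
    "sepal width (cm)": [3.5, 3.0, 3.2, 2.9],
    "petal length (cm)": [1.4, 1.4, 4.5, 5.1],
    "petal width (cm)": [0.2, 0.2, 1.5, 1.5],
})

# DataFrame.filter works on *labels* (column names by default), not on values
exact = df.filter(items=["petal length (cm)"])  # exact label match
petal = df.filter(like="petal")                 # substring match on labels
width = df.filter(regex=r"width")               # regex match on labels

# A lambda becomes the first positional argument, `items`, which pandas
# expects to be a collection of labels, so this raises TypeError
try:
    df.filter(lambda x: x["petal length (cm)"] > 1.4)
    raised = False
except TypeError:
    raised = True

# GroupBy.filter calls the function once per group (a sub-DataFrame)
# and keeps every row of the groups for which it returns True
grouped = df.groupby("petal width (cm)").filter(
    lambda g: g["sepal width (cm)"].sum() > 6.3
)

print(grouped.index.tolist())  # [0, 1]
```

Here the group with petal width 0.2 has a sepal-width sum of 6.5 and is kept, while the group with petal width 1.5 sums to about 6.1 and is dropped.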

score 0 · Answer 1 · answered Jan 17 '18 at 15:45

You can use the condition indexed_df['petal length (cm)'] > 1.4 (note: indexed_df, not x) to filter the dataframe:

indexed_df[indexed_df['petal length (cm)'] > 1.4]

How does this work?

Evaluating indexed_df['petal length (cm)'] gives you that column of the dataframe as a Series: for every index label, the value in that column. Comparing it with > 1.4 gives a boolean Series with the same index: True where the condition holds for that row, and False otherwise.
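The two steps described above can be seen on a small made-up frame (the values are illustrative, not the real iris data; the index stands in for 'sepal width (cm)' as in the question): selecting a column gives a `Series` aligned to the frame's index, and comparing it with `1.4` gives a boolean `Series` with the same index.

```python
import pandas as pd

# Made-up values; the index plays the role of 'sepal width (cm)' in the question
indexed_df = pd.DataFrame(
    {"petal length (cm)": [1.4, 1.3, 4.5, 5.1]},
    index=pd.Index([3.5, 3.0, 3.2, 2.9], name="sepal width (cm)"),
)

column = indexed_df["petal length (cm)"]  # a Series, one value per index label
mask = column > 1.4                       # a boolean Series with the same index

print(mask.tolist())  # [False, False, True, True]
```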

We can then pass this boolean Series to the indexing operator, indexed_df[boolean_column], to keep only the rows where the corresponding value of boolean_column is True.
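A sketch of the indexing step on the same kind of made-up frame. It also shows one way to write the row filter with a lambda, as asked in the comments: `.loc` accepts a callable that is passed the DataFrame and should return a boolean mask. This is still boolean indexing in another form, not `DataFrame.filter`.

```python
import pandas as pd

# Made-up values; the index plays the role of 'sepal width (cm)' in the question
indexed_df = pd.DataFrame(
    {"petal length (cm)": [1.4, 1.3, 4.5, 5.1]},
    index=pd.Index([3.5, 3.0, 3.2, 2.9], name="sepal width (cm)"),
)

# Boolean indexing: keep the rows where the mask is True
mask = indexed_df["petal length (cm)"] > 1.4
kept = indexed_df[mask]

# The same row filter written with a lambda: .loc calls the function with
# the DataFrame and uses the boolean Series it returns
kept_lambda = indexed_df.loc[lambda d: d["petal length (cm)"] > 1.4]

print(kept.index.tolist())  # [3.2, 2.9]
```

The callable form is handy in method chains, where the intermediate DataFrame has no variable name to refer to.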

Thanks, but as stated above, this doesn't clear up why the lambda function works when used with `groupby`, as now included in the edited question. — User123456789, Jan 18 '18 at 21:09
