
I'd like to create a dictionary of lambda functions for conveniently filtering pandas DataFrames. When I define each dictionary entry on its own line, I get the behaviour I want. But when I build the dictionary in a for loop, every filter uses the last value of n. Does the lambda reference the global variable n rather than its value at the time the lambda was created? Is my understanding of lambda functions off?

Note that this example is simplified. In my actual project, I use a DateTime index, and the dictionary will have integer keys that filter by year, e.g. `df.index.year == 2020`, and some string keys that filter by weekday/weekend, time of day, etc.

```python
import pandas as pd

data = [[1, 2], [3, 4], [5, 6]]  # example df

df = pd.DataFrame(index=range(len(data)), data=data)

filts = {}
filts[1] = lambda df: df[df.index == 1]  # making a filter dictionary
filts[2] = lambda df: df[df.index == 2]  # of lambda funcs

print(filts[1](df))  # works as expected
print(filts[2](df))

filts = {}
for n in range(len(data)):
    filts[n] = lambda df: df[df.index == n]  # also tried wrapping n in int
# n = 0  # changes behaviour
print(filts[0](df))  # prints the rows for n = 2
print(filts[1](df))  # same problem as above

# further investigating lambdas
filts = {}
n = 0
filts[n] = lambda df: df[df.index == n]  # making a filter dictionary
n = 1
filts[n] = lambda df: df[df.index == n]  # of lambda funcs
print(filts[0](df))  # prints the rows for n = 1
```

likethevegetable

1 Answer


I'm not sure whether this question is a duplicate, but I'm answering because I've run into this myself with pandas. You can solve the problem with a closure.

Change your loop as follows:

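```python
for n in range(len(data)):
    # The outer lambda is called immediately with the current n, so each inner
    # lambda closes over its own parameter n instead of the shared loop variable.
    filts[n] = (lambda n: lambda df: df[df.index == n])(n)
```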
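The same closure can also be written as a named factory function, which some find easier to read than an immediately called lambda. This is a minimal sketch that rebuilds the `data`/`df` from the question; `make_index_filter` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd

data = [[1, 2], [3, 4], [5, 6]]
df = pd.DataFrame(index=range(len(data)), data=data)


def make_index_filter(n):
    """Hypothetical helper: return a filter keeping rows whose index equals n.

    Each call creates a new scope, so the returned lambda closes over this
    call's own n rather than a loop variable that keeps changing.
    """
    return lambda frame: frame[frame.index == n]


filts = {n: make_index_filter(n) for n in range(len(data))}

print(filts[0](df))  # only the row with index 0
print(filts[1](df))  # only the row with index 1
```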

What's wrong with OP's approach?

Python binds names late: a lambda keeps a reference to the variable n, not the value n had when the lambda was created, and it looks n up only when it is called. In your loop, every lambda refers to the same n (at module level, that is the global n), so by the time you call any of the filters the loop has finished and n holds its final value, `len(data) - 1`. That is why every filter behaves like the last one, and why reassigning n after the loop changes the result. The takeaway: "The lambda's closure holds a reference to the variable being used, not its value, so if the value of the variable later changes, the value in the closure also changes." (source)
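A minimal sketch of this late binding without pandas; `funcs` and `n` are illustrative names only. It mirrors the commented-out `n = 0` line in the question.

```python
# Late binding: each lambda looks up n when it is called, not when it is defined.
funcs = {}
for n in range(3):
    funcs[n] = lambda: n  # all three lambdas refer to the same variable n

print([funcs[k]() for k in range(3)])  # [2, 2, 2], since n is 2 after the loop

n = 0  # rebinding n afterwards changes what every lambda returns
print([funcs[k]() for k in range(3)])  # [0, 0, 0]
```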

DaveIdito
  • Thanks. The first commenter on the question (i.e. you) pointed to a duplicate that solved my problem as well. Another approach is `lambda df, n=n: df[df.index == n]`, which I think is easier to read. Cheers! – likethevegetable Oct 10 '20 at 13:55
  • I was the commenter ;) Glad it helped. – DaveIdito Oct 10 '20 at 13:55
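The default-argument approach from the comment can be sketched against a `DatetimeIndex`, closer to the year and weekday/weekend filters described in the question. This is an illustration under assumptions: the date range, the `value` column, and the `"weekday"`/`"weekend"` keys are made up for the example.

```python
import pandas as pd

# Hypothetical daily data spanning a year boundary (2019-12-28 is a Saturday).
idx = pd.date_range("2019-12-28", "2020-01-05", freq="D")
df = pd.DataFrame({"value": list(range(len(idx)))}, index=idx)

filts = {}
for year in (2019, 2020):
    # year=year is evaluated when the lambda is defined, so each filter keeps its own year.
    filts[year] = lambda frame, year=year: frame[frame.index.year == year]

# String keys don't depend on a loop variable, so there is nothing to capture.
filts["weekend"] = lambda frame: frame[frame.index.dayofweek >= 5]  # Sat=5, Sun=6
filts["weekday"] = lambda frame: frame[frame.index.dayofweek < 5]

print(filts[2019](df))
print(filts["weekend"](df))
```

One trade-off of `year=year`: the captured value is an ordinary parameter, so a caller can override it by accident, e.g. `filts[2019](df, 2020)`. The closure approach in the answer does not expose it that way.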