
I am referring to this article:

https://kanoki.org/2019/07/04/pandas-difference-between-two-dataframes/

I don't understand this particular syntax for loc, where a lambda does the row filtering:

df = df1.merge(df2, how = 'outer' ,indicator=True).loc[lambda x : x['_merge']=='left_only']

What is this lambda doing? I know the end result; I'm just trying to understand the use of lambdas in the "loc" syntax.

smackenzie
  • It would make more sense to do it [this](https://stackoverflow.com/a/61044498/9081267) way, and it's also more readable because of `query`. I don't like the use of `lambda` for filtering. – Erfan Jul 27 '20 at 14:42

1 Answer


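loc accepts (among other things) a one-argument callable. The callable is called with the whole calling DataFrame, not with each row, and is expected to return something that can be used as an indexer (in this case, a boolean Series with one entry per row).

Effectively, this syntax means "take the merged dataframe x, compute the boolean mask x['_merge'] == 'left_only', and keep the rows where that mask is True". The lambda is useful here because the merged result has no variable name to refer to in the middle of a method chain.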
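To see what the callable actually receives, here is a minimal sketch. The two small frames and the helper name `only_left` are made up for illustration; only `merge`, `indicator=True`, and `loc` come from the question. The helper records the type of its argument, and the result is compared against building the boolean mask by hand.

```python
import pandas as pd

# Hypothetical sample data, not taken from the linked article.
df1 = pd.DataFrame({"id": [1, 2, 3], "val": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "val": ["b", "c", "d"]})

merged = df1.merge(df2, how="outer", indicator=True)

seen_types = []

def only_left(frame):
    # Record what loc hands to the callable.
    seen_types.append(type(frame).__name__)
    # Return a boolean Series, one entry per row of the frame.
    return frame["_merge"] == "left_only"

# Passing the callable to loc ...
via_callable = merged.loc[only_left]

# ... gives the same rows as building the mask explicitly.
mask = merged["_merge"] == "left_only"
via_mask = merged.loc[mask]

print(seen_types)
print(via_callable)
```

The callable form matters mostly in method chains, where the intermediate merged frame is never assigned to a name that a mask could be built from.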

GPhilo
  • Is it possible to filter the result of the merge without a lambda, while still chaining methods as in the example? – ipj Jul 27 '20 at 14:47
  • Yes. As @Erfan pointed out in the comment to the question, you can use `.query('_merge == "left_only"')` to obtain the same result. – GPhilo Jul 27 '20 at 14:50
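As a rough sketch of that suggestion, the chained `query` call below is compared with the lambda version from the question. The sample frames are invented for illustration and are not from the linked article.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df1 = pd.DataFrame({"id": [1, 2, 3], "val": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "val": ["b", "c", "d"]})

# Chained filter with query, no lambda needed.
via_query = (
    df1.merge(df2, how="outer", indicator=True)
       .query('_merge == "left_only"')
)

# The lambda form from the question, for comparison.
via_loc = (
    df1.merge(df2, how="outer", indicator=True)
       .loc[lambda x: x["_merge"] == "left_only"]
)

print(via_query.equals(via_loc))
```

Both forms keep the chain intact; which one reads better is largely a matter of taste, as the comments above suggest.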