I received a wonderful lambda function from a user a while ago.

actresses_modified['Winner_Count'] = actresses_modified.apply(lambda x: actresses_modified.Name.value_counts()[x.Name], axis=1)

The data frame to which it is applied looks like this:

    Year    Award           Winner  Name
2   1928    Best Actress    0.0     Louise Dresser
3   1928    Best Actress    1.0     Janet Gaynor
4   1928    Best Actress    0.0     Gloria Swanson
40  1929    Best Actress    0.0     Ruth Chatterton
41  1929    Best Actress    0.0     Betty Compson

The problem is I have forgotten how it works (I had to step away from this "for fun" project) and, more specifically, exactly what is going on with [x.Name].

The line actresses_modified.Name.value_counts() by itself gives me the count of all actress names in the data frame. What does [x.Name] mean in English, how does it manage to tally up all of the 1s next to each person's name in the data frame's Winner column, and how does it return a correct tally of the total number of wins? Of equal importance, does this type of syntax have a name? My Google searches turned up nada.

Any thoughts would be appreciated.

Ryan
    I don't want to say for certain as I'm just now getting into pandas and numpy, but it looks like that lambda gets applied to each item in the dataframe and it calls the `value_counts` method and then gets each actress from the dataframe by their name(`x.Name`). So, unless `value_counts` saves the data, it sounds like it's doing unnecessary work every time. Does that make sense to you? I may not have any experience with it, but I'm like 95% sure that's what's happening. – Cory Madden Jul 21 '17 at 01:04

1 Answer

I'm not sure I made myself clear in the comment, so here is a fuller explanation. The apply method "Applies function along input axis of DataFrame." For simplicity's sake, let's say we have a collection of Actress objects called actresses_modified that looks like this:

   actresses_modified = [<Actress>, <Actress>, <Actress>, <Actress>]

Let's assume that this is how the Actress is defined:

class Actress:
    Name = "Some String"

Our lambda function then gets applied to each actress in the collection as x (with axis=1, x is one row of the data frame). value_counts() returns an "object containing counts of unique values."
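To make the "each actress as x" idea concrete, here is a minimal sketch in real pandas rather than the Actress analogy. The small frame `df` is hypothetical sample data modelled on the question, not the asker's actual data. It suggests that with axis=1 each x is a pandas Series holding one row, and that x.Name is attribute-style access to that row's Name column (the same as x["Name"]).

```python
import pandas as pd

# Hypothetical sample data modelled on the frame in the question.
df = pd.DataFrame({
    "Year": [1928, 1928, 1929],
    "Winner": [0.0, 1.0, 0.0],
    "Name": ["Louise Dresser", "Janet Gaynor", "Ruth Chatterton"],
})

# With axis=1, apply passes each row to the lambda as a pandas Series.
row_types = df.apply(lambda x: type(x).__name__, axis=1)

# x.Name reads the row's "Name" entry, equivalent to x["Name"].
names_via_attr = df.apply(lambda x: x.Name, axis=1)
names_via_key = df.apply(lambda x: x["Name"], axis=1)

print(row_types.tolist())
print(names_via_attr.tolist())
```

Note that the capital N matters: a Series also has a lowercase `.name` attribute, which holds the row's index label rather than the Name column.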

So when we index the result of value_counts() with an actress's name, we're looking up that actress's count by key. Let's pretend that value_counts() returns a dict mapping actress names to their "count" and that it looks like this:

counts = {
    'Jane Doe': 1,
    'Betty Ross': 3,
}
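The dict is only an analogy. In pandas itself, value_counts() returns a Series whose index holds the unique values and whose values are the counts, and square brackets on it do a label lookup much like a dict key lookup. A small sketch, using the same made-up names as the dict above:

```python
import pandas as pd

# Made-up names: "Betty Ross" appears three times, "Jane Doe" once.
names = pd.Series(["Jane Doe", "Betty Ross", "Betty Ross", "Betty Ross"])

# A Series indexed by unique name, with the number of occurrences as values.
counts = names.value_counts()

# Label-based lookup, analogous to counts["Betty Ross"] on a dict.
betty = counts["Betty Ross"]
jane = counts["Jane Doe"]

print(counts)
print(betty, jane)
```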

Say the first Actress object's Name is "Jane Doe". When we call value_counts()[x.Name] for her, we're doing counts["Jane Doe"], which would return 1.
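Putting it together, here is a hedged end-to-end sketch on hypothetical data (the frame `actresses` below is invented for illustration). It runs the original lambda, then shows two alternatives that are not from the question: computing value_counts() once and using map, which avoids recounting on every row, and a groupby sum over Winner. One thing the sketch suggests is that value_counts() counts how many rows carry each name, so it only equals the number of wins if the frame was already filtered down to winning rows, which is an assumption about the asker's data.

```python
import pandas as pd

# Hypothetical data: Janet Gaynor appears twice but wins once.
actresses = pd.DataFrame({
    "Winner": [0.0, 1.0, 0.0, 0.0],
    "Name": ["Louise Dresser", "Janet Gaynor", "Janet Gaynor", "Gloria Swanson"],
})

# Original pattern: value_counts() is recomputed for every row.
per_row = actresses.apply(
    lambda x: actresses.Name.value_counts()[x.Name], axis=1
)

# Same result, but the counts are computed once and looked up by name.
counts = actresses["Name"].value_counts()
mapped = actresses["Name"].map(counts)

# If non-winning rows are present, summing Winner per name gives the wins.
wins = actresses.groupby("Name")["Winner"].transform("sum")

print(per_row.tolist())  # rows per name
print(wins.tolist())     # wins per name
```

On this sample, the row counts and the win totals differ for Janet Gaynor, which is why the filtering assumption matters.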

Cory Madden