
So I've been following a guide I got off Reddit about understanding Python's bracket requirements: Is it a list? Then use brackets. Is it a dict? Then use braces. Otherwise, you probably want parentheses.

However, I've come across something that the above can't explain:

df.groupby('Age')['Salary'].mean()

In this case, both Age & Salary are lists (they are both columns from a df), so why do we use parentheses for Age and brackets for Salary?

Additionally, why is there a dot before mean, but not in-between ('Age') and ['Salary']?

I realise the questions I'm asking may be fairly basic. I'm working my way through Python Essential Reference (4th ed) Developer's Library. If anyone has any sources dealing with my kind of questions it would be great to see them.

Thanks

  • `df.groupby('Age')['Salary'].mean()` = `obj.method(param)[key].method()` – Joshua Nixon Feb 03 '20 at 15:47
  • `df` is an object... that contains a `groupby` method... that returns a dictionary-like object... that contains a `'Salary'` key... that points to an object... that contains a `mean` method. – byxor Feb 03 '20 at 15:50
  • @byxor try to avoid answering in the comments – OneCricketeer Feb 03 '20 at 15:51
  • @cricket_007 I comment quick answers on questions that I think will be closed. That way I can still help OP without submitting a low-effort answer that may get downvoted – byxor Feb 03 '20 at 15:52

3 Answers

If you'll forgive me for answering the important question rather than the one you asked...
That's a very compact chain. Break it into separate lines and then use the debugging view of an IDE to step through it to understand the data types involved.

query_method = df.groupby
query_string = 'Age'
query_return = query_method(query_string)
data = query_return['Salary']
data_mean = data.mean()

Step through in the PyCharm debugger and you can see the type of every variable.
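If a debugger isn't handy, printing the type of each intermediate value gives much the same picture. This is only a sketch: the sample data is made up, and the variable names `grouped`, `salaries` and `result` are mine rather than anything pandas requires.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Age": [25, 25, 30, 30],
    "Salary": [100, 200, 300, 500],
})

grouped = df.groupby("Age")    # parentheses: call the groupby method
salaries = grouped["Salary"]   # brackets: pick the 'Salary' column out of the result
result = salaries.mean()       # dot + parentheses: call a method on that selection

# Print the class name of each intermediate value.
for name, value in [("grouped", grouped), ("salaries", salaries), ("result", result)]:
    print(name, "->", type(value).__name__)

print(result)
```

Each step hands back a different kind of object, which is why the punctuation changes along the chain.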

There is a lot of context here that can be found in the pandas dataframe documentation.

To start off, df is an object of class pandas.DataFrame. pandas.DataFrame has a method (a function attached to the object) called groupby that takes some input. In your example, the input is the string 'Age'. When you pass arguments to a function it looks like this:

my_function(input)

When you have more than one input, the common way to pass them is as multiple arguments, like this:

my_function(input1, input2, etc, ...)

pandas.DataFrame.groupby(...) returns an object that is subscriptable, meaning you can put square brackets after it. Using subscript notation is like accessing an element in a list or a dict, like this:

my_list = [1,2,3]
print(my_list[0]) # --> 1

my_dict = {
    "a": "apple",
    "b": "banana",
    "c": "cucumber"
}

print(my_dict["b"]) # --> banana
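Brackets are not limited to lists and dicts: any object can support them by defining `__getitem__`. The toy classes below (`Table`, `Grouped` and `Column` are hypothetical names, not pandas classes) only imitate the shape of the pandas chain to show which piece of punctuation triggers what.

```python
from collections import defaultdict
from statistics import mean


class Column:
    """Hypothetical stand-in for a selected column, split by group."""

    def __init__(self, groups):
        self._groups = groups  # maps group key -> list of values

    def mean(self):
        # Runs when you write something.mean()
        return {key: mean(values) for key, values in self._groups.items()}


class Grouped:
    """Hypothetical stand-in for what groupby returns."""

    def __init__(self, rows, by):
        self._rows = rows
        self._by = by

    def __getitem__(self, name):
        # Runs when you write something[name]
        groups = defaultdict(list)
        for row in self._rows:
            groups[row[self._by]].append(row[name])
        return Column(dict(groups))


class Table:
    """Hypothetical stand-in for a DataFrame."""

    def __init__(self, rows):
        self._rows = rows

    def groupby(self, by):
        # Runs when you write something.groupby(by)
        return Grouped(self._rows, by)


table = Table([
    {"Age": 25, "Salary": 100},
    {"Age": 25, "Salary": 200},
    {"Age": 30, "Salary": 300},
])
result = table.groupby("Age")["Salary"].mean()
print(result)
```

So the choice between parentheses and brackets depends on what operation you are performing (calling versus subscripting), not on what type the value inside them has.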

Coming back to your specific question:

df.groupby('Age')['Salary'].mean()
df                                 # df, the name of your DataFrame variable
  .groupby('Age')                  # call the function groupby to get the frame grouped by the column 'Age'
                 ['Salary']        # access the 'Salary' element from that groupby
                           .mean() # and apply the mean() function to the 'Salary' element

So it appears that you are getting the mean salary for each age. The result is a pandas Series indexed by age rather than a plain list. I hope this helps to explain.

David Culbreth

both Age & Salary are lists (they are both columns from a df),

They're columns (each one is a pandas Series), not lists; 'Age' and 'Salary' themselves are just strings naming those columns. The groupby method of a DataFrame returns an indexable object. Calling a method requires parentheses, just like print(). You can use square brackets to access indexed data (as with dict objects).

The period and parentheses afterwards are another method call.

why is there a dot before mean, but not in-between ('Age') and ['Salary']

Short answer is that foo.['bar'] is not valid syntax.

But df.groupby("Age").some_func() certainly could be done, depending on the available functions on that object.
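You can check this without any data at all by asking Python to parse each form. In this sketch `is_valid` is a helper name I made up; `compile` only parses the text, so `foo` never needs to exist.

```python
def is_valid(expr):
    """Return True if expr parses as a Python expression."""
    try:
        compile(expr, "<example>", "eval")
        return True
    except SyntaxError:
        return False


print(is_valid("foo['bar']"))         # True: subscript directly after a name
print(is_valid("foo.['bar']"))        # False: a dot must be followed by a name
print(is_valid("foo('a')['b'].c()"))  # True: same shape as the groupby chain
```

The dot is the attribute-access operator, so it has to be followed by an attribute name such as `mean`, whereas brackets and parentheses attach directly to whatever expression comes before them.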

OneCricketeer
  • So, we put in parentheses what we want to index the groupby object by (in this case age), then in square brackets we put the data that we want to be averaged by the indexed value (in this case salary indexed for each age)? – Englishbeginner Feb 03 '20 at 16:23