0

I am trying to add a new calculated field. I am trying the 2nd best answer in Adding calculated column(s) to a dataframe in pandas because it seems the best in my opinion as it is neat. Please feel free to offer better alternatives.

Either way my initial code is below:

import pandas as pd    

#https://github.com/sivabalanb/Data-Analysis-with-Pandas-and-Python/blob/master/nba.csv
dt_nba = pd.read_csv("data//nba.csv")  

#note this is just basic function.  I want to pass partitioned data like team's average salary
def GetSalaryIncrement(val):
    return val * 1.1

dt_nba["SalaryPlus10Percent"] = map(GetSalaryIncrement,dt_nba["Salary"])

dt_nba[["Name","Team","Salary","SalaryPlus10Percent"]][:5]

However, the result is not what I expected:

+----+---------------+----------------+--------------+--------------------------------+
| ID |     Name      |      Team      |    Salary    |      SalaryPlus10Percent       |
+----+---------------+----------------+--------------+--------------------------------+
|  0 | Avery Bradley | Boston Celtics | 7730337.0000 | <map object at 0x7fb819e9b7b8> |
|  1 | Jae Crowder   | Boston Celtics | 6796117.0000 | <map object at 0x7fb819e9b7b8> |
|  2 | John Holland  | Boston Celtics | nan          | <map object at 0x7fb819e9b7b8> |
|  3 | R.J. Hunter   | Boston Celtics | 1148640.0000 | <map object at 0x7fb819e9b7b8> |
|  4 | Jonas Jerebko | Boston Celtics | 5000000.0000 | <map object at 0x7fb819e9b7b8> |
+----+---------------+----------------+--------------+--------------------------------+

In particular I am interested in passing "window/aggregate data" where it should gracefully ignore Nan values.

Example in T-SQL I can do this:

-- INCREASE EACH PLAYERS SALARY BY 10% OF AVERAGE SALARY OF THE TEAM
SELECT NewSalary= Salary + (.1 * AVG(Salary) OVER (PARTITION BY Team))
FROM nba_data

I want to do that in Pandas if possible. Thank you.

Earl
  • 420
  • 5
  • 16

1 Answers1

3

I think you are looking for

dt_nba["Salary"]=dt_nba["Salary"].map(GetSalaryIncrement)

Also you can do with

GetSalaryIncrement(dt_nba["Salary"])

dt_nba["Salary"].apply(GetSalaryIncrement) 

To calculatedINCREASE EACH PLAYERS SALARY BY 10% OF AVERAGE SALARY OF THE TEAM

dt_nba['Newsa']=dt_nba.groupby('Team')['Salary'].transform('mean')*0.1+dt_nba["Salary"]
BENY
  • 317,841
  • 20
  • 164
  • 234
  • @Earl, in this case Wen_Ben's solution on the bottom will work. Your function is just adding 10% to each row, inconsiderate of the aggregate on team. – Erfan Mar 15 '19 at 00:51
  • looks like it is working if I isolate the average team data, the values of the average does not repeat. I have not done actual tally yet. However, the GetSalaryIncrement formula does not seem to work. Did i make a mistake on my function? A salary of 7730337 returns 8148487.4857. – Earl Mar 15 '19 at 00:57
  • sorry Ignore my mistake above. I miss-type the column names I was actually looking at average team salary increment. So far the figures match on the function. I shall evaluate the group by values now. – Earl Mar 15 '19 at 01:03
  • Okay, I confirm the window aggregation code works perfectly. I have checked the Boston Celtics and it gives 4181504.857 as expected and Brooklyn Nets gives 3501898.333 as expected. Pandas looks a very good library. Thanks Wen-Bee and Efran for the answer and comments. – Earl Mar 15 '19 at 01:08