I have this:

df = name   year   salary   d
     a      1990   3        5
     b      1992   90       1
     c      1990   234      3
     ...

I am trying to group my data frame by year, compute the average salary for each year, and assign the result to a new column. This is what I do:

df['averageSalaryPerYear'] = df.groupby('year')['salary'].mean()

df.groupby('year')['salary'].mean() on its own gives the correct results: when I print it, I get a column of numbers in scientific notation. However, when I assign it to df['averageSalaryPerYear'], every value becomes NaN. I am not sure why this happens, since the printed values look fine apart from being in scientific notation, like this:

1990 1.707235e+07
1991 2.357879e+07
1992 3.098244e+07

The first column is the year and the second is the average salary (avgOfSalary).

Why is this happening? I want my new column to show the correct averages. ...

Thanks

Amin

1 Answer

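groupby('year')['salary'].mean() returns one row per year, indexed by year, so it has neither the same length nor the same index as df. Column assignment aligns on the index, and because the year labels (1990, 1991, ...) do not match df's row labels, every row ends up as NaN.

Try transform, which returns a result with the same index as df.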

df['averageSalaryPerYear'] = df.groupby('year')['salary'].transform('mean')
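
To make the difference concrete, here is a minimal sketch. The frame is made-up sample data shaped like the one in the question, and the names per_year, misaligned and via_map are only for illustration. It shows why the plain groupby result does not line up with df, and two ways that should give a per-row average.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's frame
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "year": [1990, 1992, 1990],
    "salary": [3.0, 90.0, 234.0],
    "d": [5, 1, 3],
})

# One row per year, indexed by year (1990, 1992), not by df's row labels (0, 1, 2)
per_year = df.groupby("year")["salary"].mean()

# Aligning per_year to df's index finds no matching labels, so every value is NaN;
# this mirrors what happens in the failed column assignment
misaligned = per_year.reindex(df.index)

# transform returns a Series with df's own index, so it can be assigned directly
df["averageSalaryPerYear"] = df.groupby("year")["salary"].transform("mean")

# Alternative: look up each row's year in the per-year means
via_map = df["year"].map(per_year)
```

Both transform and the map lookup give each row the mean of its own year; transform is the shorter of the two when the per-year table is not needed for anything else.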
Shuo
  • And is there any way to show the values as regular floats instead of scientific notation? They are probably floats under the hood, but I want them displayed that way in the column too. @Shuo – Amin Mar 16 '23 at 03:37
  • See [this](https://stackoverflow.com/questions/21137150/format-suppress-scientific-notation-from-pandas-aggregation-results). – Shuo Mar 16 '23 at 03:43
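
On the scientific-notation follow-up, one possible approach (a sketch, not necessarily what the linked answer recommends) is to change only how pandas prints floats. The Series below reuses the three averages printed in the question, the two-decimal format is an arbitrary choice, and the names s and text are just for illustration.

```python
import pandas as pd

# The per-year averages printed in the question
s = pd.Series([1.707235e+07, 2.357879e+07, 3.098244e+07],
              index=[1990, 1991, 1992])

# Display-only setting: the stored values stay float64, only the printing changes
pd.set_option("display.float_format", "{:.2f}".format)
text = s.to_string()

# Restore the default so the setting does not leak into other output
pd.reset_option("display.float_format")
```

The values were ordinary floats all along; scientific notation is only how pandas chose to print them.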