1

I have a dataset in a csv file that looks like this:

teacher         student         student grade
Jon             marin           99
Jon             Rob             81
Jon             marly           90
Bon             martin          76
Bon             marie           56
Ton             Seri            43
Ton             Loku            99

I need an output that has the average of each teacher based on the grades obtained by student. Which will look like this,

teacher         student         student grade       teacher Average
Jon             marin           99                  90
Jon             Rob             81                  90
Jon             marly           90                  90
Bon             martin          76                  66
Bon             marie           56                  66
Ton             Seri            43                  71
Ton             Loku            99                  71

What is the shortest possible way to get this ?

This is my approach but it does not seem to work.

import pandas as pd
df = pd.read_csv('test.csv', names=['teacher', 'student', 'student grade','Average'])
df.groupby('Star Rating').mean()
JJ123
  • 573
  • 1
  • 4
  • 18
  • Possible duplicate of [Python Pandas : group by in group by and average?](https://stackoverflow.com/questions/30328646/python-pandas-group-by-in-group-by-and-average) – Vinícius Figueiredo Jul 05 '17 at 01:13

1 Answers1

3

Use groupby and transform to build a new column that is indexed the same as the one being grouped.

df['teacher average'] = df.groupby('teacher')['student grade'].transform('mean')
Adrien Matissart
  • 1,610
  • 15
  • 19
  • Thank you for your reply. It gives me following error `raise DataError('No numeric types to aggregate') pandas.core.base.DataError: No numeric types to aggregate` – JJ123 Jul 05 '17 at 01:55
  • And this is my code, where 'Star Rating' is the total number of star star which is a number. import pandas as pd `df = pd.read_csv('test.csv', names=['Date Collected', 'Health Care Provider', 'Specialty', 'Reviewer Name', 'Star Rating', 'How Many Stars', 'Mean Stars','Review Length', 'URL']) df['Mean Stars'] = df.groupby('Health Care Provider')['Star Rating'].transform('mean')` – JJ123 Jul 05 '17 at 01:58
  • Your "Star Rating" column contains probably strings. Convert it to int first : `df['Star Rating'] = df['Star Rating'].astype(int)` – Adrien Matissart Jul 05 '17 at 02:02
  • Sorry to bother you again, I get this error now, `ValueError: invalid literal for long() with base 10: 'Star Rating' ` – JJ123 Jul 05 '17 at 02:08
  • Are you sure your dataframe is properly formatted ? It seems that a row contains text headers. – Adrien Matissart Jul 05 '17 at 02:12
  • My bad, I didn't notice the text header. It worked now. Thank you ! – JJ123 Jul 05 '17 at 13:11