Pandas - Trying to make a new dataframe with counts and averages

Question

Here is my dataframe:

import pandas as pd

data = {'transit_time':[1,1,2,2,3,3],
        'orig_state':['UT','UT','UT','UT','UT','UT'],
        'dest_state':['CA','CA','AZ','AZ','NY','NY'],
        'GEOID':['01','01','02','02','03','03'],
        'dest_state_fn':['California','California','Arizona','Arizona','New York','New York'],
        'dest_county_name':['county1','county1','county2','county2','county3','county3']
       }
df = pd.DataFrame(data,columns = ['transit_time','orig_state','dest_state','GEOID','dest_state_fn','dest_county_name'])

print(df)

   transit_time orig_state dest_state GEOID dest_state_fn dest_county_name
0             1         UT         CA    01    California          county1
1             1         UT         CA    01    California          county1
2             2         UT         AZ    02       Arizona          county2
3             2         UT         AZ    02       Arizona          county2
4             3         UT         NY    03      New York          county3
5             3         UT         NY    03      New York          county3

I would like to get a new dataframe grouped by GEOID and dest_county_name, with AVG(transit_time) and COUNT(*) computed for each group.

Have you looked into the groupby function? – user1558604 Jul 20 '20 at 01:02

score 5 · Accepted Answer · answered Jul 20 '20 at 01:08

Use groupby + agg with named aggregation:

newdf=df.groupby(['GEOID','dest_county_name']).agg(ave_transit_time=('transit_time','mean'),
                                                   Count=('GEOID','count')).reset_index()

  GEOID dest_county_name  ave_transit_time  Count
0    01          county1               1.0      2
1    02          county2               2.0      2
2    03          county3               3.0      2
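Since the question asks for COUNT(*), one variation worth considering is 'size', which counts every row in a group, whereas 'count' skips rows where the chosen column is NaN. The following is only a sketch: it assumes pandas 0.25 or later and uses a trimmed copy of the question's data, and it is not part of the original answer. For this data both functions give 2 per group.

```python
import pandas as pd

# Trimmed copy of the question's data (only the columns used here).
df = pd.DataFrame({
    'transit_time': [1, 1, 2, 2, 3, 3],
    'GEOID': ['01', '01', '02', '02', '03', '03'],
    'dest_county_name': ['county1', 'county1', 'county2',
                         'county2', 'county3', 'county3'],
})

# 'size' counts all rows per group (closest to SQL COUNT(*));
# 'count' would count only the non-null values of the chosen column.
newdf = (
    df.groupby(['GEOID', 'dest_county_name'])
      .agg(ave_transit_time=('transit_time', 'mean'),
           Count=('transit_time', 'size'))
      .reset_index()
)

print(newdf)
```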

BENY


getting error message back: "TypeError: aggregate() missing 1 required positional argument: 'arg'" – user2200270 Jul 20 '20 at 01:11
@user2200270 what is your pandas version? – BENY Jul 20 '20 at 01:11
@YOBBEN_S pd.__version__ = '0.24.2' – user2200270 Jul 20 '20 at 01:13
@user2200270 please update your pandas :-) After that you should be fine; the tuple syntax in agg was added in a recent version :-) – BENY Jul 20 '20 at 01:14
@user2200270 https://stackoverflow.com/questions/42735541/customized-float-formatting-in-a-pandas-dataframe – BENY Jul 20 '20 at 01:42
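For anyone who cannot upgrade from an older pandas such as the 0.24.2 mentioned in the comments, a rough equivalent that avoids the keyword/tuple syntax might look like the sketch below. It selects the column first, aggregates with a list of function names, and renames the result. The trimmed dataframe is an assumption for illustration, and this approach is a substitute technique, not what the answer itself uses.

```python
import pandas as pd

# Trimmed copy of the question's data (only the columns used here).
df = pd.DataFrame({
    'transit_time': [1, 1, 2, 2, 3, 3],
    'GEOID': ['01', '01', '02', '02', '03', '03'],
    'dest_county_name': ['county1', 'county1', 'county2',
                         'county2', 'county3', 'county3'],
})

# Aggregate one column with a list of function names, then rename the
# resulting 'mean' and 'count' columns to match the desired output.
newdf = (
    df.groupby(['GEOID', 'dest_county_name'])['transit_time']
      .agg(['mean', 'count'])
      .rename(columns={'mean': 'ave_transit_time', 'count': 'Count'})
      .reset_index()
)

print(newdf)
```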
