2

I'm importing some data from a csv file in python and create a dataframe called frame5.

I have daily data with the date for each day. So far I have used the following to tranfrom the dates to days of the week.

    frame5['day_of_week']=frame5['date'].dt.dayofweek
    days = {0:'Mon',1:'Tues',2:'Weds',3:'Thurs',4:'Fri',5:'Sat',6:'Sun'}
    frame5['day_of_week'] = frame5['day_of_week'].apply(lambda x: days[x])

Then, in order to calculate the daily mean I use:

grouped_day_of_week=frame5.groupby('day_of_week') 
day_of_week_statistics=grouped_day_of_week['reward'].agg([np.mean,np.size,np.sum])

Then I want to make a plot with the daily means.

However, in the dataframe day_of_week_statistics the days appear to be sorted in alphabetical order (each row is the statistics calculated for each day of week).

How can I change the order of the days so that they appear in the right order as "Mon", "Tues","Weds","Thurs","Fri","Sat","Sun"?

Space Impact
  • 13,085
  • 23
  • 48
Andriana
  • 367
  • 3
  • 8
  • 17
  • I have a hunch that you have other dataframes named `frame1` through `frame4`? Don't do that. Make a list `frame` to store frames in. – Tomalak Nov 27 '17 at 10:38
  • Possible duplicate of [Custom sorting in pandas dataframe](https://stackoverflow.com/questions/13838405/custom-sorting-in-pandas-dataframe) – Mateen Ulhaq Nov 27 '17 at 10:41
  • OP, I've come up with a more efficient version of my previous answer. Take a look and see if it works for you. Thanks! – cs95 Nov 27 '17 at 12:27

1 Answers1

1

Pre-sort by dayofweek. Also, you can use map/replace, since apply is slow.

i = frame5['date'].dt.dayofweek.values.argsort()
frame5 = frame5.iloc[i]

frame5['day_of_week'] = frame5['day_of_week'].map(days)  # .replace(days)
When calling groupby, call it with sort=False, since groupby usually returns groups in sorted order of index. We do this since we don't want to disrupt the sorted order from earlier.

grouped_day_of_week = frame5.groupby('day_of_week', sort=False) 

An improved solution courtesy Jon Clements. This leverages the concepts from before, but does it more efficiently.

Call groupby on dt.dayofweek, followed by map on the result index -

g = frame5.groupby(frame5['date'].dt.dayofweek)['Reward'].agg(['mean', 'size', 'sum'])
g.index = g.index.map(days.get)
cs95
  • 379,657
  • 97
  • 704
  • 746