The data looks like this:

0        Thursday
1        Thursday
2        Thursday
3        Thursday
etc, etc

My code:

import pandas as pd
data_file = pd.read_csv('./data/Chicago-2016-Summary.csv')
days = data_file['day_of_week']

order = ["Monday","Tuesday","Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

sorted(days, key=lambda x: order.index(x[0]))
print(days)

This results in error:

ValueError: 'T' is not in list

I tried to sort and get this error but I have no idea what this means.

I just want to sort the data Monday-Sunday so I can do some visualizations. Any suggestions?

Brad Solomon
EP31121PJ
    Possible duplicate of [Sort a series by month name?](https://stackoverflow.com/questions/48042915/sort-a-series-by-month-name) – Brad Solomon Jan 29 '18 at 13:02

1 Answer

You can use pandas' Categorical data type for this:

order = ["Monday","Tuesday","Wednesday", "Thursday", "Friday", "Saturday", "Sunday"] 
data_file['day_of_week'] = pd.Categorical(data_file['day_of_week'], categories=order, ordered=True)
data_file.sort_values(by='day_of_week', inplace=True)
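To see the effect without the original CSV, here is a minimal sketch on a small made-up frame. The `df` contents and the names `df_sorted` and `sorted_days` are assumptions for illustration only; the real file is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for the 'day_of_week' column of the real CSV.
df = pd.DataFrame(
    {"day_of_week": ["Thursday", "Monday", "Sunday", "Tuesday", "Monday"]}
)

order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# An ordered Categorical makes sort_values follow `order` instead of the alphabet.
df["day_of_week"] = pd.Categorical(df["day_of_week"], categories=order, ordered=True)

# Returning a new frame here rather than using inplace=True.
df_sorted = df.sort_values(by="day_of_week")
sorted_days = df_sorted["day_of_week"].tolist()
print(sorted_days)
```

A plain string column would sort alphabetically (Friday, Monday, Saturday, ...), which is why the conversion to an ordered categorical comes first.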

In your example, be aware that when you specify

days = data_file['day_of_week']

`days` refers to that column (a Series) of your data_file frame and may share its underlying data rather than being an independent copy. If you want an independent object, use days = data_file['day_of_week'].copy(). Or just work within the DataFrame, as is done above.
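As for the `ValueError` in the question: `x[0]` takes the first character of each day name, so `order.index` is asked to find `'T'` rather than `'Thursday'`. Note also that `sorted` returns a new list and leaves its input untouched, so the result has to be assigned. A small plain-Python sketch, using a made-up `days` list in place of the Series:

```python
order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical sample values standing in for the real column.
days = ["Thursday", "Monday", "Sunday", "Tuesday"]

# Indexing a string with [0] yields its first character, not the whole name.
first_char = days[0][0]

try:
    order.index(first_char)
    lookup_failed = False
except ValueError:
    # This is the error from the question: 'T' is not in list.
    lookup_failed = True

# Passing the whole string to order.index works; keep the returned list.
sorted_days = sorted(days, key=order.index)
print(sorted_days)
```

This works for a plain list, but for a DataFrame column the Categorical approach above is usually more convenient because the ordering then carries over to sorting and plotting.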

Brad Solomon
  • I can't do a frequency histogram of how many instances there are of each day in the data set? I'm trying to visually see which days have the most activity. – EP31121PJ Jan 29 '18 at 13:20
  • I would still like my graph to go in day order Monday-Sunday, instead of ordered by count. As it is now, it goes Monday, Tuesday, Friday, Thursday, Saturday, Sunday, Wednesday based on the value counts for each day. – EP31121PJ Jan 29 '18 at 13:41
  • `data_file['day_of_week'].value_counts().sort_index().plot.bar()`? – Brad Solomon Jan 29 '18 at 13:42
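A minimal sketch of the suggestion in the last comment, assuming the column has already been converted to an ordered Categorical as in the answer. The sample data and the names `counts` and `index_order` are made up for illustration.

```python
import pandas as pd

order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical sample data; the real CSV is not shown in the question.
df = pd.DataFrame({"day_of_week": [
    "Monday", "Friday", "Friday", "Wednesday", "Sunday",
    "Tuesday", "Thursday", "Saturday", "Friday", "Monday",
]})
df["day_of_week"] = pd.Categorical(df["day_of_week"], categories=order, ordered=True)

# value_counts() orders by frequency; sort_index() restores the category order.
counts = df["day_of_week"].value_counts().sort_index()
index_order = list(counts.index)
print(counts)

# With matplotlib installed, counts.plot.bar() would draw the bars Monday-Sunday.
```

Without the ordered Categorical, `sort_index()` would fall back to alphabetical order, so the conversion step matters for the plot as well.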