How to sort a Pandas Time Series containing values in 12 hour (AM/PM) format

Question

I am using pandas to process some data from csv files.

I need to sort the data in my DataFrame df by the column MEETING START TIME, just sorting the time. The date is handled by another field.

But the result I get is:

MEETING START TIME
10:30 AM
12:30 PM
2:00 PM
4:00 PM
9:15 AM
9:15 AM

Any meeting time in the morning with single-digit hours goes as the end. Do I need to do something to the date format, or the sort command?

A's the answer below points out, you cannot sort datetime when they are strings, they must be formatted as datetime objects.. — dartdog, Aug 05 '15 at 01:08
possible duplicate of [Sort Pandas Dataframe by Date](http://stackoverflow.com/questions/28161356/sort-pandas-dataframe-by-date) — LondonRob, Aug 06 '15 at 12:42

score 3 · Accepted Answer · answered Aug 05 '15 at 00:32

3

You can use pd.datetools.parse to try and convert your date column (currently a string) into a datetime object; then you should be able to sort:

df = pd.DataFrame({'MEETING START TIME': ['10:30 AM','12:30 PM', '2:00 PM', '4:00 PM', '9:15 AM', '9:15 AM']})
df['MEETING START TIME'] = df['MEETING START TIME'].map(lambda x: pd.datetools.parse(x))
df.sort('MEETING START TIME')

Out[33]: 
   MEETING START TIME
5 2015-08-05 09:15:00
4 2015-08-05 09:15:00
0 2015-08-05 10:30:00
1 2015-08-05 12:30:00
2 2015-08-05 14:00:00
3 2015-08-05 16:00:00

answered Aug 05 '15 at 00:32

maxymoo

35,286
11
92
119

would `pd.to_datetime` be quicker? – EdChum Aug 05 '15 at 07:56
ok, that works, Thanks. I need to strip out the date, which I'm doing with: df['MEETING START TIME'] = pd.DatetimeIndex(df['MEETING START TIME']).time Then I will convert it back to AM/PM as this is for signage and will be easier for people to read. – mattrweaver Aug 05 '15 at 13:54
2

@EdChum `%timeit pd.datetools.parse('10 AM') 10000 loops, best of 3: 39.7 µs per loop %timeit pd.to_datetime('10 AM') 1000 loops, best of 3: 159 µs per loop` – maxymoo Aug 05 '15 at 22:49

score 2 · Answer 2 · edited May 23 '17 at 11:45

It would be interested to see if the approach of using map with datetools.parse is as scalable as the standard approach given here, here and here.

Let's make a really big Series of string-represented dates to find out:

In [11]: import datetime as dt
In [12]: format = '%d/%m/%Y %H:%M:%S'
In [13]: def random_date():
   ....:    rand_num = np.random.uniform(2e9)
   ....:    return dt.datetime.fromtimestamp(rand_num).strftime(format)
In [14]: dates = pd.Series([random_date() for i in range(100000)])
In [15]: dates.head() # Some random dates (as strings)
Out[15]: 
0    30/11/1988 15:11:08
1    08/05/2025 10:29:02
2    05/09/2017 02:24:46
3    18/03/2016 14:55:20
4    22/04/1984 04:58:06
dtype: object

Now let's time the two approaches:

In [33]: %timeit dates.map(lambda x: pd.datetools.parse(x))
1 loops, best of 3: 6.98 s per loop

In [2]: %timeit pd.to_datetime(dates,format=format)
1 loops, best of 3: 525 ms per loop

So there we have it. The unorthodox approach of @maxymoo is much slower than the accepted approach when the Series is long!

you need to pass the format, e.g. ``In [36]: %timeit pd.to_datetime(dates,format=format) 1 loops, best of 3: 529 ms per loop``, otherwise its simply a dispatch to ``dateutil`` — Jeff, Aug 12 '15 at 00:01
@Jeff That's embarrassing. Question rewritten with the correct timings (which I've double-checked myself). The `pd.to_datetime` method is blazingly quick. — LondonRob, Aug 12 '15 at 16:52

How to sort a Pandas Time Series containing values in 12 hour (AM/PM) format

2 Answers2

Linked