124

I have a pandas dataframe as follows:

Symbol  Date
A       02/20/2015
A       01/15/2016
A       08/21/2015

I want to sort it by Date, but the column is just an object.

I tried to make the column a date object, but I ran into an issue where that format is not the format needed. The format needed is 2015-02-20, etc.

So now I'm trying to figure out how to have numpy convert the 'American' dates into the ISO standard, so that I can make them date objects, so that I can sort by them.

How would I convert these american dates into ISO standard, or is there a more straight forward method I'm missing within pandas?

Wolfgang
  • 3
  • 2
nicholas.reichel
  • 2,260
  • 7
  • 20
  • 28

6 Answers6

191

You can use pd.to_datetime() to convert to a datetime object. It takes a format parameter, but in your case I don't think you need it.

>>> import pandas as pd
>>> df = pd.DataFrame( {'Symbol':['A','A','A'] ,
    'Date':['02/20/2015','01/15/2016','08/21/2015']})
>>> df
         Date Symbol
0  02/20/2015      A
1  01/15/2016      A
2  08/21/2015      A
>>> df['Date'] =pd.to_datetime(df.Date)
>>> df.sort('Date') # This now sorts in date order
        Date Symbol
0 2015-02-20      A
2 2015-08-21      A
1 2016-01-15      A

For future search, you can change the sort statement:

>>> df.sort_values(by='Date') # This now sorts in date order
        Date Symbol
0 2015-02-20      A
2 2015-08-21      A
1 2016-01-15      A
Lukas
  • 2,330
  • 2
  • 22
  • 31
JAB
  • 12,401
  • 6
  • 45
  • 50
  • 1
    I also have a df['Date'].unique() before the sort, which returns a series instead of a Dataframe. This makes 02/20/2015 into 2015-02-19T18:00:00.000000000-0600 which then gets split into 2015-02-19. Is there a way to add a day? Or a more formal way to correct this? – nicholas.reichel Jan 27 '15 at 01:42
  • 1
    `df.Date.astype(np.int64)` should work for epoch time – JAB Jan 27 '15 at 01:47
  • 1
    Turns out that epoch would be wrong since its assuming times of 18:00 hours etc. I need them to be 00:00 hours. I have a way to convert to epoch if I could just get the date objects to not have a time, or the wrong time. – nicholas.reichel Jan 27 '15 at 01:55
  • for me `pd.to_datetime(df.Date)[0]` returns `Timestamp('2015-02-20 00:00:00')` – JAB Jan 27 '15 at 02:02
  • Starting new question with more formal description of issue – nicholas.reichel Jan 27 '15 at 03:02
  • It looks like there may actually be [a faster way](http://stackoverflow.com/a/31857163/2071807) of converting strings into dates. – LondonRob Aug 06 '15 at 13:39
151

sort method has been deprecated and replaced with sort_values. After converting to datetime object using df['Date']=pd.to_datetime(df['Date'])

df.sort_values(by=['Date'])

Note: to sort in-place and/or in a descending order (the most recent first):

df.sort_values(by=['Date'], inplace=True, ascending=False)
Reveille
  • 4,359
  • 3
  • 23
  • 46
11

@JAB's answer is fast and concise. But it changes the DataFrame you are trying to sort, which you may or may not want.

(Note: You almost certainly will want it, because your date columns should be dates, not strings!)

In the unlikely event that you don't want to change the dates into dates, you can also do it a different way.

First, get the index from your sorted Date column:

In [25]: pd.to_datetime(df.Date).order().index
Out[25]: Int64Index([0, 2, 1], dtype='int64')

Then use it to index your original DataFrame, leaving it untouched:

In [26]: df.ix[pd.to_datetime(df.Date).order().index]
Out[26]: 
        Date Symbol
0 2015-02-20      A
2 2015-08-21      A
1 2016-01-15      A

Magic!

Note: for Pandas versions 0.20.0 and later, use loc instead of ix, which is now deprecated.

jpp
  • 159,742
  • 34
  • 281
  • 339
LondonRob
  • 73,083
  • 37
  • 144
  • 201
7

Since pandas >= 1.0.0 we have the key argument in DataFrame.sort_values. This way we can sort the dataframe by specifying a key and without adjusting the original dataframe:

df.sort_values(by="Date", key=pd.to_datetime)
  Symbol        Date
0      A  02/20/2015
2      A  08/21/2015
1      A  01/15/2016
Erfan
  • 40,971
  • 8
  • 66
  • 78
-1

The data containing the date column can be read by using the below code:

data = pd.csv(file_path,parse_dates=[date_column])

Once the data is read by using the above line of code, the column containing the information about the date can be accessed using pd.date_time() like:

pd.date_time(data[date_column], format = '%d/%m/%y')

to change the format of date as per the requirement.

-1
data['Date'] = data['Date'].apply(pd.to_datetime) # non-null datetime64[ns]
ah bon
  • 9,293
  • 12
  • 65
  • 148
Saleh
  • 1
  • 10
    Welcome to Stack Overflow! While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – Yunnosch Jun 18 '21 at 10:57