
I am sorting a dataframe (df) on its month_date column, which contains

[2014-06-01, 2014-07-01, 2014-08-01, 2014-09-01, 2014-10-01], using the following:

result = df.sort(['month_date'], ascending=True) 

However, the result comes back with the months in this order:

10, 6, 7, 8, 9

whereas I expected 6, 7, 8, 9, 10.

Maybe this is because I didn't specify that the month_date column should be a datetime object (containing only the date and no time component).

How do I convert the month_date column in my dataframe into a date-only datetime type that pandas understands, so that it sorts in the following order: 6, 7, 8, 9, 10?

UPDATE: YS-L's correct answer, reproduced from below:

df = pd.DataFrame({'month_date': ['2014-06-01', '2014-01-01', '2014-08-01','2014-09-01','2014-10-01']})
df['month_date'] = pd.to_datetime(df['month_date'])
print df.sort(['month_date'], ascending=True)
yoshiserry

1 Answer


You can use pandas.to_datetime to convert the column into datetime type, and then perform sorting:

df = pd.DataFrame({'month_date': ['2014-06-01', '2014-01-01', '2014-08-01','2014-09-01','2014-10-01']})
df['month_date'] = pd.to_datetime(df['month_date'])
print df.sort(['month_date'], ascending=True)

Output:

           month_date
1 2014-01-01 00:00:00
0 2014-06-01 00:00:00
2 2014-08-01 00:00:00
3 2014-09-01 00:00:00
4 2014-10-01 00:00:00
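
Note for readers on a newer pandas: `DataFrame.sort` was later deprecated and removed in favor of `DataFrame.sort_values`, and `print` is a function in Python 3. A minimal sketch of the same approach under those assumptions (the variable name `result` is only illustrative, and the printed format may differ from the output above):

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({'month_date': ['2014-06-01', '2014-01-01', '2014-08-01',
                                  '2014-09-01', '2014-10-01']})

# Convert the string column to a datetime dtype, replacing the original column.
df['month_date'] = pd.to_datetime(df['month_date'])

# sort_values is the replacement for the older DataFrame.sort.
result = df.sort_values('month_date', ascending=True)
print(result)
```

The row order is the same as in the output above: the 2014-01-01 row comes first, followed by the others in chronological order.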
YS-L
  • YS-L, so this would replace the existing month_date column rather than create another one, with the dates stored as datetime objects? – yoshiserry Dec 02 '14 at 04:46
  • Yes, the original column will be replaced, due to the second line of code above. You can check the dtypes of the resulting DataFrame to be sure. – YS-L Dec 02 '14 at 04:48
  • YS-L: Great! How would I check the data types of the columns? – yoshiserry Dec 02 '14 at 04:49
  • Use `df.dtypes` to check the data types. To do a nested sort, just pass a list of column names to the `sort` function [(see doc)](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort.html). – YS-L Dec 02 '14 at 04:52
  • Thanks, YS-L! When dtypes returns object, does that mean it is a text string? How do we know what "object" is? – yoshiserry Dec 02 '14 at 04:55
  • You can find more information [here](http://pandas.pydata.org/pandas-docs/stable/basics.html#dtypes). Generally, anything other than `float`, `int`, `bool`, `datetime64[ns]` and `timedelta64[ns]` will be regarded as `object`. You need to access the individual elements to find out their actual types. – YS-L Dec 02 '14 at 04:59
  • Ah, I see: an object column may hold more than one data type, depending on the individual elements it contains. :) Thanks! – yoshiserry Dec 02 '14 at 05:03
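
The points raised in the comments above (checking dtypes, inspecting the elements of an `object` column, and sorting on several columns) can be sketched together as follows. This is only an illustration: the `group` and `mixed` columns and their values are hypothetical, and `sort_values` is used on the assumption of a newer pandas where `DataFrame.sort` is no longer available.

```python
import pandas as pd

# Hypothetical data: 'group' and 'mixed' are made up for illustration.
df = pd.DataFrame({
    'group': ['b', 'a', 'b', 'a'],
    'month_date': ['2014-06-01', '2014-10-01', '2014-01-01', '2014-08-01'],
    'mixed': ['text', 3, 2.5, True],
})
df['month_date'] = pd.to_datetime(df['month_date'])

# One dtype per column; 'mixed' is reported as object.
print(df.dtypes)

# An object column can hold different Python types, so look at each element.
element_types = [type(value) for value in df['mixed']]
print(element_types)

# Nested sort: pass a list of column names, sorted left to right.
nested = df.sort_values(['group', 'month_date'])
print(nested)
```

Here the nested sort orders rows by `group` first and by `month_date` within each group.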