I am having a dataframe with a column named 'Formatted Date', by default the column's dtype is'object'.
I am trying to Convert it to datetime replacing all '-' separators with '/' and sorting the column by date, this is the raw format:
0 2006-04-01 00:00:00.000 +0200
1 2006-04-01 01:00:00.000 +0200
2 2006-04-01 02:00:00.000 +0200
3 2006-04-01 03:00:00.000 +0200
4 2006-04-01 04:00:00.000 +0200
when I run below code everything seems good and all - separators will be replaced with /
df= pd.read_csv (r"df.csv")
df['Formatted Date'] = df['Formatted Date'].apply(lambda x: x.replace( '-' , '/'))
out:
0 2006/04/01 00:00:00.000 +0200
1 2006/04/01 01:00:00.000 +0200
2 2006/04/01 02:00:00.000 +0200
3 2006/04/01 03:00:00.000 +0200
4 2006/04/01 04:00:00.000 +0200
BUT when I try to convert it to datetime using below code all separators will get back to -
df['Formatted Date'] = pd.to_datetime((df['Formatted Date']), format = '%Y/%m/%d %H:%M:%S', utc=True)
and the output will be:
0 2006-03-31 22:00:00+00:00
1 2006-03-31 23:00:00+00:00
2 2006-04-01 00:00:00+00:00
3 2006-04-01 01:00:00+00:00
4 2006-04-01 02:00:00+00:00
AND for sorting I am using df.sort_values(by=['Formatted Date'])
which is not working( It's not sorting the values, does nothing...)
EDIT Thank you ljdyer for the Answer and Yes, I am using jupyter notebook.
df.style.format
code is not changing the separators in my case and I don't know why
df.style.format({'Formatted Date': lambda t: t.strftime("'%Y/%m/%d %H:%M:%S")})
df
Out:
0 2006-04-01 00:00:00.000 +0200
1 2006-04-01 01:00:00.000 +0200
2 2006-04-01 02:00:00.000 +0200
3 2006-04-01 03:00:00.000 +0200
4 2006-04-01 04:00:00.000 +0200
Using:
df['Formatted Date'] = pd.to_datetime((df['Formatted Date']), format = '%Y/%m/%d %H:%M:%S', utc=True)
df['Formatted Date'] = df['Formatted Date'].dt.strftime('%Y/%m/%d %H:%M:%S')
Does change the separators but the dtype remains object
,as you mentioned. is there any way to keep the column format as datetime64
and change the separators from -
to /
?
and about sorting
, this code:
df.sort_values(by=['Formatted Date'])
is not working because the first date in dataset is:
2006-01-01 00:00:00.000 +0100
not 2006-03-31 22:00:00+00:00
and one more thing I noticed is that when I open dataset in notepad++ the datetime is showing like:
2006-01-01 22:00:00.000 +0100
2006-01-01 23:00:00.000 +0100
2006-01-10 00:00:00.000 +0100
2006-01-10 01:00:00.000 +0100
Instead of:
2006-01-01 22:00:00.000 +0100
2006-01-01 23:00:00.000 +0100
2006-01-02 00:00:00.000 +0100
2006-01-02 01:00:00.000 +0100
by the way English isn't my native language I hope the explanations are clear