0

hope to get your expert help, I did check on forums first. i am a beginner using repl.it (Python 3.8.2), and the first time use pandas. thanks in advance.

Start_Date_Time column could not be sorted correctly by Python from old to new (even though it can be manually correctly sorted in csv file). The Python output shows it is sorted by the first number of the cell (Australian format: dd/mm/yyyy)

import pandas as pd
csv1 = pd.read_csv("scheduleList.csv")
sort_by_date = csv1.sort_values("Start_Date_Time")
print (sort_by_date)

Below is simplified csv data. I don't want to split the column of date and time, because want to later compare its value with the today's date and time (calculate the difference)

Event   Start_Date_Time       Duration (mins)
A         1/06/2020 7:00            60
B         1/06/2020 12:30           60
C         2/06/2020 18:30           120
D         2/06/2020 16:00           45
E         5/06/2020 17:30           60
F        12/06/2020 12:30           60
G         7/06/2020 13:00           60
H         4/06/2020 18:00           60
I         6/06/2020 11:30           60
J         6/06/2020 8:00            180
K        11/06/2020 12:30           60
L        28/06/2020 11:00           300
M        18/07/2020 19:30           120
N        20/06/2020 9:00            60
O        31/05/2020 10:00           60
P         7/06/2020 10:00           60
Q        14/06/2020 10:00           60
Tom Ron
  • 5,906
  • 3
  • 22
  • 38
Bleu
  • 11
  • 2
  • Start_Date_Time seems to be a string column, u might need to convert to datetime then try sorting on it https://stackoverflow.com/questions/28161356/sort-pandas-dataframe-by-date. – sushanth Jun 01 '20 at 06:48
  • Try `pdf.read_csv("schedulelist.csv", parse_dates=['Start_Date_Time'])` – Tom Ron Jun 01 '20 at 06:49
  • Thank you @Sushanth, your suggestion is correct, i wish i could understand the one in the link, sorry a bit too complicated for me at the moment. – Bleu Jun 01 '20 at 16:45

1 Answers1

0

Try the following code. I have added a line to parse Start_Date_Time as a DateTime column and also set ascending parameter of the sort_values method to False.

import pandas as pd
csv1 = pd.read_csv("scheduleList.csv")
csv1['Start_Date_Time'] = pd.to_datetime(csv1['Start_Date_Time']) # change over here
sort_by_date = csv1.sort_values("Start_Date_Time",ascending=False) # change over here
print (sort_by_date)

oreopot
  • 3,392
  • 2
  • 19
  • 28
  • Thank you so much for your help. It worked....i also add format into the pd.to_datetime part. Thank you. – Bleu Jun 01 '20 at 16:42