I have a pandas dataframe read from a csv file which includes a column of dates
In[1]: df = pd.read_csv(fileStats, index_col=0, parse_dates= ['start_date'])
In[2]: df.head()
Out[3]:
start_date
customer_ID
10006414 2012-10-02 08:30:00
10006486 2013-12-02 09:00:00
10006492 2013-12-02 09:00:00
10006572 2012-03-31 08:30:00
10006630 2012-01-06 10:30:00
I want to find the earliest date:
In[4]: df['start_date'].min()
Out[5]:
Timestamp('2011-01-12 08:00:00')
which looks fine except that when I open the .csv file in excel, and order the rows by start_date
, the earliest date is 2011-10-06 13:00:00
. The same problem happens with .max()
This post refers to a similar problem, but related to np.datetime64 which I am not using here. Is there an issue with .max()
and .min()
or is it to do with parsing the dates in .read_csv
?
EDIT:
Further investigation on suggestions from @DYZ shows that the Timestamps have been read or parsed incorrectly. 6/10/2011 13:00
in the .csv file has become 2014-03-03 10:00:00
in the dataframe. How to correct this?