28

I want to plot RESULT vs TIME based on a testresult.csv file that has following format, and I have trouble to get the TIME column's datatype defined properly.

TIME,RESULT  
03/24/2016 12:27:11 AM,2  
03/24/2016 12:28:41 AM,76  
03/24/2016 12:37:23 AM,19  
03/24/2016 12:38:44 AM,68  
03/24/2016 12:42:02 AM,44  
...

To read the csv file, this is the code I wrote: raw_df = pd.read_csv('testresult.csv', index_col=None, parse_dates=['TIME'], infer_datetime_format=True)
This code works, but it is extremely slow, and I assume that the infer_datetime_format takes time. So I tried to read in the csv by default first, and then convert the object dtype 'TIME' to datetime dtype by using to_datetime(), and I hope by defining the format, it might expedite the speed.

raw_df =  pd.read_csv('testresult.csv')
raw_df.loc['NEWTIME'] = pd.to_datetime(raw_df['TIME'], format='%m/%d%Y %-I%M%S %p')

This code complained error:

"ValueError: '-' is a bad directive in format '%m/%d%Y %-I%M%S %p'"

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
ju.
  • 1,016
  • 1
  • 13
  • 34

2 Answers2

41

The format you are passing is invalid. The dash between the % and the I is not supposed to be there.

df['TIME'] = pd.to_datetime(df['TIME'], format="%m/%d/%Y %I:%M:%S %p")

This will convert your TIME column to a datetime.


Alternatively, you can adjust your read_csv call to do this:

pd.read_csv('testresult.csv', parse_dates=['TIME'], 
    date_parser=lambda x: pd.to_datetime(x, format='%m/%d/%Y %I:%M:%S %p'))

Again, this uses the appropriate format with out the extra -, but it also passes in the format to the date_parser parameter instead of having pandas attempt to guess it with the infer_datetime_format parameter.

Andy
  • 49,085
  • 60
  • 166
  • 233
  • 2
    Thank you, Andy. I guess I also carelessly missed ":" between hour, minute and second, that's why even I tried %I, it still failed. The time in my table does have some cases with none zero-padding hour format "03/24/2016 1:08:19 AM", and based on Python's strftime format (see this http://strftime.org/ list), it should use '%-I'. Do you know why pandas is not consistent with this? – ju. Apr 25 '16 at 21:08
3

you can try this:

In [69]: df = pd.read_csv(fn, parse_dates=[0],
                          date_parser=lambda x: pd.to_datetime(x, format='%m/%d/%Y %I:%M:%S %p'))

In [70]: df
Out[70]:
                 TIME  RESULT
0 2016-03-24 00:27:11       2
1 2016-03-24 00:28:41      76
2 2016-03-24 00:37:23      19
3 2016-03-24 00:38:44      68
4 2016-03-24 00:42:02      44
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419