I have searched the community for an answer to this but could not find one.

I have a dataframe in Python 3.5.1 that contains a column of dates stored as strings, imported from a CSV file.

The dataframe looks like this:

                  TimeStamp  TBD  TBD     Value  TBD
0       2016/06/08 17:19:53  NaN  NaN  0.062942  NaN
1       2016/06/08 17:19:54  NaN  NaN  0.062942  NaN
2       2016/06/08 17:19:54  NaN  NaN  0.062942  NaN

What I need is to change the TimeStamp column format to %m/%d/%Y %H:%M:%S:

                  TimeStamp  TBD  TBD     Value  TBD
0       06/08/2016 17:19:53  NaN  NaN  0.062942  NaN

So far I have found some solutions that work, but only for a single string and not for a Series.

Any help would be appreciated.

Thanks

racekiller

2 Answers

If you convert the column of strings to a time series, you could use the dt.strftime method:

import numpy as np
import pandas as pd
nan = np.nan
df = pd.DataFrame({'TBD': [nan, nan, nan], 'TBD.1': [nan, nan, nan], 'TBD.2': [nan, nan, nan], 'TimeStamp': ['2016/06/08 17:19:53', '2016/06/08 17:19:54', '2016/06/08 17:19:54'], 'Value': [0.062941999999999998, 0.062941999999999998, 0.062941999999999998]})
df['TimeStamp'] = pd.to_datetime(df['TimeStamp']).dt.strftime('%m/%d/%Y %H:%M:%S')
print(df)

yields

   TBD  TBD.1  TBD.2            TimeStamp     Value
0  NaN    NaN    NaN  06/08/2016 17:19:53  0.062942
1  NaN    NaN    NaN  06/08/2016 17:19:54  0.062942
2  NaN    NaN    NaN  06/08/2016 17:19:54  0.062942

Since you want to convert a column of strings to another (different) column of strings, you could also use the vectorized str.replace method:

import numpy as np
import pandas as pd
nan = np.nan
df = pd.DataFrame({'TBD': [nan, nan, nan], 'TBD.1': [nan, nan, nan], 'TBD.2': [nan, nan, nan], 'TimeStamp': ['2016/06/08 17:19:53', '2016/06/08 17:19:54', '2016/06/08 17:19:54'], 'Value': [0.062941999999999998, 0.062941999999999998, 0.062941999999999998]})
df['TimeStamp'] = df['TimeStamp'].str.replace(r'(\d+)/(\d+)/(\d+)(.*)', r'\2/\3/\1\4', regex=True)
print(df)

since

In [32]: df['TimeStamp'].str.replace(r'(\d+)/(\d+)/(\d+)(.*)', r'\2/\3/\1\4', regex=True)
Out[32]: 
0    06/08/2016 17:19:53
1    06/08/2016 17:19:54
2    06/08/2016 17:19:54
Name: TimeStamp, dtype: object

This uses regex to rearrange pieces of the string without first parsing the string as a date. This is faster than the first method (mainly because it skips the parsing step), but it also has the disadvantage of not checking that the date strings are valid dates.
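
To illustrate the validation trade-off, here is a minimal sketch using a made-up Series whose second entry has month 13. The sample data, the explicit `format` and `errors='coerce'` are assumptions for the illustration, not part of the question's data.

```python
import pandas as pd

# Hypothetical sample: the second entry has month 13, which is not a valid date.
s = pd.Series(['2016/06/08 17:19:53', '2016/13/08 17:19:54'])

# Parsing with an explicit format; errors='coerce' turns unparseable values into NaT.
parsed = pd.to_datetime(s, format='%Y/%m/%d %H:%M:%S', errors='coerce')

# The regex approach only rearranges text, so the invalid date passes through unchanged.
swapped = s.str.replace(r'(\d+)/(\d+)/(\d+)(.*)', r'\2/\3/\1\4', regex=True)

print(parsed.isna().tolist())   # which rows failed to parse
print(swapped.tolist())         # rearranged strings, valid or not
```

If bad rows are a concern, parsing first and checking for NaT may be worth the extra time; otherwise the regex rearrangement is the cheaper option.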

unutbu
  • Hello @unutbu, this worked. Yes, I noticed the slowness when running that line. Once it was done, I converted it to datetime using pd.to_datetime, and it was very slow compared with when df['TimeStamp'] was a string. Will this procedure be faster than applying the str function in a loop? Especially for a dataframe with millions of records, I think a loop is not a good idea. – racekiller Jun 27 '16 at 18:41
  • Right. Avoid the Python loop if you can. Using `DatetimeIndex.strftime` is *much* faster than using a Python loop, parsing each date-string as a date and calling `strftime`. – unutbu Jun 27 '16 at 18:58
  • @unutbu You don't need the `DatetimeIndex`, `strftime` is also available from the `.dt` accessor. – joris Jun 27 '16 at 21:18
  • @joris: Ah, great. Thank you. – unutbu Jun 27 '16 at 21:21

For most common date and datetime formats, the pandas to_datetime function can parse them without a format being provided. For example:

df.TimeStamp.apply(lambda x: pd.to_datetime(x))

And for the example given in the question,

df['TimeStamp'] = pd.to_datetime(df['TimeStamp']).dt.strftime('%m/%d/%Y %H:%M:%S')

will give us the same result.

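Using .apply is convenient if you have multiple columns to convert, although calling pd.to_datetime on a whole column is generally faster than applying it element by element.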
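
As a rough sketch of the multi-column case, assuming a frame with two string date columns (the names `Start` and `End` and their values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with two string date columns; 'Start' and 'End' are made-up names.
df = pd.DataFrame({
    'Start': ['2016/06/08 17:19:53', '2016/06/08 17:19:54'],
    'End': ['2016/06/09 08:00:00', '2016/06/09 08:00:01'],
})

date_cols = ['Start', 'End']

# DataFrame.apply passes each column (a Series) to pd.to_datetime,
# so every column is parsed in one vectorized call rather than value by value.
df[date_cols] = df[date_cols].apply(pd.to_datetime)

# Format every datetime column back to strings in the layout asked for in the question.
formatted = df[date_cols].apply(lambda col: col.dt.strftime('%m/%d/%Y %H:%M:%S'))
print(formatted)
```

Here `.apply` works column by column, which is different from the element-wise `Series.apply` with a lambda shown earlier.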

Of course, providing the parsing format is necessary in many situations. For a full list of format codes, please see https://docs.python.org/3/library/datetime.html.
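
One situation where an explicit format matters is ambiguous day/month order. The sketch below assumes made-up day-first strings (not the question's data) and passes `format` so the first field is read as the day:

```python
import pandas as pd

# Hypothetical day-first strings: '08/06/2016' is meant as 8 June 2016.
s = pd.Series(['08/06/2016 17:19:53', '09/06/2016 17:19:54'])

# Without a format, pandas would by default read the first field of an
# ambiguous date as the month; the explicit format removes the ambiguity.
parsed = pd.to_datetime(s, format='%d/%m/%Y %H:%M:%S')

print(parsed.dt.day.tolist())
print(parsed.dt.month.tolist())
```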

Sarah