37

I have a Pandas DataFrame that has date values stored in 2 columns in the below format:

col1: 04-APR-2018 11:04:29
col2: 2018040415203 

How could I convert this to a time stamp. Dtype of both of these columns is object.

cottontail
Taukheer

3 Answers

41

For the first format you can simply pass the column to to_datetime; for the latter you need to explicitly describe the date format (see the table of available directives in the Python docs):

In [21]: df
Out[21]:
                   col1           col2
0  04-APR-2018 11:04:29  2018040415203

In [22]: pd.to_datetime(df.col1)
Out[22]:
0   2018-04-04 11:04:29
Name: col1, dtype: datetime64[ns]

In [23]: pd.to_datetime(df.col2, format="%Y%m%d%H%M%S")
Out[23]:
0   2018-04-04 15:20:03
Name: col2, dtype: datetime64[ns]
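As a rough sketch of how the two conversions might be written back into the DataFrame, the snippet below passes an explicit format for both columns. The frame is rebuilt here for illustration, and the col2 value is a hypothetical 14-digit string (zero-padded seconds) rather than the exact value from the question.

```python
import pandas as pd

# Illustrative frame; the col2 value is an assumed, fully zero-padded timestamp.
df = pd.DataFrame({
    "col1": ["04-APR-2018 11:04:29"],
    "col2": ["20180404152003"],
})

# %d-%b-%Y matches day, abbreviated month name, and four-digit year.
df["col1"] = pd.to_datetime(df["col1"], format="%d-%b-%Y %H:%M:%S")

# No separators here, so the format is just the directives back to back.
df["col2"] = pd.to_datetime(df["col2"], format="%Y%m%d%H%M%S")

print(df.dtypes)
```

Assigning back to the same column name replaces the strings; use a new column name instead if the original text should be kept.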
Andy Hayden
  • 2
    What I don't understand is that this converts it to `np.datetime64`, whereas I need to do some `pd.Timedelta` math which only works with `pd.Timestamp`. I can't find anywhere how to convert the column to `pd.Timestamp`... think I'll open a new Q – j7skov Jan 12 '23 at 14:19
  • 1
    @j7skov It produces `pd.Timestamp`s. If you're interested, I added an [answer](https://stackoverflow.com/a/75432626/19123103) that kind of shows other ways to convert each element into `pd.Timestamp` objects but ultimately all are the same. Cheers. – cottontail Feb 13 '23 at 06:48
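To illustrate the point made in the comments, here is a small sketch (the variable names are made up for this example) showing that elements of a converted column come back as `pd.Timestamp` objects and that `pd.Timedelta` arithmetic works directly on the column.

```python
import pandas as pd

s = pd.to_datetime(
    pd.Series(["04-APR-2018 11:04:29"]),
    format="%d-%b-%Y %H:%M:%S",
)

# Individual elements are returned as pd.Timestamp objects.
first = s.iloc[0]
is_timestamp = isinstance(first, pd.Timestamp)

# Timedelta arithmetic applies to the whole column at once.
shifted = s + pd.Timedelta(hours=1)

# Subtracting a Timestamp yields a timedelta column.
elapsed = s - pd.Timestamp("2018-04-04")

print(is_timestamp, shifted.iloc[0], elapsed.iloc[0])
```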
16

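You can try these as well. Try passing infer_datetime_format=True while reading the file (for example, to pd.read_csv together with parse_dates). Note that this argument is deprecated as of pandas 2.0, where a strict version of the inference is the default.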

If the above method fails, try the following:

df2 = pd.to_datetime(df.col1)

or

df2 = pd.to_datetime(df['col1'])
df2

Note that the methods above only convert the strings to datetimes and return the result in df2, a Series that holds just the converted values and none of the other columns. To keep the rest of the DataFrame and give the converted column its own header, assign the result to a new column:

df['col1_converted'] = pd.to_datetime(df.col1)

or

df['col1_converted'] = pd.to_datetime(df['col1'])

This is convenient if you don't want to create a separate object, or if you want to refer to the converted column later together with the other columns of the DataFrame.

Morten Jensen
Natty
2

There are a few ways to convert column values into timestamps, some more efficient than others. N.B. Passing format= to to_datetime can make the conversion much faster, because pandas does not have to work out the format itself (see this post). You can find the available datetime format directives at https://strftime.org/.

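import pandas as pd
from datetime import datetime
x = pd.to_datetime(df['col1'], format='%d-%b-%Y %H:%M:%S')             # vectorized, explicit format
y = df['col1'].apply(pd.Timestamp)                                     # element-wise, format inferred
z = df['col1'].apply(datetime.strptime, args=('%d-%b-%Y %H:%M:%S',))   # element-wise, standard library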

but ultimately, all produce the same object (x.equals(y) and x.equals(z) both return True), which looks like:

0   2018-04-04 11:04:29
Name: col1, dtype: datetime64[ns]

If we check the individual values, they are the same (x[0] == y[0] == z[0] returns True), and each looks like:

Timestamp('2018-04-04 11:04:29')

If we look at the source code, pd.Timestamp is a subclass of datetime.datetime, so all are ultimately tied by datetime.datetime.
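The subclass relationship can be checked directly. This is only a small sketch with illustrative variable names, not part of the original comparison.

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp("2018-04-04 11:04:29")

# pd.Timestamp inherits from datetime.datetime.
is_subclass = issubclass(pd.Timestamp, datetime)

# A Timestamp compares equal to the matching plain datetime,
# and can be converted to one explicitly.
plain = datetime(2018, 4, 4, 11, 4, 29)
as_datetime = ts.to_pydatetime()

print(is_subclass, ts == plain, as_datetime)
```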

cottontail