How to create a Pandas column for datetime from year / month/ day / hour / minute / second?

Question

I am trying to construct a datetime column in Pandas that represents multiple columns describing the year, month, day, etc. Most of the other answers I can find on this topic involve processing data in the opposite direction (from datetime to integer hour, for instance).

df = pd.DataFrame()

df['year'] = [2019, 2019, 2019, 2019, 2019, 2019]
df['month'] = [8, 8, 8, 8, 8, 8]
df['day'] = [1, 1, 1, 1, 1, 1]
df['hour'] = [10,10,11,11,12,12]
df['minute'] = [15,45,20,40,10,50]
df['second'] = [0, 1, 5, 10, 10, 11]

Goal:

df['datetime_val'] = 
0   2019-08-01 10:15:00
1   2019-08-01 10:45:01
2   2019-08-01 11:20:05
3   2019-08-01 11:40:10
4   2019-08-01 12:10:10
5   2019-08-01 12:50:11
Name: datetime_vals, dtype: datetime64[ns]

In the example above, how could I rapidly create a datetime column representing the constituent time information? I could easily do this with .apply() and a helper function but I envision performing this operation for millions of rows. I would love something inbuilt / vectorized. Thanks!

Maybe take a peak at: https://stackoverflow.com/questions/11858472/string-concatenation-of-two-pandas-columns , especially the 3rd answer that compares different methods of column string comprehension — PeptideWitch, Aug 01 '19 at 23:34

score 8 · Answer 1 · answered Aug 01 '19 at 23:43

8

IIUC to_datetime can take dataframe , only if the columns is well named as yours

pd.to_datetime(df)
0   2019-08-01 10:15:00
1   2019-08-01 10:45:01
2   2019-08-01 11:20:05
3   2019-08-01 11:40:10
4   2019-08-01 12:10:10
5   2019-08-01 12:50:11
dtype: datetime64[ns]

answered Aug 01 '19 at 23:43

BENY

317,841
20
164
234

Doing some tests now! – PeptideWitch Aug 02 '19 at 00:10

PeptideWitch · Answer 2 · 2019-08-02T00:47:55.583

2

After reading through this comparison of string concatenation methods for pandas dataframes, it looks like you could benefit from using df.assign:

df.assign(datetime_val=[f"{str(year)}-{str(month)}-{str(day)} {str(hour)}:{str(minute)}:{str(second)}" for year, month, day, hour, minute, second in zip(df['year'], df['month'], df['day'], df['hour'], df['minute'], df['second'])])

EDIT2:

My method does not return datetime64 objects, however, as pointed out below by Andy L. In fact, method 3 becomes incredibly slow when swapping out the strings for datetime objects. However, the method 1 vs method 2 comparison is still valid.

EDIT:

Did some testing to compare the three methods presented here

edited Aug 02 '19 at 00:47

answered Aug 01 '19 at 23:45

PeptideWitch

2,239
14
30

2

@AndyL. Actually, fair call - OP stated that they wanted datetime64 as their data type while my method only returns strings. My apologies. If you edit your answer (like, add a space somewhere), I'll un-downvote it. Nevertheless, it's still slower than WeNYoBen's method. – PeptideWitch Aug 02 '19 at 00:44
I edited it. I never dispute your speed test. WeNYoBen's solution is better. That's why I upvoted him. – Andy L. Aug 02 '19 at 00:54
2

My philosophy is any solution always worth effort of the poster, so I never downvote any solution even that solution is wrong. On wrong solution, I may comment, but never downvote. I am kind of off balance in this case. I apologizes for my harsh words. I deleted those comments. Please edit yours so I can retract my downvoted on yours – Andy L. Aug 02 '19 at 00:59
2

Oh, thank you. That's kind of you. I like your philosophy, actually...and maybe in my rush to get the answer right, I became a bit arrogant too by trampling over your answer. It's useful for me to hear when I'm wrong, even if the words are a little harsh, so no need to apologise in that regard :) – PeptideWitch Aug 02 '19 at 01:05

Andy L. · Answer 3 · 2019-08-02T00:50:18.120

2

you may convert whole df to str and use agg to concat string and with format parameter of pd.to_datetime

df = df.astype(str)
pd.to_datetime(df.agg('-'.join, axis=1), format='%Y-%m-%d-%H-%M-%S')

Out[170]:
0   2019-08-01 10:15:00
1   2019-08-01 10:45:01
2   2019-08-01 11:20:05
3   2019-08-01 11:40:10
4   2019-08-01 12:10:10
5   2019-08-01 12:50:11
dtype: datetime64[ns]

edited Aug 02 '19 at 00:50

answered Aug 01 '19 at 23:47

Andy L.

24,909
4
17
29

I found this method quite slow when I tested it out. It works, but it's quite a heavy approach. – PeptideWitch Aug 02 '19 at 00:18

How to create a Pandas column for datetime from year / month/ day / hour / minute / second?

3 Answers3