0

I have a Pandas dataframe of which each column specifies an element of a datetime (year, month, day, etc.) in integer format. I want to join these separate columns into a single datetime object, which according to the documentation is a legal operation. But after one frustrating hour I haven't yet figured out how to do this. See for instance this minimal example:

import pandas as pd

df = pd.DataFrame(
    [[2011, 5, 3], [2014, 9, 13], [2022, 1, 1]],
    columns=("year", "month", "day")
)

datetime = df.apply(pd.to_datetime)

Desired result:

0 2011-05-03
1 2014-09-13
2 2022-01-01

Actual result:

                           year              ...                                        day
0 1970-01-01 00:00:00.000002011              ...              1970-01-01 00:00:00.000000003
1 1970-01-01 00:00:00.000002014              ...              1970-01-01 00:00:00.000000013
2 1970-01-01 00:00:00.000002022              ...              1970-01-01 00:00:00.000000001

Any suggestions?

MPA
  • 1,878
  • 2
  • 26
  • 51

3 Answers3

4

pd.to_datetime will properly parse dates from column names. Additionally you can have 'hour', 'minute', 'second', 'millisecond', 'microsecond', and/or 'nanosecond' columns.

pd.to_datetime(df[['year', 'month', 'day']])
0   2011-05-03
1   2014-09-13
2   2022-01-01
dtype: datetime64[ns]

The column naming is insensitive to capitalization/pluralization and named columns can be specified in any order.

df['MiNuTEs'] = 2
pd.to_datetime(df[['MiNuTEs', 'month', 'year', 'day']])

0   2011-05-03 00:02:00
1   2014-09-13 00:02:00
2   2022-01-01 00:02:00
dtype: datetime64[ns]
ALollz
  • 57,915
  • 7
  • 66
  • 89
  • Your solution seems elegant, but I get a `KeyError: ('year', 'month', 'day')` for your first example – MPA Nov 18 '19 at 16:29
  • @MPA check that you have `[[ ]]` so that you are passing the DataFrame. That error seems to indicate you only have a single bracket and are trying to do `df['year', 'month', 'day']` – ALollz Nov 18 '19 at 16:32
2

One way is joining the columns and then parsing with pd.to_datetime:

df.astype(str).apply('/'.join, axis=1).apply(pd.to_datetime)

0   2011-05-03
1   2014-09-13
2   2022-01-01
dtype: datetime64[ns]
yatu
  • 86,083
  • 12
  • 84
  • 139
2

You could do:

import pandas as pd

df = pd.DataFrame(
    [[2011, 5, 3], [2014, 9, 13], [2022, 1, 1]],
    columns=("year", "month", "day")
)

result = df.apply(lambda r: pd.Timestamp(year=r.year, month=r.month, day=r.day), axis=1)

print(result)

Output

0   2011-05-03
1   2014-09-13
2   2022-01-01
dtype: datetime64[ns]
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76