1

I have the following dataframe with date information:

      import numpy as np
      import pandas as pd

      df_Date = pd.DataFrame({'Day' : ['2', '19', '22', '15', '16'], 
                              'Month' : ['2', '8', '1', '11', '10'], 
                              'Year' : ['2010', '2010', '2010', '2010', '2010'], 
                              'Hour' : ['1', '2', '3', '4', '5'],
                              'Minute' : ['59', '55', '33', '45', '5'],
                              'Second' : ['16', '17', '18', '19', '20']                       
                             })

I would like to generate, just one column, containing the complete date (Day, Month, Year, Hour, Minute and Secund).

I tried to implement the following code:

     df_Date['Day'] = df_Date['Day'].astype(int)
     df_Date['Month'] = df_Date['Month'].astype(int)
     df_Date['Year'] = df_Date['Year'].astype(int)

     df_Date['New_Column_Date'] = pd.to_datetime(df_Date.Year*10000 + df_Date.Month*100 + 
                                                 df_Date.Day, format='%Y%m%d')

This code is working. However, it is not complete, the hours minutes and second are missing.

My exit is like this:

        Day   Month     Year    Hour    Minute  Second  New_Column_Date
         2      2       2010    1         59    16          2010-02-02
         19     8       2010    2         55    17          2010-08-19
         22     1       2010    3         33    18          2010-01-22
         15     11      2010    4         45    19          2010-11-15
         16     10      2010    5          5    20          2010-10-16

I would like the output to be:

        Day   Month     Year    Hour    Minute  Second  New_Column_Date
         2      2       2010    1         59    16          2010-02-02 01:59:16
         19     8       2010    2         55    17          2010-08-19 02:55:17
         22     1       2010    3         33    18          2010-01-22 03:33:18
         15     11      2010    4         45    19          2010-11-15 04:45:19
         16     10      2010    5          5    20          2010-10-16 05:05:20
Jane Borges
  • 552
  • 5
  • 14
  • In this dataframe I have only date columns. But what if the dataframe contains other columns? what should I do? – Jane Borges Feb 21 '20 at 23:13
  • 1
    Just do `df['New Col'] = pd.to_datetime(df[['Year', 'Month', 'Day', 'Hour', 'Minute', 'Second']])` – cs95 Feb 21 '20 at 23:14
  • Perfect! Tkank you – Jane Borges Feb 21 '20 at 23:16
  • Does this answer your question? [How to combine multiple columns in a Data Frame to Pandas datetime format](https://stackoverflow.com/questions/49718863/how-to-combine-multiple-columns-in-a-data-frame-to-pandas-datetime-format) – AMC Feb 21 '20 at 23:37

1 Answers1

1

As long as your headers are named as shown in your post, you can delegate all the heavy lifting to pd.to_datetime:

pd.to_datetime(df[['Year', 'Month', 'Day', 'Hour', 'Minute', 'Second']])

0   2010-02-02 01:59:16
1   2010-08-19 02:55:17
2   2010-01-22 03:33:18
3   2010-11-15 04:45:19
4   2010-10-16 05:05:20
dtype: datetime64[ns]

The precondition is that you name your columns "Year", "Month", "Day", ... etc as shown above. The order of the columns isn't even that important. The names are extremely important.

cs95
  • 379,657
  • 97
  • 704
  • 746