8

I have a pandas data frame which has datetimes with 2 different formats e.g.:

3/14/2019 5:15:32 AM
2019-08-03 05:15:35
2019-01-03 05:15:33
2019-01-03 05:15:33
2/28/2019 5:15:31 AM
2/27/2019 11:18:39 AM

...

I have tried various formats but get errors like ValueError: unconverted data remains: AM

I would like to get the format as 2019-02-28 and have the time removed

Stig
  • 353
  • 1
  • 3
  • 12
  • Note that the third date in the list is 1st March 2019 and NOT August. with format Y-d-m – Stig Mar 19 '19 at 17:45

5 Answers5

11

You can use pd.to_datetime().dt.strftime() to efficienty convert the entire column to a datetime object and then to a string with Pandas intelligently guessing the date formatting:

df = pd.Series('''3/14/2019 5:15:32 AM
2019-08-03 05:15:35
2019-01-03 05:15:33
2019-01-03 05:15:33
2/28/2019 5:15:31 AM
2/27/2019 11:18:39 AM'''.split('\n'), name='date', dtype=str).to_frame()

print(pd.to_datetime(df.date).dt.strftime('%Y-%m-%d'))
0    2019-03-14
1    2019-08-03
2    2019-01-03
3    2019-01-03
4    2019-02-28
5    2019-02-27
Name: date, dtype: object

If that doesn't give you what you want, you will need to identify the different kinds of formats and apply different settings when you convert them to datetime objects:

# Classify date column by format type
df['format'] = 1
df.loc[df.date.str.contains('/'), 'format'] = 2
df['new_date'] = pd.to_datetime(df.date)

# Convert to datetime with two different format settings
df.loc[df.format == 1, 'new_date'] = pd.to_datetime(df.loc[df.format == 1, 'date'], format = '%Y-%d-%m %H:%M:%S').dt.strftime('%Y-%m-%d')
df.loc[df.format == 2, 'new_date'] = pd.to_datetime(df.loc[df.format == 2, 'date'], format = '%m/%d/%Y %H:%M:%S %p').dt.strftime('%Y-%m-%d')
print(df)
                    date  format    new_date
0   3/14/2019 5:15:32 AM       2  2019-03-14
1    2019-08-03 05:15:35       1  2019-03-08
2    2019-01-03 05:15:33       1  2019-03-01
3    2019-01-03 05:15:33       1  2019-03-01
4   2/28/2019 5:15:31 AM       2  2019-02-28
5  2/27/2019 11:18:39 AM       2  2019-02-27
Nathaniel
  • 3,230
  • 11
  • 18
3

Assume that the column name in your DataFrame is DatStr.

The key to success is a proper conversion function, to be applied to each date string:

def datCnv(src):
    return pd.to_datetime(src)

Then all you should do to create a true date column is to call:

df['Dat'] = df.DatStr.apply(datCnv)

When you print the DataFrame, the result is:

                  DatStr                 Dat
0   3/14/2019 5:15:32 AM 2019-03-14 05:15:32
1    2019-08-03 05:15:35 2019-08-03 05:15:35
2    2019-01-03 05:15:33 2019-01-03 05:15:33
3    2019-01-03 05:15:33 2019-01-03 05:15:33
4   2/28/2019 5:15:31 AM 2019-02-28 05:15:31
5  2/27/2019 11:18:39 AM 2019-02-27 11:18:39

Note that to_datetime function is clever enough to recognize the actual date format used in each case.

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
0

I had a similar issue. Luckily for me the different format occurred every other row. Therefore I could easily do a slice with .iloc. Howevery you could also slice the Series with .loc and a filter (detecting each format).

Then you can combine the rows with pd.concat. The order will be the same as for the rest of the DataFrame (if you assign it). If there are missing indices they will become NaN, if there are duplicated indices pandas will raise an error.

df["datetime"] = pd.concat([
    pd.to_datetime(df["Time"].str.slice(1).iloc[1::2], format="%y-%m-%d %H:%M:%S.%f"),
    pd.to_datetime(df["Time"].str.slice(1).iloc[::2], format="%y-%m-%d %H:%M:%S"),
])
JulianWgs
  • 961
  • 1
  • 14
  • 25
0

pandas >= 2.0: to_datetime can infer multiple datetime formats using format='infer'

Representative example:

df
                    Date
0   3/14/2019 5:15:32 AM
1    2019-08-03 05:15:35
2    2019-01-03 05:15:33
3    2019-01-03 05:15:33
4   2/28/2019 5:15:31 AM
5  2/27/2019 11:18:39 AM
6              1/05/2015
7            15 Jul 2009
8               1-Feb-15
9             12/08/2019

pd.to_datetime(df['Date'], format='mixed')

0   2019-03-14 05:15:32
1   2019-08-03 05:15:35
2   2019-01-03 05:15:33
3   2019-01-03 05:15:33
4   2019-02-28 05:15:31
5   2019-02-27 11:18:39
6   2015-01-05 00:00:00
7   2009-07-15 00:00:00
8   2015-02-01 00:00:00
9   2019-12-08 00:00:00
Name: Date, dtype: datetime64[ns]

Do note the caveat in the docs:

“mixed”, to infer the format for each element individually. This is risky, and you should probably use it along with dayfirst

In the example data above, the day of month and month is pretty clear, but this may not be the case for all data. If you do not think to_datetime can reliably infer dates, either add dayfirst=True/False depending on whether the day is always first in your mixed formats, or else find some other manual approach (or better still, fix the source of data!)

cs95
  • 379,657
  • 97
  • 704
  • 746
-1

I think is a little bit late for the answer but I discover a simplier way to do the same

df["date"] = pd.to_datetime(df["date"], format='%Y-%d-%m %H:%M:%S', errors='ignore').astype('datetime64[D]') 
df["date"] = pd.to_datetime(df["date"], format='%m/%d/%Y %H:%M:%S %p', errors='ignore').astype('datetime64[D]')
mxmlnlrcn
  • 51
  • 2
  • 9