0

I have a table, that looks like this:

date                     id
0   11:09:27 Nov. 26 2020   94857
1   10:49:26 Okt. 26 2020   94853
2   10:48:24 Sept. 26 2020  94852
3   9:26:33 Aug. 26 2020    94856
4   9:26:33 Jul. 26 2020    94851
5   9:24:38 Dez. 26 2020    94850
6   9:24:38 Jan. 26 2020    94849
7   9:09:08 Jun. 27 2019    32148
8   9:02:41 Mai 27 2019 32145
9   9:02:19 Apr. 27 2019    32144
10  9:02:05 Mrz. 27 2019    32143
11  9:02:05 Feb. 27 2019    32140

(initial table)

the date column format now is 'object', I'm trying to change it to 'datetime' using

df['date'] = pd.to_datetime(df['date'], format ='HH:MM:SS-%mm-%dd-%YYYY', errors='coerce')

and receive only NaT as a result. The problem is that the names of the months here are not standart. For example, Mai comes without a dot in the end. What's the best way to convert its format?

Tallie
  • 3
  • 3
  • Your format string looks off. –  Nov 27 '20 at 14:37
  • Try `format ='%H:%M:%S-%m-%d-%Y'` –  Nov 27 '20 at 14:39
  • this one also returns NaT – Tallie Nov 27 '20 at 14:41
  • That's because your months are abbreviated so you need `%b` instead of `%m`. However, you have some weird 4-char abbreviations there. You may need to write your own converter. –  Nov 27 '20 at 14:43
  • 1
    Post a sample of your dataset that we can copy--paste if you want more help. –  Nov 27 '20 at 14:44
  • Try `pd.to_datetime(df['date'], infer_datetime_format=True)` –  Nov 27 '20 at 15:24
  • Thank you I edited the question. pd.to_datetime(df['date'], infer_datetime_format=True) returns "Unknown string format: 20:26:12 Okt. 31 2020" Maybe it can't read some months's names?.. – Tallie Nov 27 '20 at 15:36

3 Answers3

0

The following format works for most of your data:

format="%H:%M:%S %b. %d %Y"

H stands for Hours, M for minutes, S for seconds, b for abbreviated months, and Y for year.

As said by Justin in the comments, your month abbreviations are off. These four characters abbreviations are unconventional, you should format your string to remove the last character of the month if it is 4 characters long. If it is 3 characters long then leave it like it is.

EDIT:

Note that in your dataset, the abbreviations are ended by a ".", hence the dot in the string format.

robinood
  • 1,138
  • 8
  • 16
  • This failed because of 'Sept.' does not match %b. - Strange, yet it works by default is date precedes time??? – frankr6591 Nov 27 '20 at 15:12
0

Your date column has a complicated format, so just change the format of your pd.to_datetime function:

# 11:09:27 Nov.26 2020 ---> '%I:%M:%S %b.%d %Y'
df['date'] = pd.to_datetime(df['date'], format ='%I:%M:%S %b. %d %Y', errors='coerce')

 
 output: 2020-11-26 11:09:27
ljuk
  • 701
  • 3
  • 12
0

This works for me... even with inconsistency

pd.to_datetime(df.date.str[9:]+' '+df.date.str[0:8])

Input (random generated dates, changed 7 to have Sept.)

                     date
0   19:06:04 Mar. 19 2020
1   17:27:11 Mar. 05 2020
2   07:17:04 May. 05 2020
3   04:53:50 Sep. 23 2020
4   03:43:20 Jun. 23 2020
5   17:35:00 Mar. 06 2020
6   06:04:48 Jan. 15 2020
7  12:26:14 Sept. 18 2020
8   03:21:10 Jun. 03 2020
9   17:37:00 Aug. 26 2020

output

0   2020-03-19 19:06:04
1   2020-03-05 17:27:11
2   2020-05-05 07:17:04
3   2020-09-23 04:53:50
4   2020-06-23 03:43:20
5   2020-03-06 17:35:00
6   2020-01-15 06:04:48
7   2020-09-18 12:26:14
8   2020-06-03 03:21:10
9   2020-08-26 17:37:00

The following works for all months MMM, but fails due to 'Sept.' month. Strange because if date precedes time, by default it parses it correctly (ie. coerce code seems to work when precedes) ???

pd.to_datetime(df['date'].astype(str), format ="%H:%M:%S %b. %d %Y", 

errors='coerce')

frankr6591
  • 1,211
  • 1
  • 8
  • 14