I am using pd.to_datetime to convert strings into datetimes:

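import pandas as pd

df = pd.DataFrame(data={'id':['DD-83']})
# regex=True is needed on recent pandas, where str.replace treats the pattern literally by default
pd.to_datetime(df['id'].str.replace(r'\D+', '', regex=True), errors='coerce', format='%d%m')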

%d%m defines zero-padded day and month, but the code still converts the above string into

0   1900-03-08
Name: id, dtype: datetime64[ns]
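
The same result can be reproduced with the standard library, whose documentation notes that the leading zero is optional for `%d` and `%m` when parsing. This is only a minimal sketch using `datetime.strptime` rather than pandas itself, on the assumption that pandas follows the same convention:

```python
from datetime import datetime

# '83' has no zero padding, yet it parses as day 8, month 3.
parsed = datetime.strptime('83', '%d%m')
print(parsed)  # 1900-03-08 00:00:00
```

So the format string alone does not seem to be enough to reject non-padded input.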

I am wondering how to avoid it being converted into datetime (e.g. convert to NaT in this case), if the month and day in a string are not 0-padded. So

 DD0306 
 DD0706
 DD-83

will convert to

 1900-06-03
 1900-06-07
 NaT 
daiyue
  • Check out https://stackoverflow.com/questions/45595943/how-to-require-a-timestamp-to-be-zero-padded-during-validation-in-python – Wytamma Wirth Jun 24 '19 at 15:23
  • I suggest to have `df['id'].str.replace(r'\D+', '-')` and then `format='%d-%m'`, so that there is a clear distinction. It returns `NaT` for `-83` – harvpan Jun 24 '19 at 15:26

1 Answer

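You need to check each string for a `-` and pass only the strings without one to `pd.to_datetime`. The rows that are filtered out end up as `NaT` when the result is assigned back to the DataFrame.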

Setup:

df = pd.DataFrame(data={'id':['DD-83', 'DD0706', 'DD0306']})

Code:

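# Parse only ids without '-'; regex=True is needed on recent pandas, where str.replace is literal by default
df['date'] = pd.to_datetime(df['id'].loc[~df['id'].str.contains('-')].str.replace(r'\D+', '', regex=True), errors='coerce', format='%d%m')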

Output:

       id       date
0   DD-83        NaT
1  DD0706 1900-06-07
2  DD0306 1900-06-03
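
Filtering on `-` only catches ids that happen to contain that character. A stricter variant, offered as a sketch and not as part of the answer above, is to require exactly four trailing digits before parsing. The pattern `^\D*(\d{4})$` and the extra `DD83` row are assumptions added for illustration:

```python
import pandas as pd

df = pd.DataFrame(data={'id': ['DD-83', 'DD0706', 'DD0306', 'DD83']})

# Keep only ids that end in exactly four digits (zero-padded day and month).
# str.extract returns NaN for rows that do not match the pattern.
digits = df['id'].str.extract(r'^\D*(\d{4})$', expand=False)

# NaN rows become NaT; four-digit strings are parsed as day then month.
df['date'] = pd.to_datetime(digits, errors='coerce', format='%d%m')
print(df)
```

With this check, `DD-83` and `DD83` should both come out as `NaT`, while `DD0706` and `DD0306` are parsed as before.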
harvpan
  • but other valid strings like `DD0306`, `DD0706` also converted into `NaT` – daiyue Jun 24 '19 at 15:42
  • I tried the code with `df = pd.DataFrame(data={'id':['DD-83']})`, `to_datetime(to_datetime( df['id'].loc[ ~df['id'].str.contains('-')].str.replace(r'\D+', ''), errors='coerce', format='%d%m'), errors='coerce', format='%d-%m', infer_datetime_format=True)` but returned an empty Series rather than `NaT` – daiyue Jun 24 '19 at 16:49
  • @daiyue, see the full edited answer. I have attached the setup as well. – harvpan Jun 24 '19 at 17:05