Convert to_datetime when days don't contain leading zero

Question

I'm trying to get the index of my dataFrame to be of type datetime. My CSV file contains seperate columns of Dates and Times which i combine upon importing:

df = pd.read_csv("example.csv", sep=";", decimal=",", parse_dates=[["Date", "Time"]])

It will look like this after the import:

	Date_Time
0	1012020 00:00:00
1	1012020 00:15:00

The problem is the missing leading zero on the first 9 days of each month. Pandas to_datetime() needs a leading zero for the %d format option to work. When i use format="%d%m%Y%H:%M:%S" python says "invalid syntax"

How can I convert this column to datetime?

ouroboros1 · Accepted Answer · 2022-09-06T06:49:14.690

1

Use Series.str.zfill (as suggested by @FObersteiner in the comments) and apply pd.to_datetime afterwards:

import pandas as pd

# changing 2nd val to `'12012020 00:15:00'` to show that 
# only the 1st val is affected
data = {'Date_Time': {0: '1012020 00:00:00', 1: '12012020 00:15:00'}}

df = pd.DataFrame(data)

df['Date_Time'] = pd.to_datetime(df["Date_Time"].str.zfill(17), 
                                 format="%d%m%Y %H:%M:%S")

print(df)

            Date_Time
0 2020-01-01 00:00:00
1 2020-01-12 00:15:00

print(df['Date_Time'].dtype)
datetime64[ns]

Another (admittedly, unnecessarily complicated) way to go, would be to use a regex pattern to replace all "dates" with 7 digits by their 8-digit equivalent:

df['Date_Time'] = pd.to_datetime(
    df['Date_Time'].replace(r'^(\d{7}\s)',r'0\1', regex=True),
    format="%d%m%Y %H:%M:%S")

Explanation r'^(\d{7}\s)':

^ assert position at start of the string
\d{7}\s matches 7 digits followed by a whitespace
The encapsulating brackets turn this into a Capturing Group

Explanation r'0\1':

\1 refers back to the Capturing Group (1st of 1 group(s)), to which we prepend 0

edited Sep 06 '22 at 06:49

answered Sep 04 '22 at 10:13

ouroboros1

9,113
3
7
26

Reads like you could achieve the same with str.zfill(8), no? But what if `1012020` means 10th of January? – FObersteiner Sep 04 '22 at 13:52
@FObersteiner: yes, hadn't thought of that. `df['Date_Time'].str.zfill(17)` should work (17 for the whole string). `1012020` will not mean 10th of January. As specified by the OP: only the day is missing the leading zero. So, 10th of January would be `10012020`. – ouroboros1 Sep 04 '22 at 14:14
I tried both but pandas seems to be the problem here. @ouroboros1 Trick worked but pandas removes the leading 0 again. The column is of type pandas.core.series.Series. I tried the solutions from this post https://stackoverflow.com/questions/13250046/how-to-keep-leading-zeros-in-a-column-when-reading-csv-with-pandas but it had no effect... – KaaraT Sep 05 '22 at 08:34
What exactly do you mean by "Trick worked but pandas removes the leading 0 again"? At what point is the "trick" working/at what stage does pandas seem to remove the 0 again? If you simply do `print(df['Date_Time'].str.zfill(17))`, does it show the strings as `01012020 00:00:00`, `01012020 00:15:00`, etc. with a leading 0, when required? And are you getting an actual error, or is the result just not what you expected. If the former, please share the error; if the latter, what exactly is the result? – ouroboros1 Sep 05 '22 at 08:59
@ouroboros1 Okay so i do `df["Date_Time"].str.zfill(17)` which will print out `0 01012020 00:00:00`. However if I do `df["Date_Time"]` in the next cell the output will be `0 1012020 00:00:00` again, without the leading zero. So it doesn't seem to get overwritten? – KaaraT Sep 06 '22 at 06:36
It's not an `inplace` operation. You need to overwrite the value of `df["Date_Time"]` for pandas to keep the change. So, use: `df["Date_Time"] = df["Date_Time"].str.zfill(17)`. But the full line would simply be: `df['Date_Time'] = pd.to_datetime(df["Date_Time"].str.zfill(17), format="%d%m%Y %H:%M:%S")` – ouroboros1 Sep 06 '22 at 06:39

Convert to_datetime when days don't contain leading zero

1 Answers1