I have a pandas DataFrame df with dates stored as strings:
Date1       Date2
2017-08-31  1970-01-01 17:35:00
2017-10-31  1970-01-01 15:00:00
2017-11-30  1970-01-01 16:30:00
2017-10-31  1970-01-01 16:00:00
2017-10-31  1970-01-01 16:12:00
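For reproducibility, the sample frame above could be built as follows. This is a minimal sketch that assumes both columns hold plain Python strings (not datetime values), as described.

```python
import pandas as pd

# Sample data copied from the table above; both columns are plain strings.
df = pd.DataFrame({
    "Date1": ["2017-08-31", "2017-10-31", "2017-11-30",
              "2017-10-31", "2017-10-31"],
    "Date2": ["1970-01-01 17:35:00", "1970-01-01 15:00:00",
              "1970-01-01 16:30:00", "1970-01-01 16:00:00",
              "1970-01-01 16:12:00"],
})
```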
I want to replace the date part of each value in the Date2
column with the corresponding date from Date1,
leaving the time untouched, so that the output is:
Date1       Date2
2017-08-31  2017-08-31 17:35:00
2017-10-31  2017-10-31 15:00:00
2017-11-30  2017-11-30 16:30:00
2017-10-31  2017-10-31 16:00:00
2017-10-31  2017-10-31 16:12:00
I have achieved this using pandas replace
with a regex, like so:
import re
# Match the leading YYYY-MM-DD date part of each Date2 string
date_reg = re.compile(r"([0-9]{4}\-[0-9]{2}\-[0-9]{2})")
df['Date2'].replace(to_replace=date_reg, value=df['Date1'], inplace=True)
However, this method is very slow: it takes more than 10 minutes on a DataFrame with only 150k rows.
The solution from this post uses NumPy's np.where,
which is much faster. How can I use np.where
in this example, or is there a more efficient way to perform this operation?
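One direction that may be worth trying is plain vectorized string operations rather than np.where or a regex. The sketch below is not taken from the linked post; it assumes every Date2 value has the form "YYYY-MM-DD HH:MM:SS" with a single space separating the date from the time, and it rebuilds the sample data so that it runs on its own.

```python
import pandas as pd

# Sample data from the question; both columns are plain strings.
df = pd.DataFrame({
    "Date1": ["2017-08-31", "2017-10-31", "2017-11-30",
              "2017-10-31", "2017-10-31"],
    "Date2": ["1970-01-01 17:35:00", "1970-01-01 15:00:00",
              "1970-01-01 16:30:00", "1970-01-01 16:00:00",
              "1970-01-01 16:12:00"],
})

# Assumption: the time is everything after the first space in Date2.
time_part = df["Date2"].str.split(pat=" ", n=1).str[1]

# Concatenate the Date1 string with the extracted time, column-wise.
df["Date2"] = df["Date1"] + " " + time_part

print(df)
```

Because the split and the concatenation each operate on the whole column at once, this avoids per-row regex substitution. Whether it is fast enough for 150k rows would need to be timed on the real data.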