TL;DR:
Use..
from datetime import datetime
df['Parsed_Date2'] = [datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(x) - 2) \
if '/' not in x else pd.to_datetime(x) for x in df['Close Date']]
Bit more thinking... This is for if you have your column in Pandas already and want an option to deal with an existing column (of strings); if you're reading in an Excel file other options may work at the time of accessing the file.
Code:
import pandas as pd
from datetime import datetime
# create a test dataframe
df = pd.DataFrame({'Close Date': ['12/6/2011 0:00', '1/28/2011 0:00', '40571', '12/28/2010 0:00', '12/13/2010 0:00', '11/15/2011 0:00', '8/24/2011 0:00', '40550', '40550']})
# create a function to parse the 'Close Date' column
def excelDate(x):
#check if x is an Excel number (like 40571) and process accordingly
if '/' not in x:
x = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(x) - 2)
#if it's not an Excel number just parse it as normal
else:
x = pd.to_datetime(x)
return x
# apply the function above to an new column 'Parsed_Date' for comparison
df['Parsed_Date'] = df.apply(lambda x: excelDate(x['Close Date']), axis=1)
print(df)
Outputs:
Close Date Parsed_Date
0 12/6/2011 0:00 2011-12-06
1 1/28/2011 0:00 2011-01-28
2 40571 2011-01-28
3 12/28/2010 0:00 2010-12-28
4 12/13/2010 0:00 2010-12-13
5 11/15/2011 0:00 2011-11-15
6 8/24/2011 0:00 2011-08-24
7 40550 2011-01-07
8 40550 2011-01-07
Alternatives
If you don't like apply
(very often slower)...
# Another Way To Parse: do the above code but as a list comprehension and without apply and a separate function
df['Parsed_Date2'] = [datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(x) - 2) \
if '/' not in x else pd.to_datetime(x) for x in df['Close Date']]
...or importing an extra module (datetime
)...
# Yet Another Way To Parser: use @MrFuppes suggestion so you don't need to import datetime
df['Parsed_Date3'] = [pd.to_datetime(int(x), unit='D', origin='1899-12-30') \
if '/' not in x else pd.to_datetime(x) for x in df['Close Date']]
If you ran all of them (why?, why not!¬) you would output:
Close Date Parsed_Date Parsed_Date2 Parsed_Date3
0 12/6/2011 0:00 2011-12-06 2011-12-06 2011-12-06
1 1/28/2011 0:00 2011-01-28 2011-01-28 2011-01-28
2 40571 2011-01-28 2011-01-28 2011-01-28
3 12/28/2010 0:00 2010-12-28 2010-12-28 2010-12-28
4 12/13/2010 0:00 2010-12-13 2010-12-13 2010-12-13
5 11/15/2011 0:00 2011-11-15 2011-11-15 2011-11-15
6 8/24/2011 0:00 2011-08-24 2011-08-24 2011-08-24
7 40550 2011-01-07 2011-01-07 2011-01-07
8 40550 2011-01-07 2011-01-07 2011-01-07
Timings:
Apply: 2.45 ms ± 147 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Using datetime in list comp: 1.68 ms ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Using just pandas in list comp: 2.28 ms ± 77.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)