I have a dataframe where there are birthdays that have regular dates mixed with Excel serial dates like this:
09/01/2020 12:00:00 AM
05/15/1985 12:00:00 AM
06/07/2013 12:00:00 AM
33233
26299
29428
I tried a solution from this answer, and all of the dates that are in the Excel serial format are blanked out, while preserving those that were in a normal date format.
This is my code:
import pandas as pd
import xlrd
import numpy as np
from numpy import *
from numpy.core import *
import os
import datetime
from datetime import datetime, timedelta
import glob
def from_excel_ordinal(ordinal, _epoch0=datetime(1899, 12, 31)):
if ordinal >= 60:
ordinal -= 1 # Excel leap year bug, 1900 is not a leap year!
return (_epoch0 + timedelta(days=ordinal)).replace(microsecond=0)
path = 'C:\\Input'
os.chdir(path)
filelist = glob.glob('*BLAH*.xlsx')
filename = os.fsdecode(filelist[0])
df = pd.read_excel(filename, sheet_name = 'Blah Blah')
m = df['Birthday'].astype(str).str.isdigit()
df.loc[m, 'Birthday'] = df.loc[m, 'Birthday'].astype(int).apply(from_excel_ordinal)
df['Birthday'] = pd.to_datetime(df['Birthday'], errors = 'coerce')
I am not sure where I am going wrong with this since the code shouldn't be blanking out the birthdays like it is doing.