Use read_csv
with parse_dates
for datetimes and then after substract days
:
df = pd.read_csv(file, parse_dates=[0,1])
print (df)
st_dt end_dt
0 2010-03-15 2010-04-30
df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
st_dt end_dt diff
0 2010-03-15 2010-04-30 46
If some dates are wrong like 20100431
use to_datetime
with parameter errors='coerce'
for convert them to NaT
:
df = pd.read_csv(file)
print (df)
st_dt end_dt
0 20100315 20100431
1 20100315 20100430
df['st_dt'] = pd.to_datetime(df['st_dt'], errors='coerce', format='%Y%m%d')
df['end_dt'] = pd.to_datetime(df['end_dt'], errors='coerce', format='%Y%m%d')
df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
st_dt end_dt diff
0 2010-03-15 NaT NaN
1 2010-03-15 2010-04-30 46.0