Python Subtracting two columns with date data, from csv to get number of weeks , months?

Question

I have a csv in which I have two columns representing start date: st_dt and end date: 'end_dt` , I have to subtract these columns to get the number of weeks. I tried iterating through columns using pandas, but it seems my output is wrong.

 st_dt                 end_dt
---------------------------------------
20100315           20100431

show us what you tried and what's the output you getting? Also provide expected output. — Sociopath, Feb 14 '18 at 05:45
This is what i am coding, for idx, date in data.iterrows(): data['expdate'] = datetime.strptime( str(date['exdate']), '%Y%m%d') - datetime.strptime( str(date['date']), '%Y%m%d') after this how can i update the my results in CSV file — Quant, Feb 14 '18 at 06:21
expected output should be in number of days, which i can convert into weeks or months . Please note i have thousand of dates in csv. — Quant, Feb 14 '18 at 06:26
@Quant - `20100431` does not exist, only `20100430` it is typo? — jezrael, Feb 14 '18 at 06:29

jezrael · Answer 1 · 2018-02-14T06:40:23.777

2

Use read_csv with parse_dates for datetimes and then after substract days:

df = pd.read_csv(file, parse_dates=[0,1])
print (df)
       st_dt     end_dt
0 2010-03-15 2010-04-30

df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
       st_dt     end_dt  diff
0 2010-03-15 2010-04-30    46

If some dates are wrong like 20100431 use to_datetime with parameter errors='coerce' for convert them to NaT:

df = pd.read_csv(file)
print (df)
      st_dt    end_dt
0  20100315  20100431
1  20100315  20100430

df['st_dt'] = pd.to_datetime(df['st_dt'], errors='coerce', format='%Y%m%d')
df['end_dt'] = pd.to_datetime(df['end_dt'], errors='coerce', format='%Y%m%d')

df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
       st_dt     end_dt  diff
0 2010-03-15        NaT   NaN
1 2010-03-15 2010-04-30  46.0

edited Feb 14 '18 at 06:40

answered Feb 14 '18 at 06:29

jezrael

822,522
95
1,334
1,252

Thanks for your effort, but this is the way i am doing , because the values from CSV are in raw format i think we cannot directly use dt.days values = [] for idx, date in data.iterrows(): result = datetime.strptime( str(date['exdate']), '%Y%m%d') - datetime.strptime( str(date['date']), '%Y%m%d') values.append(result) – Quant Feb 14 '18 at 07:11
Yes, the best not use it, because `iterrows` very slow. – jezrael Feb 14 '18 at 07:12
check [this](https://stackoverflow.com/questions/24870953/does-iterrows-have-performance-issues/24871316#24871316) – jezrael Feb 14 '18 at 07:13

Python Subtracting two columns with date data, from csv to get number of weeks , months?

1 Answers1