1

I would like to return specific strings from two passed date columns. My code so far:

df_Date = df[df['state'].str.contains('Traded Away') & df['maturity_date']!=0][['state','maturity_date']]
df_Date['maturity_date'] = pd.to_datetime(df_Date['maturity_date'])
df_Date['Today'] = pd.to_datetime('today') 
df_Date['Days'] = (df_Date['maturity_date'] - df_Date['Today'])
print(df_Date.head(10))


          state maturity_date      Today       Days
0   Traded Away    2018-03-15 2018-03-19    -4 days
10  Traded Away    2025-06-15 2018-03-19  2645 days
12  Traded Away    2047-03-21 2018-03-19 10594 days
15  Traded Away    2166-03-15 2018-03-19 54052 days
17  Traded Away    2166-12-18 2018-03-19 54330 days
20  Traded Away    2023-05-04 2018-03-19  1872 days
22  Traded Away    2027-11-15 2018-03-19  3528 days
23  Traded Away    2025-03-15 2018-03-19  2553 days
25  Traded Away    2023-01-15 2018-03-19  1763 days
26  Traded Away    2166-05-01 2018-03-19 54099 days

My function to convert the days to strings is as follows and yields the error: TypeError: invalid type comparison when I print the dataframe.

def Risk_Bucket(x): 
    if x  <= 730:
        return '< 2YR'
    elif  (x > 730 and x  <= 1825):
        return '2YR_5YR'
    elif  (x > 1825 and x  <= 2555):
        return '5YR_7YR'
    elif  (x > 2555 and x <= 3650):
        return '7YR_10YR'
    elif  (x > 3650 and x  <= 7300):
        return '10YR_20YR'
    elif  (x > 7300):
        return '> 20YR'
    else:
        return "Check passed Date"
df_Date['Bucket'] = Risk_Bucket(df_Date['Days'])

print(df_Date.head(10))

I assume this is because of the Days columns has string 'days' in it?

How do I make the Days column numeric? Any suggestions to resolve this and improve my code?

Peter Lucas
  • 1,979
  • 1
  • 16
  • 27
  • Related: [Python: Convert timedelta to int in a dataframe](https://stackoverflow.com/questions/25646200/python-convert-timedelta-to-int-in-a-dataframe) – jpp Mar 19 '18 at 10:22

1 Answers1

2

I believe you need convert timedeltas ot days by days:

df_Date['Bucket'] = df_Date['Days'].dt.days.apply(Risk_Bucket)

Improved code with cut:

bins = [-np.inf,730,1825,2555,3650,7300,np.inf]
labels = ['< 2YR', '2YR_5YR','5YR_7YR','7YR_10YR', '10YR_20YR', '> 20YR']
df_Date['Bucket'] = pd.cut(df_Date['Days'].dt.days, bins=bins, labels=labels)

Verify:

bins = [-np.inf,730,1825,2555,3650,7300,np.inf]
labels = ['< 2YR', '2YR_5YR','5YR_7YR','7YR_10YR', '10YR_20YR', '> 20YR']
df_Date['Bucket'] = df_Date['Days'].dt.days.apply(Risk_Bucket)
df_Date['Bucket1'] = pd.cut(df_Date['Days'].dt.days, bins=bins, labels=labels)
print (df_Date)
          state maturity_date       Today       Days    Bucket   Bucket1
0   Traded Away    2018-03-15  2018-03-19    -4 days     < 2YR     < 2YR
10  Traded Away    2025-06-15  2018-03-19    72 days     < 2YR     < 2YR
12  Traded Away    2047-03-21  2018-03-19 10594 days    > 20YR    > 20YR
15  Traded Away    2166-03-15  2018-03-19 54052 days    > 20YR    > 20YR
17  Traded Away    2166-12-18  2018-03-19 54330 days    > 20YR    > 20YR
20  Traded Away    2023-05-04  2018-03-19  1872 days   5YR_7YR   5YR_7YR
22  Traded Away    2027-11-15  2018-03-19  3528 days  7YR_10YR  7YR_10YR
23  Traded Away    2025-03-15  2018-03-19  2553 days   5YR_7YR   5YR_7YR
25  Traded Away    2023-01-15  2018-03-19  1763 days   2YR_5YR   2YR_5YR
26  Traded Away    2166-05-01  2018-03-19 54099 days    > 20YR    > 20YR
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks Jezrael...again ; ) dt.days convertrs the values to integer? – Peter Lucas Mar 19 '18 at 10:24
  • @PeterLucas - Exactly, but here `cut` is better. – jezrael Mar 19 '18 at 10:28
  • Hi @jezrael I'm getting that SettingWithCopyWarning error (A value is trying to be set on a copy of a slice from a DataFrame). After reading the numerous posts I thought df_Date = df_Date.copy() would remove the warning? – Peter Lucas Mar 20 '18 at 03:57
  • @PeterLucas - Can you show code where you get warning? Also this warning is a bit confused, so need also check code above - 1,2 lines. Obviosly there is some filtering, then need `copy`, you are right. e.g. `df1 = df[df['state'].isin(['Traded Away','another value'])].copy()` – jezrael Mar 20 '18 at 06:25
  • Hmm, second line should be `dfRFQ_Breakdown_By_State.loc[mask, 'rbc_security_type1'] = dfRFQ_Breakdown_By_State.loc[mask, 'instrument_group'].str.upper() ` - filtering in both sides. – jezrael Mar 20 '18 at 09:00
  • I traced it back to my initial dataframe: mask = dfRFQ_Breakdown_By_State['rbc_security_type1'] == 0 dfRFQ_Breakdown_By_State.loc[mask, 'rbc_security_type1'] = dfRFQ_Breakdown_By_State['instrument_group'].str.upper() Here I am updating a column based on the criteria with the value in the other column. The bucketting above is then used on this updated dataframe. – Peter Lucas Mar 20 '18 at 09:01
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/167151/discussion-between-peter-lucas-and-jezrael). – Peter Lucas Mar 20 '18 at 09:05