0

EDIT: I realised that my previous attempt at a way to describe the problem wasn't very helpful and indeed didnt actually model what I'm currently doing at the moment very well, so I've rewritten the post. I've included my working code. The data in the example I will use is the lending club loan data (csv format), which can be downloaded from here: https://www.kaggle.com/wendykan/lending-club-loan-data

I'm currently utilising PuLP and pandas in python to solve an optimization issue I have at the moment. I am quite new to linear programming, and utilised it recently to solve issues around minimising cost problems.

For this lending club example, lets say I have $100,000. Given that lending club loans are securitised and can be invested in by individuals, I would like to distribute the $100,000 such that:

1. the weighted average interest rate of the invested $100,000 cannot be below 8%.

2. The weighted average debt-to-income ratio of the invested $100,000 cannot exceed 10

Ignore for now (3. To minimise exposure to any one loan, the average loan size of the optimised portfolio cannot exceed $10,000.)

I am having difficulties implementing constraint 3 and the constraint of my investment ceiling of $100,000. For constraints 1 & 2, I've weighted loans by loan size and multiplied the interest rate and Debt-income ratio by the weights, and can thus model those constraints in a linear fashion such that:

Sum of weighted interest rates => 0.08
Sum of weighted debt-income ratios <= 10 

The final portfolio of loans purchased do not need to exactly equal $100,000. The objective function is to get to as close to 100,000 within the constraints i.e. LpMaximise.

I've chosen to model the selection of loans as a binary variable, as I would simply like to know if they are in or out. Furthermore, for the sake of speed and memory, I've chosen a slice of 50 rows from the data to work with.

Here is my code:

import pandas as pd
from pulp import *
import numpy as np



df = pd.read_csv('~/Desktop/loan.csv')
df.columns = [c.replace(' ','') for c in df.columns]

#cleaning up the data to get rid of spaces and to standardise percentage data

df.loc[:,'id'] = df.loc[:,'id'].astype(int)
df['int_rate'] = df['int_rate']/100     #convert interest rate to ratios.

#slicing the data to get a small sample of 50 loans
df = df.iloc[0:49,:]

#setting up the weighted averages for linear equations
sumloans = df.loc[:,'funded_amnt'].sum()
df['weights'] = df['funded_amnt'].div(sumloans,axis='index')

#Converting dataframe to weighted values!
df2 = df[["id","funded_amnt","dti","int_rate"]]
df2[["funded_amntwtd","dtiwtd","int_ratewtd"]] = df[["funded_amnt","dti","int_rate"]].multiply(df["weights"],axis="index")
df3 = pd.merge(df,df2.iloc[:,[4,5,6]],on=df["id"],how='left')

#Free up memory
df = None
df2 = None

#Variable construction
loanid = df3['id'].tolist()
dtiwtd = df3.set_index('id').to_dict()['dtiwtd']
loanmix = df3.set_index('id').to_dict()['funded_amnt']
wtdloanmix = df3.set_index('id').to_dict()['funded_amntwtd']
wa_int = df3.set_index('id').to_dict()['int_ratewtd']


id_vars = LpVariable.dicts("ID",indexs=loanid, cat='Integer',lowBound=0, upBound=1)


#Objective function added first. Summing all the loan values but examining constraints

prob = LpProblem("Funding",pulp.LpMaximize)
prob += lpSum([loanmix[i]*id_vars[i] for i in id_vars]) 
prob += lpSum([loanmix[i]*id_vars[i] for i in id_vars]) <= 100000 #"Sum of loans purchased must be equal to or less than $100,000"
prob += lpSum([dtiwtd[i]*id_vars[i] for i in id_vars]) <= 10 #"Sum of weighted dtis cannot be greater than 10"
prob += lpSum([wa_int[i]*id_vars[i] for i in id_vars]) >= 0.08 #"Sum of weighted interest rates cannot be less than 8%"  
#Placeholder for inserting constraint on avg. loan size
prob.solve()

print("Status:", pulp.LpStatus[prob.status])

for v in prob.variables():
    print(v.name, "=", v.varValue)

print("Total amount invested = ", value(prob.objective))    

The solution status comes out as 'Unfeasible' and the output has some non binary integers.

I would appreciate any help around this matter. I'm a newbie to linear algebra and higher mathematics, but i have read this page (http://lpsolve.sourceforge.net/5.1/ratio.htm) which helped me set up the first two constraints. I'm just stuck on how to write an equation or code that would ensure the optimized portfolio has a average loan value of less than $10,000.

Totes187
  • 11
  • 2
  • Please post your best attempt to solve this in PuLP, then the community can help steer you. – thomaskeefe May 30 '17 at 22:06
  • Hi thomas, I've rewritten the example, and included all my working code. I'd appreciate any help you can offer. – Totes187 Jun 01 '17 at 11:41
  • Constraint 3 is bogus (doesn't fulfil its stated purpose). You could have one large loan for $99900 and 99 tiny loans for $1 each, and the average loan size will be less than $1000 but you'll still be "extremely exposed" to one loan. – Brendan Jun 01 '17 at 12:49
  • In the case of lending club and the real world example I am working on, there are minimum requirements, which would preclude this type of scenario from happening. I'll edit the post to ignore that constraint, but thanks for the input. – Totes187 Jun 01 '17 at 13:26
  • just to clarify previous post, there are minimum loan requirements (e.g. cannot take out less than 500 or 1000 say). On the constraint of a spending ceiling of $100,000, how would i implement that within the code I have? – Totes187 Jun 01 '17 at 13:38

1 Answers1

0

You are doing linear programming on integers, which is more troublesome than with reals (doubles). Since your amounts can be treated as doubles, drop the ints.

Is id_vars supposed to hold amounts or proportions (adding to 1)? If the latter then you need an extra constraint on their sum. (And with your binary set up it would mean a single one and n-1 zeroes.) And if they are amounts then your weighted averages are not averages, e.g. as it stands you ensure that the sum (not average) of interest rates is at least 8%, which you can achieve with 8 loans at 1%.

Also: If I understand the syntax correctly (I am familiar with LP but not with your library), you are trying to maximize the investment amount, under constraint that it be at least 100000. This is unexpected: why not maximize return?

Constraint 2, "average debt-to-income ratio cannot exceed 10", is mathematically incorrect. If you have 2 investments with debts d1 and d2, income i1 and i2, with proportions a and 1-a, the debt-to-income ratio of the lot is [a*d1 + (1-a)d2] / [ai1 + (1-a)i2], not ad1/i1 + (1-a)*d2/i2.

Constraint 3 would be better implemented as a series of constraints (as hinted at in the comments): loanmix[i] <= 0.1 for all i.

Minimum loan requirements are not linear (big taboo in linear programming). Indeed, they are of the form: for all i, loanmix[i] == 0 or loanmix[i] >= minloan[i]. A hack may be to discretize: instead of min of $1000, use increments of $1000. But then you are back to Integerland.