EDIT: I realised that my previous attempt at a way to describe the problem wasn't very helpful and indeed didnt actually model what I'm currently doing at the moment very well, so I've rewritten the post. I've included my working code. The data in the example I will use is the lending club loan data (csv format), which can be downloaded from here: https://www.kaggle.com/wendykan/lending-club-loan-data
I'm currently utilising PuLP and pandas in python to solve an optimization issue I have at the moment. I am quite new to linear programming, and utilised it recently to solve issues around minimising cost problems.
For this lending club example, lets say I have $100,000. Given that lending club loans are securitised and can be invested in by individuals, I would like to distribute the $100,000 such that:
1. the weighted average interest rate of the invested $100,000 cannot be below 8%.
2. The weighted average debt-to-income ratio of the invested $100,000 cannot exceed 10
Ignore for now (3. To minimise exposure to any one loan, the average loan size of the optimised portfolio cannot exceed $10,000.)
I am having difficulties implementing constraint 3 and the constraint of my investment ceiling of $100,000. For constraints 1 & 2, I've weighted loans by loan size and multiplied the interest rate and Debt-income ratio by the weights, and can thus model those constraints in a linear fashion such that:
Sum of weighted interest rates => 0.08
Sum of weighted debt-income ratios <= 10
The final portfolio of loans purchased do not need to exactly equal $100,000. The objective function is to get to as close to 100,000 within the constraints i.e. LpMaximise.
I've chosen to model the selection of loans as a binary variable, as I would simply like to know if they are in or out. Furthermore, for the sake of speed and memory, I've chosen a slice of 50 rows from the data to work with.
Here is my code:
import pandas as pd
from pulp import *
import numpy as np
df = pd.read_csv('~/Desktop/loan.csv')
df.columns = [c.replace(' ','') for c in df.columns]
#cleaning up the data to get rid of spaces and to standardise percentage data
df.loc[:,'id'] = df.loc[:,'id'].astype(int)
df['int_rate'] = df['int_rate']/100 #convert interest rate to ratios.
#slicing the data to get a small sample of 50 loans
df = df.iloc[0:49,:]
#setting up the weighted averages for linear equations
sumloans = df.loc[:,'funded_amnt'].sum()
df['weights'] = df['funded_amnt'].div(sumloans,axis='index')
#Converting dataframe to weighted values!
df2 = df[["id","funded_amnt","dti","int_rate"]]
df2[["funded_amntwtd","dtiwtd","int_ratewtd"]] = df[["funded_amnt","dti","int_rate"]].multiply(df["weights"],axis="index")
df3 = pd.merge(df,df2.iloc[:,[4,5,6]],on=df["id"],how='left')
#Free up memory
df = None
df2 = None
#Variable construction
loanid = df3['id'].tolist()
dtiwtd = df3.set_index('id').to_dict()['dtiwtd']
loanmix = df3.set_index('id').to_dict()['funded_amnt']
wtdloanmix = df3.set_index('id').to_dict()['funded_amntwtd']
wa_int = df3.set_index('id').to_dict()['int_ratewtd']
id_vars = LpVariable.dicts("ID",indexs=loanid, cat='Integer',lowBound=0, upBound=1)
#Objective function added first. Summing all the loan values but examining constraints
prob = LpProblem("Funding",pulp.LpMaximize)
prob += lpSum([loanmix[i]*id_vars[i] for i in id_vars])
prob += lpSum([loanmix[i]*id_vars[i] for i in id_vars]) <= 100000 #"Sum of loans purchased must be equal to or less than $100,000"
prob += lpSum([dtiwtd[i]*id_vars[i] for i in id_vars]) <= 10 #"Sum of weighted dtis cannot be greater than 10"
prob += lpSum([wa_int[i]*id_vars[i] for i in id_vars]) >= 0.08 #"Sum of weighted interest rates cannot be less than 8%"
#Placeholder for inserting constraint on avg. loan size
prob.solve()
print("Status:", pulp.LpStatus[prob.status])
for v in prob.variables():
print(v.name, "=", v.varValue)
print("Total amount invested = ", value(prob.objective))
The solution status comes out as 'Unfeasible' and the output has some non binary integers.
I would appreciate any help around this matter. I'm a newbie to linear algebra and higher mathematics, but i have read this page (http://lpsolve.sourceforge.net/5.1/ratio.htm) which helped me set up the first two constraints. I'm just stuck on how to write an equation or code that would ensure the optimized portfolio has a average loan value of less than $10,000.