I need to find the optimal discount for each product (e.g. A, B, C) so that total sales are maximized. I have an existing Random Forest model for each product that maps discount and season to sales. How do I combine these models and feed them to an optimiser to find the optimal discount per product?
Reasons for model selection:
- RF: it captures the relation between the predictors and the response (sales_uplift_norm) better than linear models do.
- PSO: it is suggested in many white papers (available on ResearchGate/IEEE), and Python packages for it are available here and here.
Input data: sample data used to build the model at product level. A glance at the data is below:
Idea/steps I followed:
- Build an RF model per product
# pre-processed data
products_pre_processed_data = {key: pre_process_data(df, key) for key, df in df_basepack_dict.items()}
# rf models
products_rf_model = {key: rf_fit(df) for key, df in products_pre_processed_data.items()}
- Pass the models to the optimizer
  - Objective function: maximize sales_uplift_norm (the response variable of the RF model)
  - Constraints:
    - total spend: spend of A + B + C <= 20, where spend = total_units_sold_of_product * discount_percentage * mrp_of_product
    - lower bounds for products (A, B, C): [0.0, 0.0, 0.0] # discount percentage lower bounds
    - upper bounds for products (A, B, C): [0.3, 0.4, 0.4] # discount percentage upper bounds
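The spend constraint above can be written as a small function. This is a minimal sketch, assuming the units sold and MRP per product are known constants; `budget_slack` and the numbers are hypothetical and only illustrate the arithmetic. It returns budget minus spend, because pyswarm's `f_ieqcons` treats a point as feasible when the returned value is >= 0.

```python
def budget_slack(discounts, units, mrps, budget=20.0):
    """Return budget minus total spend; a value >= 0 means the budget holds."""
    # spend per product = units sold * discount percentage * MRP
    spend = sum(u * d * m for u, d, m in zip(units, discounts, mrps))
    return budget - spend


# Hypothetical numbers, for illustration only
units = [10.0, 20.0, 5.0]  # units sold of A, B, C
mrps = [2.0, 1.0, 4.0]     # MRP of A, B, C
slack = budget_slack([0.1, 0.2, 0.3], units, mrps)
print("feasible:", slack >= 0)
```

If the units sold themselves depend on the discount, the constants would have to be replaced by a prediction, which this sketch does not attempt.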
Pseudo/sample code, as I am unable to find a way to pass the product models into the optimizer:
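from pyswarm import pso

def obj(x):
    # x = [discount_A, discount_B, discount_C]
    model1 = products_rf_model.get('A')
    model2 = products_rf_model.get('B')
    model3 = products_rf_model.get('C')
    # pseudocode: this should be the sum of the three models' predicted
    # sales_uplift_norm at the discounts in x; how to get that is my question
    return -(model1 + model2 + model3)  # negated so that minimizing maximizes sales

def con(x):
    x1, x2, x3 = x
    spend = units_A*x1*mrp_A + units_B*x2*mrp_B + units_C*x3*mrp_C
    # pyswarm treats a point as feasible when f_ieqcons(x) >= 0,
    # so return budget minus spend
    return [20 - spend]  # spend budget

lb = [0.0, 0.0, 0.0]
ub = [0.3, 0.4, 0.4]
xopt, fopt = pso(obj, lb, ub, f_ieqcons=con)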
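One possible way to turn the fitted models into an objective is to call each model's `predict` on a feature row built from the candidate discount. The following is only a sketch under assumptions: `make_objective`, `scalers` and `season_dummies` are hypothetical names, the discount is assumed to need the same centering and scaling as in training, and the row layout (dummy columns first, scaled discount last) is assumed to match the training frame. `StubModel` is a stand-in so the example runs without trained models.

```python
def make_objective(models, scalers, season_dummies):
    """Build obj(x) for a minimiser: negative sum of predicted uplift.

    models: dict product -> fitted regressor exposing .predict
    scalers: dict product -> (mean, std) of the training discount column
    season_dummies: dict product -> list of 0/1 dummy values for the season
    """
    products = list(models)  # fixes the order of x, e.g. ['A', 'B', 'C']

    def obj(x):
        total = 0.0
        for discount, product in zip(x, products):
            mean, std = scalers[product]
            scaled = (discount - mean) / std  # same transform as in training
            # Assumed column order: dummies first, scaled discount last
            row = [*season_dummies[product], scaled]
            total += float(models[product].predict([row])[0])
        return -total  # negate because the optimiser minimises

    return obj


class StubModel:
    """Stand-in for a fitted RandomForestRegressor (illustration only)."""

    def __init__(self, slope):
        self.slope = slope

    def predict(self, rows):
        return [self.slope * row[-1] for row in rows]


models = {'A': StubModel(1.0), 'B': StubModel(2.0), 'C': StubModel(3.0)}
scalers = {p: (0.0, 1.0) for p in models}      # hypothetical mean/std
season_dummies = {p: [1, 0] for p in models}   # hypothetical season encoding
obj = make_objective(models, scalers, season_dummies)
print(obj([0.1, 0.2, 0.3]))
```

With real models, `obj` would be passed to the optimiser in place of the pseudocode objective, e.g. `pso(obj, lb, ub, f_ieqcons=con)`. A model fitted on a DataFrame may warn about missing feature names when given a plain list; building a one-row DataFrame with the training columns would avoid that.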
Dear SO experts, I request your guidance (I have been struggling to find any for a couple of weeks) on how to use the PSO optimizer (or any other optimizer, if PSO is not the right one) with RF.
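As a baseline for comparing against PSO, an exhaustive grid search is feasible with only three bounded variables. A random forest's prediction is piecewise constant in its inputs, so a gradient-free method of this kind is a reasonable sanity check. The sketch below is not a recommendation; `grid_search` and the toy `obj`/`con` are hypothetical, and `con` follows the same "feasible when >= 0" convention as pyswarm's `f_ieqcons`.

```python
from itertools import product as cartesian


def grid_search(obj, con, lb, ub, steps=11):
    """Evaluate obj on a regular grid and keep the best feasible point."""
    axes = [
        [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
        for lo, hi in zip(lb, ub)
    ]
    best_x, best_f = None, float("inf")
    for x in cartesian(*axes):
        if con(x) < 0:  # infeasible point, skip it
            continue
        f = obj(x)
        if f < best_f:
            best_x, best_f = list(x), f
    return best_x, best_f


# Toy objective and constraint, for illustration only
toy_obj = lambda x: -(x[0] + 2 * x[1] + 3 * x[2])
toy_con = lambda x: 0.55 - sum(x)

best_x, best_f = grid_search(toy_obj, toy_con,
                             lb=[0.0, 0.0, 0.0], ub=[0.3, 0.4, 0.4], steps=5)
print(best_x, best_f)
```

The cost grows as `steps ** n_products`, so this only scales to a handful of products.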
Adding the functions used for the model:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import scale  # assuming scale is sklearn's

def pre_process_data(df, product):
    data = df.copy().reset_index()
    # print(data)
    bp = product
    print("----------product: {}----------".format(bp))
    # Pre-processing steps
    print("pre process df.shape {}".format(df.shape))
    #1. Response var transformation
    response = data.sales_uplift_norm # already transformed
    #2. predictor numeric var transformation
    numeric_vars = ['discount_percentage'] # may include mrp, depth
    df_numeric = data[numeric_vars]
    df_norm = df_numeric.apply(lambda x: scale(x), axis = 0) # center and scale
    #3. char fields dummification
    #select category fields
    cat_cols = data.select_dtypes('category').columns
    #select string fields
    str_to_cat_cols = data.drop(['product'], axis = 1).select_dtypes('object').astype('category').columns
    # combine all categorical fields
    all_cat_cols = [*cat_cols, *str_to_cat_cols]
    # print(all_cat_cols)
    #convert cat to dummies
    df_dummies = pd.get_dummies(data[all_cat_cols])
    #4. combine num and char df together
    df_combined = pd.concat([df_dummies.reset_index(drop=True), df_norm.reset_index(drop=True)], axis=1)
    df_combined['sales_uplift_norm'] = response
    df_processed = df_combined.copy()
    print("post process df.shape {}".format(df_processed.shape))
    # print("model fields: {}".format(df_processed.columns))
    return df_processed

def rf_fit(df, random_state = 12):
    train_features = df.drop('sales_uplift_norm', axis = 1)
    train_labels = df['sales_uplift_norm']
    # Random Forest Regressor
    rf = RandomForestRegressor(n_estimators = 500,
                               random_state = random_state,
                               bootstrap = True,
                               oob_score=True)
    # fit the RF model and return it
    model = rf.fit(train_features, train_labels)
    return model
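One detail that matters for the optimisation: `pre_process_data` centers and scales `discount_percentage` but does not keep the mean and standard deviation, so a raw candidate discount from the optimiser cannot be put on the same scale as the training data. A minimal sketch of keeping those statistics follows; `fit_discount_scaler` and `scale_discount` are hypothetical helpers, and the sample discounts are made up. It uses the population standard deviation, which is what sklearn's `scale` uses.

```python
from statistics import fmean, pstdev


def fit_discount_scaler(discounts):
    """Return (mean, std) of the training discounts for later reuse."""
    mean = fmean(discounts)
    std = pstdev(discounts)  # population std (ddof=0)
    # Guard against a constant column to avoid dividing by zero
    return mean, (std if std > 0 else 1.0)


def scale_discount(discount, scaler):
    """Apply the training-time centering and scaling to one raw discount."""
    mean, std = scaler
    return (discount - mean) / std


# Hypothetical training discounts for one product
scaler = fit_discount_scaler([0.0, 0.1, 0.2, 0.3, 0.4])
print(scale_discount(0.25, scaler))
```

An alternative with the same effect would be to fit one `StandardScaler` per product and store it next to the model.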
EDIT: updated the dataset to a simplified version.