I have two dataframes, one with predictors (df_learn) and one with targets (target_learn). I want to create a list of scikit-learn models (ml_list), one per target. So far, I have written this:
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor as GBM

df_learn = pd.DataFrame({'x1': [0, 0, 0, 1, 1, 1], 'x2': [1, 0, 1, 0, 1, 0], 'x3': [1, 1, 0, 0, 0, 0]})
target_learn = pd.DataFrame({'y1': [1, 0, 0, 2, 2, 0], 'y2': [1, 1, 1, 0, 1, 0]})
target_colnames = ['y1', 'y2']

ml_list = [GBM(n_estimators=5, max_depth=2, min_samples_split=2,
               learning_rate=0.1, loss='ls')] * 2
for i in [0, 1]:
    ml_list[i] = ml_list[i].fit(df_learn, target_learn[target_colnames[i]])
To check this, I created a list of predictions:
pred_list = []
for i in [0, 1]:
    pred_list.append(ml_list[i].predict(df_learn))

pd.DataFrame.from_items(zip(target_colnames, pred_list))
The result surprised me, as I got the exact same predictions for both targets.
y1 y2
0.80317 0.80317
0.80317 0.80317
0.80317 0.80317
0.39366 0.39366
0.80317 0.80317
0.39366 0.39366
When I ran each model separately (without using a list), I got two distinct sets of predictions:
m1 = GBM(n_estimators=5, max_depth=2, min_samples_split=2,
         learning_rate=0.1, loss='ls')
m2 = GBM(n_estimators=5, max_depth=2, min_samples_split=2,
         learning_rate=0.1, loss='ls')

m1 = m1.fit(df_learn, target_learn['y1'])
m2 = m2.fit(df_learn, target_learn['y2'])

p1 = m1.predict(df_learn)
p2 = m2.predict(df_learn)

pd.DataFrame.from_items(zip(target_colnames, [p1, p2]))
This gave the following results:
y1 y2
0.710278 0.80317
0.608147 0.80317
0.567309 0.80317
0.901585 0.39366
1.311095 0.80317
0.901585 0.39366
Apparently, each iteration of the for loop seems to overwrite the result for the previous member of the list. I assume this is related to some copy/deep-copy issue. How should I fix it?
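A quick way to test my suspicion, I think, would be an identity check on the list entries (this is just a diagnostic sketch, not part of the pipeline above):

# If this prints True, both list entries refer to the same GBM instance,
# so the second fit would overwrite the first.
ml_list = [GBM(n_estimators=5, max_depth=2, min_samples_split=2,
               learning_rate=0.1, loss='ls')] * 2
print(ml_list[0] is ml_list[1])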