I am trying to implement a top-decile recall/precision scoring function to pass to GridSearchCV, but I cannot figure out what is wrong. I would like my scoring function to take the predicted probabilities, the actual labels and, ideally, the decile threshold as a percentage. I would then rank-order the scores and compute the conversion rate within that threshold, e.g. the conversion rate of the top 10% of the population. That conversion rate would be the score I output; the higher the better.

However, when I run the code below, I don't get the probability scores, and I don't understand what the input to the scoring function is. The print statement in the function outputs only 1's and 0's instead of probabilities.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def top_decile_conversion_rate(y_prob, y_actual):
    # Function goes in here
    print(y_prob, y_actual)
    return 0.5

# Random toy data: two integer features and a random 0/1 label
features = pd.DataFrame({"f1": np.random.randint(1, 1000, 500), "f2": np.random.randint(1, 1000, 500),
                         "label": [round(x) for x in np.random.random_sample(500)]})
my_scorer = make_scorer(top_decile_conversion_rate, greater_is_better=True)
gs = GridSearchCV(
    estimator=LogisticRegression(),
    param_grid={'C': [i for i in range(1, 3)], 'class_weight': [None], 'penalty': ['l2']},
    cv=2,
    scoring=my_scorer)
model = gs.fit(features[["f1", "f2"]], features.label)
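
To make the target metric concrete, here is a minimal sketch of the calculation I have in mind, assuming plain NumPy arrays of labels and scores. The name top_fraction_conversion_rate and the top_fraction parameter are placeholders of mine, not part of scikit-learn, and the sketch says nothing about how to get probabilities into the scorer, which is the part I am stuck on.

```python
import numpy as np


def top_fraction_conversion_rate(y_actual, y_prob, top_fraction=0.1):
    """Share of positive labels among the highest-scored rows.

    Hypothetical helper for illustration only; not a scikit-learn function.
    """
    y_actual = np.asarray(y_actual)
    y_prob = np.asarray(y_prob)
    # Number of rows in the top slice; always keep at least one row.
    n_top = max(1, int(np.ceil(top_fraction * len(y_prob))))
    # Row indices sorted by score, highest first.
    order = np.argsort(-y_prob, kind="stable")
    # Mean of 0/1 labels in the top slice is the conversion rate.
    return float(y_actual[order[:n_top]].mean())


# Toy data: 10 rows, so the top 20% is the 2 highest-scored rows.
y_actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.7, 0.1, 0.2, 0.6, 0.3, 0.4, 0.5, 0.95]
rate = top_fraction_conversion_rate(y_actual, y_prob, top_fraction=0.2)
print(rate)
```

In the toy data the two highest scores (0.95 and 0.9) both belong to positive rows, so the top-20% conversion rate is 1.0. Rounding the slice size up with ceil is an arbitrary choice on my part; rounding down would also be defensible.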