
I have a feeling this question has a glaringly obvious and simple solution that I have perhaps overlooked.

Assume I have a model f that takes some inputs x and a parameter set p and produces a binary classification output y. I am using a linear model as f for now, but I'd like to keep the architecture flexible so I can easily substitute a neural network or other non-linear model in the future.

My question is: instead of training models to find an optimal parameter set p that produces the highest accuracy, is there a way to calculate the accuracy of a model from manually inserted parameters? The reason I ask is that I want to analyse an ensemble method, but instead of fitting multiple models and averaging/weighting their predictions, I want to bypass training and calculate the accuracies of parameter sets p sampled at random from a specified distribution.

In other words, I want to specify something like this:

```python
import numpy as np

model = linear           # placeholder for the model architecture (linear for now)
parameter_set = p        # manually specified array/distribution of possible parameters
sample_params = np.random.choice(parameter_set, size=100, replace=True)  # a subset of the full parameter set
```
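
For concreteness, here is a rough sketch of what I mean for a single parameter set. It assumes a scikit-learn `LogisticRegression` whose `coef_`/`intercept_` attributes are set by hand instead of calling `fit` (a hack rather than a supported API), and the data and the `accuracy_for_params` helper are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder stand-ins for the real training data
np.random.seed(0)
X_train = np.random.randn(200, 3)
y_train = (X_train[:, 0] - X_train[:, 1] > 0).astype(int)

def accuracy_for_params(coefs, intercept, X, y):
    """Accuracy of a linear classifier whose weights are set by hand (no training)."""
    clf = LogisticRegression()
    # These attributes are normally produced by .fit(); here they are set manually.
    clf.classes_ = np.array([0, 1])
    clf.coef_ = np.asarray(coefs, dtype=float).reshape(1, -1)
    clf.intercept_ = np.array([float(intercept)])
    return clf.score(X, y)  # plain accuracy on the data passed in

print(accuracy_for_params([1.0, -1.0, 0.0], 0.0, X_train, y_train))
```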

I now need to calculate all 100 accuracies of the sampled parameter sets as if they were 100 independent models, and reject some of them based on a criterion. Importantly, I want to evaluate the models only on the training set. I have looked at many libraries and found something close with `validation_curve` and grid search in `sklearn.model_selection`; however, `validation_curve` evaluates on the test set as well, and grid search looks for the single optimal parameter. I also looked at cross-validation techniques, lmfit, `scipy.optimize.curve_fit`, and more. I know I could write the model function myself, along with a function to calculate the accuracy, loop through the parameter sets, and reject them incrementally against the threshold I specify (sketched below), but that obviously defeats the purpose of what I'm trying to do.
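
This is roughly the manual loop I'd like to avoid writing, building on the hypothetical `accuracy_for_params` helper and placeholder data sketched above:

```python
# 100 candidate parameter sets, each a vector of (coefficients..., intercept)
n_features = X_train.shape[1]
sampled = np.random.randn(100, n_features + 1)

kept = []
for params in sampled:
    acc = accuracy_for_params(params[:-1], params[-1], X_train, y_train)
    if acc >= 0.5:                     # rejection threshold
        kept.append((params, acc))

print("kept %d of %d parameter sets" % (len(kept), len(sampled)))
```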

I'm just checking whether there is a nifty way to say: here is a model architecture, here are parameter sets, this is the data; tell me what each accuracy is and reject the parameter sets/models whose accuracy is below, say, 0.5. Any thoughts or suggestions would be greatly appreciated.

Amy
  • If it is possible, then you could use your random parameter sets as starting values, set the maximum number of iterations to 0, and specify the same set for training and testing. It would be worth a hack, though you'll still have to write the loop – sramalingam24 Oct 10 '18 at 14:43
  • Writing a for loop is probably inevitable, but this [post](https://stackoverflow.com/questions/34624978/is-there-easy-way-to-grid-search-without-cross-validation-in-python) might help you get started with building the parameter grid to loop across and train/evaluate. Otherwise, if you don't mind splitting your training set for a CV approach, you can use `GridSearchCV` and access the `cv_results_` attribute, which will have a list of `mean_train_score`s for the various values of the `params` you used. – Scratch'N'Purr Oct 10 '18 at 14:48
  • @sramalingam24 how does one specify train and test sets to be equal within the validation curve function? If that's possible, I believe it will be the solution I'm looking for! – Amy Oct 11 '18 at 08:02
  • @Scratch'N'Purr, thanks very much. This is a helpful starting point for the loop. – Amy Oct 11 '18 at 08:03
  • I see that I can use `validation_curve` with `cv=[(slice(None), slice(None))]`, which is obviously not a recommended approach, but it's nice to know one can "turn off" the CV component of `validation_curve` (see the sketch after these comments) – Amy Oct 11 '18 at 11:34
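
A minimal sketch of that "turn off CV" trick, assuming scikit-learn's `GridSearchCV`, placeholder data, and a made-up hyperparameter grid; note that it varies constructor hyperparameters such as `C` rather than hand-set weights p, so it covers only the evaluate-on-the-training-set part:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Placeholder training data
np.random.seed(0)
X_train = np.random.randn(200, 3)
y_train = (X_train[:, 0] > 0).astype(int)

# A single "split" whose train and test indices are both the full training set,
# which effectively switches off cross-validation.
no_cv = [(slice(None), slice(None))]

grid = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # hypothetical hyperparameter grid
    cv=no_cv,
    return_train_score=True,
)
grid.fit(X_train, y_train)
print(grid.cv_results_["mean_train_score"])    # training accuracy per candidate
```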

0 Answers