
I have a feeling this question has a glaringly obvious and simple solution that I have perhaps overlooked.

Assume I have a model f that takes some inputs x and a parameter set p and produces a binary classification output y. I am using a linear model as f for now, but I'd like to keep the architecture flexible so I can easily substitute a neural network or other non-linear model in the future.

My question is: instead of training models to find an optimal parameter set p that produces the highest accuracy, is there a way to calculate the accuracy of a model from manually inserted parameters? The reason I ask is that I want to analyse an ensemble method, but instead of fitting multiple models and averaging/weighting their predictions, I want to bypass training and calculate the accuracies of parameter sets p sampled at random from a specified distribution.

In other words, I want to specify something like this:

```python
import numpy as np

model = linear           # placeholder for the model architecture (linear for now)
parameter_set = p        # manually specified array/distribution of possible parameters
sample_params = np.random.choice(parameter_set, size=100, replace=True)  # a subset of the full parameter set
```
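
For concreteness, here is a rough sketch of what I mean for a single parameter set. It assumes a scikit-learn `LogisticRegression` whose `coef_`/`intercept_` attributes are set by hand instead of calling `fit` (a hack rather than a supported API), and the data and the `accuracy_for_params` helper are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder stand-ins for the real training data
np.random.seed(0)
X_train = np.random.randn(200, 3)
y_train = (X_train[:, 0] - X_train[:, 1] > 0).astype(int)

def accuracy_for_params(coefs, intercept, X, y):
    """Accuracy of a linear classifier whose weights are set by hand (no training)."""
    clf = LogisticRegression()
    # These attributes are normally produced by .fit(); here they are set manually.
    clf.classes_ = np.array([0, 1])
    clf.coef_ = np.asarray(coefs, dtype=float).reshape(1, -1)
    clf.intercept_ = np.array([float(intercept)])
    return clf.score(X, y)  # plain accuracy on the data passed in

print(accuracy_for_params([1.0, -1.0, 0.0], 0.0, X_train, y_train))
```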

I now need to calculate all 100 accuracies of the sampled parameter sets as if they were 100 independent models, and reject some of them based on a criterion. Importantly, I want to evaluate the models only on the training set. I have looked at many libraries and found something close with `validation_curve` and grid search in `sklearn.model_selection`; however, `validation_curve` evaluates on the test set as well, and grid search looks for the single optimal parameter. I also looked at cross-validation techniques, lmfit, `scipy.optimize.curve_fit`, and more. I know I could write the model function myself, along with a function to calculate the accuracy, loop through the parameter sets, and reject them incrementally against the threshold I specify (sketched below), but that obviously defeats the purpose of what I'm trying to do.
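
This is roughly the manual loop I'd like to avoid writing, building on the hypothetical `accuracy_for_params` helper and placeholder data sketched above:

```python
# 100 candidate parameter sets, each a vector of (coefficients..., intercept)
n_features = X_train.shape[1]
sampled = np.random.randn(100, n_features + 1)

kept = []
for params in sampled:
    acc = accuracy_for_params(params[:-1], params[-1], X_train, y_train)
    if acc >= 0.5:                     # rejection threshold
        kept.append((params, acc))

print("kept %d of %d parameter sets" % (len(kept), len(sampled)))
```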

I'm just checking whether there is a nifty way to say: here is a model architecture, here are parameter sets, this is the data; tell me what each accuracy is and reject the parameter sets/models whose accuracy is below, say, 0.5. Any thoughts or suggestions would be greatly appreciated.

Amy
  • If it is possible, then you could use your random parameter sets as starting values, set the maximum number of iterations to 0, and specify the same set for training and testing. It would be worth a hack, though you'll still have to write the loop – sramalingam24 Oct 10 '18 at 14:43
  • Writing a for loop is probably inevitable, but this [post](https://stackoverflow.com/questions/34624978/is-there-easy-way-to-grid-search-without-cross-validation-in-python) might help you get started with building the parameter grid to loop across and train/evaluate. Otherwise, if you don't mind splitting your training set for a CV approach, you can use `GridSearchCV` and access the `cv_results_` attribute, which will have a list of `mean_train_score`s for the various values of the `params` you used. – Scratch'N'Purr Oct 10 '18 at 14:48
  • @sramalingam24 how does one specify train and test sets to be equal within the validation curve function? If that's possible, I believe it will be the solution I'm looking for! – Amy Oct 11 '18 at 08:02
  • @Scratch'N'Purr, thanks very much. This is a helpful starting point for the loop. – Amy Oct 11 '18 at 08:03
  • I see that I can use `validation_curve` with `cv=[(slice(None), slice(None))]`, which is obviously not a recommended approach, but it's nice to know one can "turn off" the CV component of `validation_curve` (see the sketch after these comments) – Amy Oct 11 '18 at 11:34
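
A minimal sketch of that "turn off CV" trick, assuming scikit-learn's `GridSearchCV`, placeholder data, and a made-up hyperparameter grid; note that it varies constructor hyperparameters such as `C` rather than hand-set weights p, so it covers only the evaluate-on-the-training-set part:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Placeholder training data
np.random.seed(0)
X_train = np.random.randn(200, 3)
y_train = (X_train[:, 0] > 0).astype(int)

# A single "split" whose train and test indices are both the full training set,
# which effectively switches off cross-validation.
no_cv = [(slice(None), slice(None))]

grid = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # hypothetical hyperparameter grid
    cv=no_cv,
    return_train_score=True,
)
grid.fit(X_train, y_train)
print(grid.cv_results_["mean_train_score"])    # training accuracy per candidate
```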

0 Answers