3

When building a model for machine learning, is it okay to have non-trainable params? Or does this create errors in the model? I'm confused about what non-trainable params actually are and how to adjust a model based on them.

student17
  • I don't see why it wouldn't be ok. You have a bias unit already in neural networks and don't train that. I would think you could have other ones as well and just leave it out of the training. – Chrispresso Aug 20 '18 at 16:40
  • @Chrispresso bias units are trained...what do you mean you don't train that? – enumaris Aug 20 '18 at 17:12
  • Don't say "machine learning" if you specifically mean "NN" or "multilayer NN". If you meant NN, did you mean the individual layer dimensions, connectivity (full/partial), activation function? Or hyperparameters in gradient-boosted trees, like learning rate, regularization? Or fixed-effects or mixed-effects modeling in regression or random-forest? Or something else? Or all of these? (I tagged this hyperparameters as a guess, but please edit that if not applicable) – smci Sep 07 '18 at 21:17

2 Answers

6

EDIT: as enumaris mentioned in the comments, the question is probably referring to non-trainable parameters in Keras rather than to non-trainable parameters in general (hyperparameters).

Non-trainable parameters in Keras are described in the answer to this question.

...non-trainable parameters of a model are those that you will not update and optimize during training, and that have to be defined a priori, or passed as inputs.

Examples of such parameters are:

  1. the number of hidden layers
  2. the number of nodes in each hidden layer
    and others

These parameters are "non-trainable" because you can't optimize their values with your training data.
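For concreteness, here is a minimal Keras sketch (all sizes are arbitrary choices for illustration). The layer and node counts are fixed a priori; and, as enumaris points out in the comments below, it is frozen weights, not these architectural choices, that actually show up under the "Non-trainable params" line of model.summary():

```python
from tensorflow import keras

# Architectural choices fixed a priori (values are arbitrary examples):
n_hidden_layers = 2   # number of hidden layers
n_nodes = 64          # nodes per hidden layer

model = keras.Sequential()
model.add(keras.Input(shape=(20,)))  # 20 input features, arbitrary
for _ in range(n_hidden_layers):
    model.add(keras.layers.Dense(n_nodes, activation="relu"))
model.add(keras.layers.Dense(1, activation="sigmoid"))

# Freezing a layer makes its weights "non-trainable" in the Keras sense
# (as in transfer learning):
model.layers[0].trainable = False

model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()  # reports "Trainable params" vs "Non-trainable params"
```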

To address your questions:

Is it okay to have non-trainable params?

Yes, it is okay, and in fact it is inevitable if you are building an NN or most other machine learning models.

Does this create errors in the model?

It does not create an error by default; it determines the architecture of your neural network.

But some architectures will perform better for your data and task than others.

So if you choose sub-optimal non-trainable parameters, you can and will underfit your data.

Optimizing non-trainable parameters is a whole other, quite broad topic.


Answer for general machine learning theory:

Non-trainable parameters (in general, not specifically in Keras) are called hyperparameters.

Their purpose is to adapt an algorithm to your specific requirements.

For example, if you are training a simple logistic regression, you have a parameter C, which controls regularization: essentially, how heavily you "penalize" the algorithm for wrong answers.

You might want to penalize your algorithm heavily so that it generalizes more (but it may underfit), or you might penalize mistakes only lightly (which can lead to overfitting).
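A minimal scikit-learn sketch of that trade-off (synthetic data, arbitrary C values; note that in scikit-learn C is the inverse of the regularization strength, so smaller C means a harsher penalty):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data, just for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C is chosen by you, not learned from the training data.
# Smaller C -> stronger regularization (risk of underfitting);
# larger C  -> weaker regularization (risk of overfitting).
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    print(f"C={C}: test accuracy = {clf.score(X_test, y_test):.3f}")
```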

This is something you can't learn from the data; it is something you adjust to fit your particular needs.

Teoretic
  • This answer neglects the possibility that some layers in the model can be frozen, as would be the case in transfer learning. Those layers' weights are then, by definition, non-trainable parameters. In fact, Keras will show you exactly how many parameters you have that are trainable vs non-trainable. When Keras outputs "non-trainable parameters", it is not actually telling you how many hyperparameters your model has. It's referring to those frozen layers. – enumaris Aug 20 '18 at 17:11
  • @enumaris thank you for pointing that out! I thought the question wasn't Keras and NN specific (because the question states "in machine learning" in general). But you are right, it is worth mentioning. I will update my answer soon to address your comment. – Teoretic Aug 20 '18 at 17:28
  • Yeah, I'm doing some reading between the lines because the OP tagged Keras and Keras has a "Trainable Parameters vs Non-Trainable Parameters" output when you do a model.summary() so I feel like that's what the OP was asking about. – enumaris Aug 20 '18 at 17:34
  • @enumaris I edited the question. Feel free to correct me if I'm wrong somewhere in a new answer. – Teoretic Aug 20 '18 at 19:07
  • I think actually the second answer in that question you linked (the one with 5 upvotes), and not the accepted answer that you are referring to, is the answer that addresses the context I was talking about. But again, I'm reading between the lines here. It's entirely possible that OP did not mean the "non-trainable params" in Keras. – enumaris Aug 20 '18 at 19:24
  • Hyperparameters are trainable. You just have to make sure to partition the data so you validate the grid-search on a holdout set, i.e. not overlapping the training data. That's all. It's still unclear what the OP's asking about. – smci Sep 07 '18 at 21:15
4

Usually, "non-trainable params" does not mean weights to which you assign random or predefined values (there would be no way to know the correct values, as they depend on the other weights).

It means architectural decisions that you have made a priori. For example, in a neural network, such parameters would be the number of layers and the number of nodes per layer. These can be decided by educated guesses or, more usually, by trial and error (cross-validation), as sketched below.

Having such non-trainable parameters is not only ok, but unavoidable with most training algorithms.
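A minimal sketch of that trial-and-error search, assuming scikit-learn's MLPClassifier and an arbitrary set of candidate architectures:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic data, just for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Candidate architectures: each tuple is (nodes in layer 1, nodes in layer 2, ...)
param_grid = {"hidden_layer_sizes": [(32,), (64,), (32, 32)]}

search = GridSearchCV(MLPClassifier(max_iter=500, random_state=0),
                      param_grid, cv=3)
search.fit(X, y)
print("best architecture found:", search.best_params_)
```

GridSearchCV evaluates each candidate architecture with cross-validation on the training data, which is exactly the trial-and-error loop described above.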

blue_note