I've been using TensorFlow for a while now. At first I had code like this:

import tensorflow as tf

def myModel(training):
    # build the variables on the first (training) call, reuse them on the second
    with tf.variable_scope('model', reuse=not training):
        # ... build the model ...
        return model

training_model = myModel(True)
validation_model = myModel(False)

Mostly because I started with some MOOCs that taught me to do that, though they also didn't use TFRecords or queues, and I didn't know why I was using two separate models. I tried building only one and feeding the data through feed_dict: everything worked.

Ever since, I've usually used only one model. My inputs are always placeholders, and I just feed in either training or validation data.

Lately, I've noticed some weird behavior in models that use tf.layers.dropout and tf.layers.batch_normalization. Both functions have a 'training' parameter that I drive with a tf.bool placeholder. I've seen tf.layers generally used together with a tf.estimator.Estimator, but I'm not using one. I've read the Estimator code, and it appears to create two different graphs for training and validation. Maybe those issues arise from not having two separate models, but I'm still skeptical.
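For reference, here's a minimal sketch of the single-model setup I mean (the layer sizes and the random data are made up for illustration). As I understand it, tf.layers.batch_normalization also requires running the update ops in tf.GraphKeys.UPDATE_OPS together with the train op:

import numpy as np
import tensorflow as tf

# One graph; a single tf.bool placeholder switches dropout and batch norm
# between training and inference behavior.
training = tf.placeholder(tf.bool, shape=(), name='training')
x = tf.placeholder(tf.float32, shape=(None, 128), name='x')
y = tf.placeholder(tf.int64, shape=(None,), name='y')

h = tf.layers.dense(x, 64, activation=tf.nn.relu)
h = tf.layers.batch_normalization(h, training=training)
h = tf.layers.dropout(h, rate=0.5, training=training)
logits = tf.layers.dense(h, 10)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)

# batch_normalization updates its moving statistics via UPDATE_OPS;
# these must run alongside the train op, or inference statistics go stale.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_batch = np.random.rand(32, 128).astype(np.float32)
    y_batch = np.random.randint(0, 10, size=32)
    sess.run(train_op, {x: x_batch, y: y_batch, training: True})   # train mode
    sess.run(loss, {x: x_batch, y: y_batch, training: False})      # eval mode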

Is there a clear reason I'm not seeing why two separate but equivalent models have to be used?

Andrés Marafioti

2 Answers


You do not have to use two neural nets for training and validation. After all, as you noticed, TensorFlow lets you keep a single, monolithic train-and-validate net by allowing the training parameter of some layers to be a placeholder.

However, why wouldn't you? By having separate nets for training and for validation, you set yourself on the right path and future-proof your code. Your training and validation nets might be identical today, but you might later see some benefit in distinct nets, such as different inputs, different outputs, or removed intermediate layers.

Also, because variables are shared between them, having distinct training and validation nets comes at almost no penalty.
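For instance, here is a minimal sketch of that sharing (tf.AUTO_REUSE is one way to achieve it; the scope name and layer sizes are arbitrary):

import tensorflow as tf

def net(x, training):
    # tf.AUTO_REUSE creates the variables on the first call and
    # reuses the very same ones on every later call.
    with tf.variable_scope('model', reuse=tf.AUTO_REUSE):
        h = tf.layers.dense(x, 64, activation=tf.nn.relu)
        h = tf.layers.dropout(h, rate=0.5, training=training)
        return tf.layers.dense(h, 10)

train_x = tf.placeholder(tf.float32, shape=(None, 128))
valid_x = tf.placeholder(tf.float32, shape=(None, 128))
train_logits = net(train_x, training=True)    # dropout active
valid_logits = net(valid_x, training=False)   # dropout off, same weights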

So, keeping a single net is fine; in my experience though, any project beyond playful experimentation is likely to implement a distinct validation net at some point, and TensorFlow makes it easy to do just that with minimal penalty.

P-Gn
  • Inputs and outputs can be changed very easily without making a new network. Removing intermediate layers is not as easy, but so far every intermediate layer I've wanted to remove has a version in TensorFlow that accepts the training parameter. For me the problem isn't a RAM penalty but a clarity one: I just don't find the added functionality worth the complexity (so far). – Andrés Marafioti Apr 06 '18 at 11:27
  • True, especially with the new `Dataset` input framework -- it used to be less easy with the former queue-based framework. Still, by keeping training and validation networks separate, I avoid having to worry about potential problems, such as polluting training EMA with validation data or anything else -- and I find that the comfort that comes from simply handling two nets rather than one is cheap and easy to maintain. But yes, this is mostly opinion-based. – P-Gn Apr 06 '18 at 12:34
  • I have to admit that I double-checked batch normalization and dropout a couple of times before being sure about the training parameter's functionality. I just don't trust TensorFlow that much, and whenever there is a problem I tend to think it's their fault first. – Andrés Marafioti Apr 06 '18 at 12:38

The tf.estimator.Estimator class does indeed create a new graph for each invocation, and this has been the subject of furious debate (see this issue on GitHub). Its approach is to build the graph from scratch on each train, evaluate and predict invocation and restore the model from the last checkpoint. There are clear downsides to this approach, sketched in code after the list, for example:

  • A loop that calls train and evaluate will create two new graphs on every iteration.
  • One can't easily evaluate while training (there are workarounds, such as train_and_evaluate, but it doesn't look very nice).
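For illustration, here is roughly what the first point looks like in code (my_model_fn, train_input_fn and eval_input_fn are hypothetical stand-ins assumed to be defined elsewhere):

import tensorflow as tf

# my_model_fn, train_input_fn and eval_input_fn are hypothetical stand-ins.
estimator = tf.estimator.Estimator(model_fn=my_model_fn, model_dir='/tmp/model')
for _ in range(10):
    estimator.train(input_fn=train_input_fn, steps=1000)  # rebuilds the graph, restores the checkpoint
    estimator.evaluate(input_fn=eval_input_fn)            # builds yet another graph, restores again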

I tend to agree that having the same graph and model for all actions is convenient, and I usually go with this solution. But in a lot of cases, when you use a high-level API like tf.estimator.Estimator, you don't deal with the graph and variables directly, so you shouldn't care exactly how the model is organized.

Maxim
  • I only mentioned tf.estimator.Estimator to say that I think the TensorFlow developers expect you to do it that way. I don't use it, as I didn't find any advantage in it. Also, building the graph and restoring the model seems like a huge overhead. Do you know if this is done in RAM, or do they actually go to disk on every iteration? – Andrés Marafioti Apr 06 '18 at 11:32
  • @AndrésMarafioti the checkpoints are stored on disk (you can locate them easily), which makes this lifecycle rather inefficient, but not critically so: each iteration still takes seconds to minutes, so a few extra milliseconds aren't significant. – Maxim Apr 10 '18 at 14:15
  • I do use the checkpoints to continue training. It's an interesting change of perspective if, instead of a "huge overhead", it's just a few milliseconds; I'll think about it. Do you know of any research evaluating this overhead? – Andrés Marafioti Apr 10 '18 at 14:37
  • @AndrésMarafioti there's nothing to research here; it all depends on the hardware and general I/O throughput. The task is to dump the model from memory to disk (SSD or HDD), and modern SSDs perform write ops within a few milliseconds. – Maxim Apr 10 '18 at 14:43