
`K.learning_phase()` fetches the value, not the tensor itself. I need the learning phase tensor to feed to `K.function` to get layer gradients, outputs, etc. This works fine with `import keras.backend as K`, but fails with `import tensorflow.keras.backend as K`. There is a relevant GitHub issue with a partial workaround.

How can I fetch the tensor itself?
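
For context, a quick check of what the eager backend actually returns (assuming TF 2.0 with eager execution on, as in the trace below):

import tensorflow.keras.backend as K

print(K.learning_phase())        # 0 -- a plain Python int, not a tensor
print(type(K.learning_phase()))  # <class 'int'>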


Reproducible example:

import tensorflow.keras.backend as K
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
import numpy as np

ipt = Input((16,))
out = Dense(16)(ipt)
model = Model(ipt, out)
model.compile('adam', 'mse')

x = np.random.randn(32, 16)
model.train_on_batch(x, x)

grads = model.optimizer.get_gradients(model.total_loss, model.layers[-1].output)
# fails on the next call: K.learning_phase() returns a plain int in eager
# execution, which K.function cannot treat as a feedable tensor
grads_fn = K.function(inputs=[model.inputs[0], model._feed_targets[0], K.learning_phase()],
                      outputs=grads)

Full error trace:

File "<ipython-input-2-7f74922d7492>", line 3, in <module>
  outputs=grads)
File "D:\Anaconda\envs\tf2_env\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3773, in function
  return EagerExecutionFunction(inputs, outputs, updates=updates, name=name)
File "D:\Anaconda\envs\tf2_env\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3670, in __init__
  base_graph=source_graph)
File "D:\Anaconda\envs\tf2_env\lib\site-packages\tensorflow_core\python\eager\lift_to_graph.py", line 249, in lift_to_graph
  visited_ops = set([x.op for x in sources])
File "D:\Anaconda\envs\tf2_env\lib\site-packages\tensorflow_core\python\eager\lift_to_graph.py", line 249, in <listcomp>
  visited_ops = set([x.op for x in sources])

AttributeError: 'int' object has no attribute 'op'

1 Answer


As a (not-so-nice) workaround, you can use `symbolic_learning_phase()` from `tensorflow.python.keras.backend`:

from tensorflow.python.keras import backend

# ... (model, grads and x defined as in the question)
grads_fn = K.function(inputs=[model.inputs[0],
                              model._feed_targets[0],
                              backend.symbolic_learning_phase()],
                      outputs=grads)

g_learning = grads_fn([x, x, True])       # gradients in training mode
g_not_learning = grads_fn([x, x, False])  # gradients in inference mode

I am not sure why this function, unlike `learning_phase()`, has not been exported into `tensorflow.keras.backend`. Maybe there is a good reason for not doing so.

Further, note that using the learning phase here only makes sense when your model contains layers/ops which behave differently in training and inference modes (e.g. dropout); otherwise, the output of the function would be the same either way.
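
For instance, a minimal sketch of a model where the phase actually matters (assuming TF 2.0 with eager execution, as in the question; the layer sizes and the 0.5 rate are arbitrary):

import numpy as np
import tensorflow.keras.backend as K
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.python.keras import backend

ipt = Input((16,))
hid = Dropout(0.5)(ipt)  # behaves differently in training vs. inference
out = Dense(16)(hid)
model = Model(ipt, out)

# fetch layer outputs with an explicit learning-phase feed
outs_fn = K.function(inputs=[model.inputs[0], backend.symbolic_learning_phase()],
                     outputs=[model.layers[-1].output])

x = np.random.randn(32, 16)
o_train = outs_fn([x, True])[0]   # dropout active: random units zeroed
o_infer = outs_fn([x, False])[0]  # dropout inactive: deterministic
print(np.allclose(o_train, o_infer))  # False (with overwhelming probability)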


Update: `backend.symbolic_learning_phase()` is used in `tensorflow.keras` code itself (example), which suggests there is not much against its public use. It serves as a drop-in replacement for `K.learning_phase()` in eager execution, to be used in `K.function()`.

  • Err... This is exactly what I tried before posting, and `backend.symbolic_learning_phase()` failed - now it doesn't fail. I don't know what the deal was, but this seems to work - I'll run some checks before accepting answer. In the meantime, maybe you could shed light [here](https://github.com/tensorflow/tensorflow/issues/34508) also? I could open an SO Q&A if it's preferable – OverLordGoldDragon Nov 27 '19 at 06:23
  • Ah, you used tensorflow _python_ keras backend - though, you shouldn't really [dig there](https://stackoverflow.com/questions/58279628/what-is-the-difference-between-tf-keras-and-tf-python-keras); unsure how reliable this approach is – OverLordGoldDragon Nov 27 '19 at 06:45
  • @OverLordGoldDragon That's why I mentioned at the very beginning that it's not a *nice* workaround: it's using `tensorflow.python.keras`. And as I mentioned at the end of my answer, you can't use `symbolic_learning_phase()` through `tensorflow.keras.backend` because it's not exported into that module (unlike `learning_phase()`, `learning_phase_scope()` or `eager_learning_phase_scope()` which are all exported, but are not applicable in this scenario). I am not sure if this is intentional or not. – today Nov 27 '19 at 13:35
  • Indeed, it lacks an export - and being `tf.python.keras`, it's a bit more than 'not nice' - hence I'll keep the question open; fair find though, I'll share on the Git thread – OverLordGoldDragon Nov 27 '19 at 13:58
  • `K.learning_phase()` still evaluates to `int` in Eager; any 'nicer' workaround you know of for TF2.2? – OverLordGoldDragon May 19 '20 at 08:47
  • @OverLordGoldDragon No, unfortunately I am not aware of a better workaround for the moment. By the way, does your model have a layer/op (e.g. dropout) which behaves differently in training and inference phases (so that you must set the learning phase)? – today May 19 '20 at 12:31
  • Yes, but I don't suppose we can set flags on layer-level without further hacking – OverLordGoldDragon May 19 '20 at 21:31
  • I suppose it's [not needed](https://stackoverflow.com/questions/61887944/how-to-get-gradients-in-tf-2-2-eager/#answer-61952452) anymore, though it'd be nicer if `K.function` were as flexible as it used to be. I'll mark this resolved if I don't come across a use case that requires `learning_phase` in a while. – OverLordGoldDragon May 22 '20 at 10:05
  • @OverLordGoldDragon All right, that sounds more or less good to me. However, basically, there should be a way to take the gradient of a tensor (whether it's the output or a weight of a layer) with respect to any other tensor in eager mode (that was possible with `K.function` in native Keras/TF graph mode). Also, the thing I intended to add to my previous comment and possibly my answer (but didn't!) was the use of the `training` argument for models/layers (e.g. see [this answer](https://stackoverflow.com/a/57509119/2099607) and its linked answers; a sketch follows this thread). Feel free to post an answer and accept it. – today May 22 '20 at 10:21
  • My planned approach for fetching `layer.output` gradients is based on [this answer](https://stackoverflow.com/a/56567364/10133797), enclosing `watch_layer` in a `try-finally` to clean up `layer.result` and revert `layer.call` to original; my concern is with the memory use of this method for caching `.result` in large models. If you have a better approach, I could open a Q&A for it. -- As an aside, keeping an API up-to-date with a multi-backend framework (Eager, Graph, TF 2 & 1, `keras`, `tf.keras`) is quite a nuisance. – OverLordGoldDragon May 22 '20 at 10:36
  • It seems we both wrongly suspected `symbolic_learning_phase()` to be unnice; `tf.keras` code uses it extensively, and I got it working fine to fetch layer outputs via `K.function`. Worth noting, though, is not to mix backends; `from keras` and `from tensorflow.keras` and `from tensorflow.python.keras` backends should only be used with themselves. My use is to import the last of these as `Kpy`, and `tf.keras.backend` as `K` - so: `Kpy.function(..., Kpy.symbolic_learning_phase())`, then e.g. `K.eval()`. If you integrate this information into your answer, I'll accept it. – OverLordGoldDragon May 23 '20 at 15:47
  • @OverLordGoldDragon All right, thank you for the information. Please feel free to edit my answer to improve it, or alternatively write a new answer and accept it; either of them are fine to me. Although, I am not sure about the point you mentioned regarding `tf.keras.backend` and `tf.python.keras.backend`: I might be wrong, but the former is just a subset of exported functions from the latter; so mixing them seems okay to me as long as the used functions are stable (especially in `tf.python.keras`). However, `keras` and `tf.keras` should never be mixed at all in any case. – today May 23 '20 at 16:09
  • My guess was there's a bit more to it, but you're right; former is a subset of latter. – OverLordGoldDragon May 23 '20 at 16:23
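
For reference, a minimal sketch of the `GradientTape` route discussed in the comments above (assuming TF 2.2+; the model and shapes are illustrative), where the `training` argument replaces the learning-phase feed entirely:

import numpy as np
import tensorflow as tf

ipt = tf.keras.Input((16,))
hid = tf.keras.layers.Dropout(0.5)(ipt)
out = tf.keras.layers.Dense(16)(hid)
model = tf.keras.Model(ipt, out)

x = np.random.randn(32, 16).astype('float32')

with tf.GradientTape() as tape:
    preds = model(x, training=True)  # phase set per-call; no tensor feed needed
    loss = tf.reduce_mean(tf.keras.losses.mse(x, preds))
grads = tape.gradient(loss, model.trainable_weights)  # gradients w.r.t. weights

For gradients of an intermediate layer's output (the `watch_layer` idea from the comments, based on the linked answer), one way is to temporarily patch the layer's `call` so the tape watches its result, reverting in a `finally` block so the cached output doesn't linger:

dropout = model.layers[1]
orig_call = dropout.call

def patched_call(*args, **kwargs):
    result = orig_call(*args, **kwargs)
    tape.watch(result)         # let the tape track this intermediate tensor
    dropout.result = result    # cache it so we can take gradients w.r.t. it
    return result

try:
    dropout.call = patched_call
    with tf.GradientTape() as tape:
        preds = model(x, training=True)
        loss = tf.reduce_mean(tf.keras.losses.mse(x, preds))
    out_grads = tape.gradient(loss, dropout.result)
finally:
    dropout.call = orig_call   # revert the patch
    del dropout.result         # free the cached output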