
The code:

import numpy as np
import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split

x_data = np.linspace(0, 1000, 100000)
y_true = np.square(x_data)
y_true += np.random.randn(len(x_data))


feature_columns = [tf.feature_column.numeric_column('x', shape=[1])]
estimator = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                      hidden_units=[2, 2],
                                      optimizer=lambda: tf.train.AdamOptimizer(
                                          learning_rate=0.001))


X_train, X_test, y_train, y_test = train_test_split(x_data, y_true, test_size=0.3)

input_function = tf.estimator.inputs.numpy_input_fn({'x': X_train}, y_train,
                                                    batch_size=20, num_epochs=10000,
                                                    shuffle=True)

train_input_function = tf.estimator.inputs.numpy_input_fn({'x': X_train}, y_train,
                                                          batch_size=8, num_epochs=10000,
                                                          shuffle=False)
test_input_function = tf.estimator.inputs.numpy_input_fn({'x': X_test}, y_test,
                                                         batch_size=8, num_epochs=10000,
                                                         shuffle=False)


estimator.train(input_fn=input_function, steps=1000)

train_metrics = estimator.evaluate(input_fn=train_input_function, steps=1000)
test_metrics = estimator.evaluate(input_fn=test_input_function, steps=1000)


print('TRAINING DATA METRICS')
print(train_metrics)
print()

print('TEST DATA METRICS')
print(test_metrics)
print()
###
new_data = np.linspace(0, 1000, 10)
input_function_predict = tf.estimator.inputs.numpy_input_fn({'x': new_data},
                                                             shuffle=False)
print(list(estimator.predict(input_fn=input_function_predict)))

Gives the following output:

TRAINING DATA METRICS
{'average_loss': 200498430000.0, 'label/mean': 332774.78, 'loss': 1603987400000.0, 'prediction/mean': 0.97833574, 'global_step': 1000}

TEST DATA METRICS
{'average_loss': 197508330000.0, 'label/mean': 332257.22, 'loss': 1580066700000.0, 'prediction/mean': 0.97833574, 'global_step': 1000}

[{'predictions': array([0.9783435], dtype=float32)}, {'predictions': array([0.9783435], dtype=float32)}, {'predictions': array([0.9783435], dtype=float32)}, {'predictions': array([0.9783435], dtype=float32)}, 
{'predictions': array([0.9783435],....

So to conclude: the loss is huge because TF predicts the same value of y for every x. What is wrong in the code?

mikinoqwert
  • The code won't run (on TF 1.8 or 1.9) since the given `optimizer` is not an `Optimizer`. Removing the `lambda:` solves that. – fuglede Aug 18 '18 at 12:45
  • @fuglede Only `lambda:`? If so, it fails with an error. If I remove the whole lambda expression, the loss is still huge and TF predicts only one value – mikinoqwert Aug 18 '18 at 12:55

1 Answer


The loss will always be a large number for inputs of that size, a response variable on that scale, and the given model.

What you've done actually works fine but will take ages to converge without further fine-tuning. In particular, if I

  • remove lambda: (cf. the comment above),
  • change the learning_rate to 0.1,
  • change batch_size to 20000,
  • change num_epochs to 100,

then your 10 predictions become

[-2.036557, 82379.797, 165955.28, 249530.75, 333106.22, 416681.72, 500257.19, 583832.63, 667408.13, 750983.63]

which, from a quick look, appears to be close to optimal for the given model (which doesn't appear to be a particularly good one):

[plot: the model's predictions plotted against the data]
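
Putting the listed changes together, a minimal sketch of the adjusted setup might look as follows (this assumes TF 1.x, where tf.estimator.inputs.numpy_input_fn and tf.train.AdamOptimizer are available; the values are exactly the ones from the list above):

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Same data as in the question: y = x**2 plus unit-variance Gaussian noise.
x_data = np.linspace(0, 1000, 100000)
y_true = np.square(x_data) + np.random.randn(len(x_data))
X_train, X_test, y_train, y_test = train_test_split(x_data, y_true, test_size=0.3)

feature_columns = [tf.feature_column.numeric_column('x', shape=[1])]

# Pass an Optimizer instance directly (no lambda:) and raise the learning rate.
estimator = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                      hidden_units=[2, 2],
                                      optimizer=tf.train.AdamOptimizer(learning_rate=0.1))

# Far larger batches, far fewer epochs.
input_function = tf.estimator.inputs.numpy_input_fn({'x': X_train}, y_train,
                                                    batch_size=20000, num_epochs=100,
                                                    shuffle=True)
estimator.train(input_fn=input_function, steps=1000)

new_data = np.linspace(0, 1000, 10)
predict_input_function = tf.estimator.inputs.numpy_input_fn({'x': new_data},
                                                            shuffle=False)
print([p['predictions'][0] for p in estimator.predict(input_fn=predict_input_function)])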

With that, you're free to play around with the model. For instance, we know that a better model (if not very neural networky) would be the one defined through

estimator = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                      hidden_units=[1],
                                      activation_fn=np.square,
                                      optimizer=tf.train.AdamOptimizer(learning_rate=1))

At a final loss of 7.20825e+09, this provides perfect prediction:

[plot: predictions from the quadratic-activation model against the data]

Following the discussion in the comments below, in real-life situations in which you want to include quadratic transformations in your model, you would typically include those as features; for instance, you could use DNNRegressor to do linear regression (silly as that is) through

feature_columns = [tf.feature_column.numeric_column('x'),
                   tf.feature_column.numeric_column('x_squared')]

estimator = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                      hidden_units=[1],
                                      activation_fn=tf.identity,
                                      optimizer=tf.train.AdamOptimizer(learning_rate=1))

input_function = tf.estimator.inputs.numpy_input_fn({'x': X_train, 'x_squared': X_train**2}, y_train, 
                                                    batch_size=1000, num_epochs=500,
                                                    shuffle=True)

As before, this will give you a perfect fit

[plot: the fit from linear regression on x and x_squared]
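
The answer doesn't spell out training and prediction for this two-column setup, but presumably both features have to be supplied at prediction time as well, since both are declared in feature_columns. A minimal sketch, reusing the estimator and input function just defined:

# Train on the two-feature input function above (runs until its 500 epochs
# are exhausted), then predict, feeding both columns again.
estimator.train(input_fn=input_function)

new_data = np.linspace(0, 1000, 10)
predict_input_function = tf.estimator.inputs.numpy_input_fn({'x': new_data,
                                                             'x_squared': new_data**2},
                                                            shuffle=False)
print([p['predictions'][0] for p in estimator.predict(input_fn=predict_input_function)])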

fuglede
  • Thanks. What other `activation_fn` can be passed to the estimator? I mean their names in tf or something – the actual code, not only their names – mikinoqwert Aug 18 '18 at 13:55
  • For example, you used `np.square`, and if I try to use `np.tanh` I get: AttributeError: 'Tensor' object has no attribute 'tanh' – mikinoqwert Aug 18 '18 at 14:01
  • @PanMikuś: You [can build anything you want](https://stackoverflow.com/q/39921607/5085211), but you can find a collection of the most commonly useful ones in [the docs](https://www.tensorflow.org/api_guides/python/nn#Activation_Functions). (And of course in practice, you probably wouldn't use `np.square` but rather include a new feature for the square of the old one, if that's what you would expect to work.) – fuglede Aug 18 '18 at 14:03
  • And this line: activation_fn=lambda x: x if x>0 else x*0.1 causes: TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor. Why? – mikinoqwert Aug 18 '18 at 14:04
  • In the first example above, we *do* use one from the docs (namely the default, `tf.nn.relu`), and as you can see, that one can lead to non-constant predictions when the optimization is set up properly. The same is true for many of the other ones (but of course, you are never going to get anything useful if you use one whose range doesn't match the range of the response, such as `tf.nn.sigmoid`). – fuglede Aug 18 '18 at 14:09
  • OK, I understand. So what function, other than np.square, would you use to train a model of y = x**2 or so? – mikinoqwert Aug 18 '18 at 14:11
  • That's a chicken-or-the-egg sort of situation: When you say that you want to model `y ~ x**2`, you have no parameters at all, thus nothing to estimate. But, slightly more generally, if your model were something like `y ~ a*x**2 + b*x + c`, you would use linear regression on the predictors `x` and `x**2`; you don't need a `DNNRegressor` for that but could achieve the same effect by using the identity for your activation. – fuglede Aug 18 '18 at 14:15
  • I added an example of what that would look like to the answer. – fuglede Aug 18 '18 at 14:27
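
As an aside on the TypeError in the comments above: a Python `if` cannot branch on a tensor, so a leaky ReLU has to be built from tensor ops instead of a conditional. A minimal sketch, assuming the same TF 1.x API as the rest of this page:

import tensorflow as tf

# Leaky ReLU from tensor ops; the elementwise equivalent of
# x if x > 0 else 0.1 * x, without a Python conditional.
leaky_relu = lambda x: tf.maximum(x, 0.1 * x)

feature_columns = [tf.feature_column.numeric_column('x', shape=[1])]
estimator = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                      hidden_units=[2, 2],
                                      activation_fn=leaky_relu,
                                      optimizer=tf.train.AdamOptimizer(learning_rate=0.1))

# Recent TF 1.x releases also ship tf.nn.leaky_relu (default alpha=0.2),
# which can be passed as activation_fn directly.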