I'm trying to build a DNNRegressor that learns from 196 real-valued features to predict a single real-valued label. I've tried multiple variations of feeding the data and batching, but nothing seems to work: the loss reported by fit() stays around
INFO:tensorflow:loss = 1.59605e+32
and when I predict on the same training data, the output is far outside the range of my label (which is between -1.7 and 2.6; I get predictions like 2.9873503e+09).
Can anyone help me figure out what I'm doing wrong?
My code is below:
import pandas as pd
import tensorflow as tf

df_train = pd.read_csv("...", delimiter="\t", index_col=0)
df_test = pd.read_csv("...", delimiter="\t", index_col=0)  # loaded the same way as the training set

LABEL = 'y'
# Every column except the label is a continuous feature.
# (List comprehension rather than filter(): in Python 3, filter() returns a
# one-shot iterator that would be empty the second time COLUMNS is used.)
COLUMNS = [c for c in df_train.columns.values if c != LABEL]

def my_input_fn(df):
    # One [n, 1] constant tensor per feature column, plus a label tensor.
    continuous_cols = {k: tf.constant(df[k].values, shape=[df[k].size, 1]) for k in COLUMNS}
    labels = tf.constant(df[LABEL].values)
    return continuous_cols, labels

continuous_features = [tf.contrib.layers.real_valued_column(k) for k in COLUMNS]
regressor = tf.contrib.learn.DNNRegressor(feature_columns=continuous_features,
                                          hidden_units=[20, 10],
                                          model_dir="...")

regressor.fit(input_fn=lambda: my_input_fn(df_train), steps=20000)
results = regressor.evaluate(input_fn=lambda: my_input_fn(df_test), steps=1)
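In case it helps with debugging, here's a minimal sketch of how I can inspect what my_input_fn actually feeds the model (the expected shapes in the comments are my assumption of what DNNRegressor wants):

with tf.Graph().as_default():
    features, labels = my_input_fn(df_train)
    with tf.Session() as sess:
        feature_values, label_values = sess.run([features, labels])
    # Each feature should come out as (n, 1) and the labels as (n,).
    print({k: v.shape for k, v in feature_values.items()})
    print(label_values.shape, label_values.dtype)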
I'm running TensorFlow with GPU support. One thing I noticed is that when fit() is first called, I get:
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.94G (4233691136 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.94G (4233691136 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.94G (4233297920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
But training still runs after that. Many thanks!
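In case the failed allocations matter, here's a minimal sketch of capping how much GPU memory the estimator grabs (assuming my tf.contrib.learn version's RunConfig accepts gpu_memory_fraction; 0.5 is just an illustrative value):

run_config = tf.contrib.learn.RunConfig(gpu_memory_fraction=0.5)  # cap at roughly half the GPU's memory
regressor = tf.contrib.learn.DNNRegressor(feature_columns=continuous_features,
                                          hidden_units=[20, 10],
                                          model_dir="...",
                                          config=run_config)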
Update: I've noticed that some of the input columns are all zeroes. When I remove them, the network learns and converges. I also tried feeding these columns in as categorical (binary) columns, but that likewise keeps the learning from converging.
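For reference, this is roughly how I drop the all-zero columns before building COLUMNS (a minimal sketch; zero_cols is just my name for them):

# Drop feature columns that are identically zero in the training set,
# and drop the same columns from the test set so the schemas stay aligned.
zero_cols = [c for c in df_train.columns if c != LABEL and (df_train[c] == 0).all()]
df_train = df_train.drop(zero_cols, axis=1)
df_test = df_test.drop(zero_cols, axis=1)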