I have a dataset of monthly user activity, segmented by country and browser. Each row is one day of a user's activity summed up, plus a score for that daily activity. For example, the number of sessions per day is one feature. The score is a floating-point number calculated from those daily features.
My goal is to predict the "average user" score at the end of the month using just 2 days of user data.
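Concretely, the raw data is a list of 2D arrays, one per month. A minimal sketch with made-up shapes and values (the 16 features per row match the model summary further down):
import numpy as np

# one array per month: each row is one summed-up day of user activity,
# with 16 daily features (e.g. number of sessions); values are made up
month_full = np.random.rand(4283, 16)     # a month with all days present
month_partial = np.random.rand(3100, 16)  # a month with only part of the days
sequences = [month_full, month_partial]   # 25 such arrays in the real data

# one target per month: the end-of-month "average user" score
targets = np.array([3.7, 2.9])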
I have 25 months of data; some months are complete and some cover only part of the days. To get a fixed batch size I've padded the sequences like so:
from keras.preprocessing.sequence import pad_sequences
padded_sequences = pad_sequences(sequences, maxlen=None, dtype='float64',
                                 padding='pre', truncating='post', value=-10.)
so sequences shorter than the max were padded with rows of -10.
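To see what the padding produces, a toy example (made-up shapes, not my real features):
import numpy as np
from keras.preprocessing.sequence import pad_sequences

short_seq = np.ones((2, 3))  # 2 days, 3 features
long_seq = np.ones((4, 3))   # 4 days, 3 features
padded = pad_sequences([short_seq, long_seq], maxlen=None, dtype='float64',
                       padding='pre', truncating='post', value=-10.)
print(padded.shape)  # (2, 4, 3): both padded to the longest sequence
print(padded[0][0])  # [-10. -10. -10.]: a padding row prepended to the short sequence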
I've decided to create an LSTM model to digest the data, so that at the end of each sequence the model predicts the average user score. Later I'll try to predict using just a 2-day sample; the rough plan is sketched below.
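A sketch of that 2-day prediction step (month_sequence and max_len are placeholder names for one month's array and the padded sequence length):
# keep only the first 2 days, then pad back to the trained length with the
# mask value, so the Masking layer ignores everything past those 2 days
two_days = month_sequence[:2]  # shape (2, num_features)
padded_two_days = pad_sequences([two_days], maxlen=max_len, dtype='float64',
                                padding='pre', value=-10.)
predicted_score = model.predict(padded_two_days)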
My model looks like this:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense, Masking
from tensorflow.keras import metrics
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.optimizers import Adam
import datetime, os
model = Sequential()
opt = Adam(learning_rate=0.0001, clipnorm=1)  # clip the gradient norm to fight exploding gradients
num_samples = train_x.shape[1]   # timesteps (padded days) per sequence
num_features = train_x.shape[2]  # features per day
model.add(Masking(mask_value=-10., input_shape=(num_samples, num_features)))  # skip the -10 padding rows
model.add(LSTM(64, return_sequences=True, activation='relu'))
model.add(Dropout(0.3))
# last LSTM layer: return_sequences=False so we get one vector per sequence
model.add(LSTM(64, return_sequences=False, stateful=False, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1))
model.compile(loss='mse', optimizer=opt, metrics=['acc', metrics.mean_squared_error])  # pass opt so clipnorm takes effect
logs_base_dir = "logs"  # base directory for TensorBoard logs
logdir = os.path.join(logs_base_dir, datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = TensorBoard(log_dir=logdir, update_freq=1)
model.summary()
Model: "sequential_13"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
masking_5 (Masking)          (None, 4283, 16)          0
_________________________________________________________________
lstm_20 (LSTM)               (None, 4283, 64)          20736
_________________________________________________________________
dropout_14 (Dropout)         (None, 4283, 64)          0
_________________________________________________________________
lstm_21 (LSTM)               (None, 64)                33024
_________________________________________________________________
dropout_15 (Dropout)         (None, 64)                0
_________________________________________________________________
dense_9 (Dense)              (None, 1)                 65
=================================================================
Total params: 53,825
Trainable params: 53,825
Non-trainable params: 0
_________________________________________________________________
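The training call looks roughly like this (a sketch; the batch size and validation split here are placeholders, not my exact values):
history = model.fit(train_x, train_y,
                    epochs=1000,
                    batch_size=16,
                    validation_split=0.2,
                    callbacks=[tensorboard_callback])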
While training, the loss becomes NaN around the 19th epoch:
Epoch 16/1000
16/16 [==============================] - 14s 855ms/sample - loss: 298.8135 - acc: 0.0000e+00 - mean_squared_error: 298.8135 - val_loss: 220.7307 - val_acc: 0.0000e+00 - val_mean_squared_error: 220.7307
Epoch 17/1000
16/16 [==============================] - 14s 846ms/sample - loss: 290.3051 - acc: 0.0000e+00 - mean_squared_error: 290.3051 - val_loss: 205.3393 - val_acc: 0.0000e+00 - val_mean_squared_error: 205.3393
Epoch 18/1000
16/16 [==============================] - 14s 869ms/sample - loss: 272.1889 - acc: 0.0000e+00 - mean_squared_error: 272.1889 - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 19/1000
16/16 [==============================] - 14s 852ms/sample - loss: nan - acc: 0.0000e+00 - mean_squared_error: nan - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 20/1000
16/16 [==============================] - 14s 856ms/sample - loss: nan - acc: 0.0000e+00 - mean_squared_error: nan - val_loss: nan - val_acc: 0.0000e+00 - val_mean_squared_error: nan
Epoch 21/1000
I tried to apply the methods described here with no real success.
Update: I've changed the activation from relu to tanh, and that solved the NaN issue. The only change is to the two LSTM layers (tanh is also the Keras default for LSTM):
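# both LSTM layers now use tanh; the rest of the model is unchanged
model.add(LSTM(64, return_sequences=True, activation='tanh'))
model.add(Dropout(0.3))
model.add(LSTM(64, return_sequences=False, stateful=False, activation='tanh'))
However, it seems that the accuracy of my model stays at 0 while the loss goes down: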
Epoch 100/1000
16/16 [==============================] - 14s 869ms/sample - loss: 22.8179 - acc: 0.0000e+00 - mean_squared_error: 22.8179 - val_loss: 11.7422 - val_acc: 0.0000e+00 - val_mean_squared_error: 11.7422
Q: What am I doing wrong here?