I have a dataset containing activity data for 1000 users. Since activity differs from one user to another, I want to pass the user attribute to the LSTM RNN model as well, so that the model can learn each user's behavior better. A snippet of my dataset is shown below:

https://i.stack.imgur.com/poL31.jpg

I tried one-hot encoding and binary encoding of the categorical information, but the model did not produce good results. However, applying the LSTM RNN model to a single user's data (excluding the user variable) produces good results.

A snippet of my LSTM autoencoder model for anomaly detection is shown below:

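from keras import regularizers
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from keras.models import Model

# Encoder: compress each window of shape (timesteps, n_features) into one vector
inputs = Input(shape=(timesteps, n_features))
# Note: l2(0.00) applies no regularization; raise the factor to enable it
L1 = LSTM(encoding_dim, activation='relu', return_sequences=True,
          kernel_regularizer=regularizers.l2(0.00))(inputs)
L2 = LSTM(hidden_dim, activation='relu', return_sequences=False)(L1)
# Decoder: repeat the encoded vector and reconstruct the input sequence
L3 = RepeatVector(timesteps)(L2)
L4 = LSTM(hidden_dim, activation='relu', return_sequences=True)(L3)
L5 = LSTM(encoding_dim, activation='relu', return_sequences=True)(L4)
output = TimeDistributed(Dense(n_features))(L5)
lstm_model = Model(inputs=inputs, outputs=output)
lstm_model.summary()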

So far I have tried the following settings:

n_features = 22 (number of features: 1 categorical with one-hot encoding + 21 numerical)

encoding_dim = 16

hidden_dim = 8

How can I better handle the categorical attribute, i.e. the user variable, with this model?

ab.sharma

1 Answer

I am sharing my code. Here y_train and y_test are the categories.

Load your dataset using the following:

# Chronological split: the first (1 - test_split) fraction of rows is used for training
train_size = int(len(df) * (1 - test_split))

X_train = np.array(df['sequence'].values[:train_size])
y_train = np.array(df['target'].values[:train_size])
X_test = np.array(df['sequence'].values[train_size:])
y_test = np.array(df['target'].values[train_size:])
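
The slicing above can be wrapped in a small helper so the split point is computed in one place. This is only a sketch: the function name `chronological_split` and its default `test_split=0.2` are hypothetical and do not come from the original code. It works on any pair of aligned, sliceable sequences (lists or NumPy arrays) and does not shuffle, so row order is preserved.

```python
def chronological_split(sequences, targets, test_split=0.2):
    """Split two aligned, sliceable sequences into a leading train part
    and a trailing test part, without shuffling.

    Hypothetical helper for illustration; not part of the original answer.
    """
    if len(sequences) != len(targets):
        raise ValueError("sequences and targets must have the same length")
    if not 0.0 <= test_split <= 1.0:
        raise ValueError("test_split must be between 0 and 1")

    # Same formula as in the answer: the first (1 - test_split) share is training data
    train_size = int(len(sequences) * (1 - test_split))

    return (sequences[:train_size], targets[:train_size],
            sequences[train_size:], targets[train_size:])


# Toy data: 10 samples, the last 20% held out for testing
X_train, y_train, X_test, y_test = chronological_split(
    list(range(10)), list("abcdefghij"), test_split=0.2
)
print(len(X_train), len(X_test))  # prints: 8 2
```

Because the split is positional, it only separates users cleanly if the rows are ordered accordingly; with data from many users interleaved, a per-user split may be more appropriate.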

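Then load the data and train the model using:

X_train, y_train, X_test, y_test = load_data()

# create_model is not shown here; it must return a compiled model
# with an accuracy metric for evaluate() to return (score, acc)
model = create_model(len(X_train[0]))

print('Fitting model...')
# Keras 2 renamed the nb_epoch argument to epochs
hist = model.fit(X_train, y_train, batch_size=64, epochs=10,
                 validation_split=0.1, verbose=1)

score, acc = model.evaluate(X_test, y_test, batch_size=1)
print('Test score:', score)
print('Test accuracy:', acc)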
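
The code above uses the categories as the prediction target. If the user is instead meant to be an input to the autoencoder from the question, one commonly used alternative to one-hot encoding is a learned embedding of the user ID that is repeated across the timesteps and concatenated with the numerical features. The following is a minimal sketch under several assumptions: it uses the Keras API bundled with TensorFlow 2.x, the names `n_users`, `n_numeric`, `embed_dim`, `seq_in` and `user_in` are hypothetical, users are assumed to be encoded as integers 0..999, and the model reconstructs only the 21 numerical features. Whether it improves results on this dataset is untested.

```python
from tensorflow.keras.layers import (Concatenate, Dense, Embedding, Flatten,
                                     Input, LSTM, RepeatVector, TimeDistributed)
from tensorflow.keras.models import Model

# Hypothetical sizes, chosen to mirror the question
timesteps = 10       # assumed window length
n_numeric = 21       # numerical features per timestep
n_users = 1000       # number of distinct user IDs (integers 0..999)
embed_dim = 4        # size of the learned user vector (assumption)
encoding_dim, hidden_dim = 16, 8

# Two inputs: the numerical sequence and one integer user ID per sample
seq_in = Input(shape=(timesteps, n_numeric))
user_in = Input(shape=(1,), dtype="int32")

# Look up a dense vector for the user, then repeat it for every timestep
user_vec = Flatten()(Embedding(input_dim=n_users, output_dim=embed_dim)(user_in))
user_seq = RepeatVector(timesteps)(user_vec)

# Each timestep now carries 21 numerical values plus the user vector
x = Concatenate(axis=-1)([seq_in, user_seq])

# Same encoder/decoder layout as in the question
x = LSTM(encoding_dim, activation="relu", return_sequences=True)(x)
x = LSTM(hidden_dim, activation="relu", return_sequences=False)(x)
x = RepeatVector(timesteps)(x)
x = LSTM(hidden_dim, activation="relu", return_sequences=True)(x)
x = LSTM(encoding_dim, activation="relu", return_sequences=True)(x)

# Reconstruct only the numerical features, not the user ID
out = TimeDistributed(Dense(n_numeric))(x)

model = Model(inputs=[seq_in, user_in], outputs=out)
model.compile(optimizer="adam", loss="mse")
```

Training would then pass both inputs and use the numerical sequences as the target, for example `model.fit([X_seq, user_ids], X_seq, ...)`, where `X_seq` and `user_ids` are hypothetical arrays of shape `(samples, timesteps, 21)` and `(samples, 1)`.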
akay