
I am having difficulty finding documentation, studies, or blog posts that can help me build a classifier over sequences of text features. The text sequences I have are network logs.

I am building a GRU model using TensorFlow, with an SVM as the classification function, and I am having trouble with the tensor shapes. It says `ValueError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [?,23,1], [512,2]`. Here is the code I am using for training my neural network.

The goal of my project is to use this GRU-SVM model for intrusion detection on Kyoto University's honeypot system intrusion detection dataset. The dataset has 23 features and a label (whether or not there is an intrusion in the network).

import data
import numpy as np
import os
import tensorflow as tf


BATCH_SIZE = 200
CELLSIZE = 512
NLAYERS = 3
SVMC = 1
learning_rate = 0.01

TRAIN_PATH = '/home/darth/GitHub Projects/gru_svm/dataset/train/6'

def main():
    examples, labels, keys = data.input_pipeline(path=TRAIN_PATH, batch_size=BATCH_SIZE, num_epochs=1)

    seqlen = examples.shape[1]

    x = tf.placeholder(shape=[None, seqlen, 1], dtype=tf.float32)
    y = tf.placeholder(shape=[None, 2], dtype=tf.float32)
    Hin = tf.placeholder(shape=[None, CELLSIZE*NLAYERS], dtype=tf.float32)

    # cell = tf.contrib.rnn.GRUCell(CELLSIZE)
    network = []
    for index in range(NLAYERS):
        network.append(tf.contrib.rnn.GRUCell(CELLSIZE))

    mcell = tf.contrib.rnn.MultiRNNCell(network, state_is_tuple=False)
    Hr, H = tf.nn.dynamic_rnn(mcell, x, initial_state=Hin, dtype=tf.float32)

    Hf = tf.transpose(Hr, [1, 0, 2])  # [batch, seqlen, cell] -> [seqlen, batch, cell]
    last = tf.gather(Hf, int(Hf.get_shape()[0]) - 1)  # output at the last time step: [batch, CELLSIZE]

    weight = tf.Variable(tf.truncated_normal([CELLSIZE, 2], stddev=0.01), tf.float32)
    bias = tf.Variable(tf.constant(0.1, shape=[2]))
    logits = tf.matmul(x, weight) + bias  # <-- this is the line that raises the ValueError

    # SVM objective: L2 penalty on the weights plus the hinge term,
    # weighted by the penalty parameter SVMC
    regularization_loss = 0.5 * tf.reduce_sum(tf.square(weight))
    hinge_loss = tf.reduce_sum(tf.maximum(tf.zeros([BATCH_SIZE, 1]), 1 - y * logits))
    loss = regularization_loss + SVMC * hinge_loss

    train_step = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)

    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())

    with tf.Session() as sess:
        sess.run(init_op)

        train_loss = 0

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        try:
            for index in range(100):
                for j in range(1000):
                    example_batch, label_batch, key_batch = sess.run([examples, labels, keys])
                    _, train_loss_ = sess.run([train_step, loss],
                        feed_dict = { x : example_batch,
                                        y : label_batch,
                                        Hin : np.zeros([BATCH_SIZE, CELLSIZE * NLAYERS])
                                    })
                    train_loss += train_loss_
                print('[{}] loss : {}'.format(index, (train_loss / 1000)))
                train_loss = 0
        except tf.errors.OutOfRangeError:
            print('EOF reached.')
        except KeyboardInterrupt:
            print('Interrupted by user at {}'.format(index))
        finally:
            coord.request_stop()
        coord.join(threads)

main()
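
The error can be reproduced in isolation with just the input placeholder and the weight matrix (a minimal sketch using the same shapes as above):

import tensorflow as tf

x = tf.placeholder(shape=[None, 23, 1], dtype=tf.float32)  # rank 3: [batch, seqlen, 1]
weight = tf.Variable(tf.truncated_normal([512, 2], stddev=0.01))  # rank 2

# raises: ValueError: Shape must be rank 2 but is rank 3 for 'MatMul'
logits = tf.matmul(x, weight)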

Note: The reason I built my MultiRNNCell the way I did (snippet isolated below) is that I was getting an error similar to the one in this post.

network = []
for index in range(NLAYERS):
    network.append(tf.contrib.rnn.GRUCell(CELLSIZE))
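
As I understand it, passing the same GRUCell instance for every layer raised a variable-reuse ValueError, while one instance per layer works. A sketch of both patterns (TF 1.x `tf.contrib.rnn` API):

import tensorflow as tf

CELLSIZE = 512
NLAYERS = 3

# problematic: the same cell object (and hence the same variables) for every layer
# cell = tf.contrib.rnn.GRUCell(CELLSIZE)
# mcell = tf.contrib.rnn.MultiRNNCell([cell] * NLAYERS, state_is_tuple=False)

# working: an independent GRUCell per layer, each with its own weights
mcell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.GRUCell(CELLSIZE) for _ in range(NLAYERS)],
    state_is_tuple=False)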

Thank you in advance for your response!

Update 08/01/2017: The source was improved based on @jdehesa's suggestions:

import data
import numpy as np
import os
import tensorflow as tf


BATCH_SIZE = 200
CELLSIZE = 512
NLAYERS = 3
SVMC = 1
learning_rate = 0.01

TRAIN_PATH = '/home/darth/GitHub Projects/gru_svm/dataset/train/6'

def main():
    examples, labels, keys = data.input_pipeline(path=TRAIN_PATH, batch_size=BATCH_SIZE, num_epochs=1)

    seqlen = examples.shape[1]

    x = tf.placeholder(shape=[None, seqlen, 1], dtype=tf.float32, name='x')
    y_input = tf.placeholder(shape=[None], dtype=tf.int32, name='y_input')
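    # labels arrive as a [batch] vector of class ids; tf.one_hot turns them
    # into [batch, 2] floats to match the two-unit output layer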
    y = tf.one_hot(y_input, 2, dtype=tf.float32, name='y')
    Hin = tf.placeholder(shape=[None, CELLSIZE*NLAYERS], dtype=tf.float32, name='Hin')

    network = []
    for index in range(NLAYERS):
        network.append(tf.contrib.rnn.GRUCell(CELLSIZE))

    mcell = tf.contrib.rnn.MultiRNNCell(network, state_is_tuple=False)
    Hr, H = tf.nn.dynamic_rnn(mcell, x, initial_state=Hin, dtype=tf.float32)

    Hf = tf.transpose(Hr, [1, 0, 2])
    last = tf.gather(Hf, int(Hf.get_shape()[0]) - 1)

    weight = tf.Variable(tf.truncated_normal([CELLSIZE, 2], stddev=0.01), tf.float32, name='weights')
    bias = tf.Variable(tf.constant(0.1, shape=[2]), name='bias')
    logits = tf.matmul(last, weight) + bias

    regularization_loss = 0.5 * tf.reduce_sum(tf.square(weight))
    hinge_loss = tf.reduce_sum(tf.maximum(tf.zeros([BATCH_SIZE, 1]), 1 - y * logits))
    loss = regularization_loss + SVMC * hinge_loss

    train_step = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)

    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())

    with tf.Session() as sess:
        sess.run(init_op)

        train_loss = 0

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        try:
            for index in range(100):
                example_batch, label_batch, key_batch = sess.run([examples, labels, keys])
                _, train_loss_ = sess.run([train_step, loss],
                    feed_dict = { x : example_batch[..., np.newaxis],
                                    y_input : label_batch,
                                    Hin : np.zeros([BATCH_SIZE, CELLSIZE * NLAYERS])
                                })
                train_loss += train_loss_
                print('[{}] loss : {}'.format(index, train_loss))  # one batch per step now, so no /1000 average
                print('Weights : {}'.format(sess.run(weight)))
                print('Biases : {}'.format(sess.run(bias)))
                train_loss = 0
        except tf.errors.OutOfRangeError:
            print('EOF reached.')
        except KeyboardInterrupt:
            print('Interrupted by user at {}'.format(index))
        finally:
            coord.request_stop()
        coord.join(threads)

main()

My next move is to validate if the results I'm getting are correct.
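
For that, I am thinking of an accuracy op along these lines (a sketch only; `logits` and `y` are the tensors defined above, and this is not part of the training code yet):

prediction = tf.argmax(logits, 1)  # index of the larger of the two logits
correct = tf.equal(prediction, tf.argmax(y, 1))  # compare against the one-hot label
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))  # fraction of the batch classified correctly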

afagarap

1 Answer

The problem is in the line:

logits = tf.matmul(x, weight) + bias

I think what you meant was:

logits = tf.matmul(last, weight) + bias
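
`last` is the output of the RNN at the final time step, which is rank 2, so the shapes line up (a sketch of the dimensions from your code):

# Hr     : [batch, seqlen, CELLSIZE]  output of dynamic_rnn
# Hf     : [seqlen, batch, CELLSIZE]  after tf.transpose(Hr, [1, 0, 2])
# last   : [batch, CELLSIZE]          Hf's final time step (rank 2)
# weight : [CELLSIZE, 2]
# tf.matmul(last, weight) -> [batch, 2], whereas tf.matmul(x, weight) fails
# because x is rank 3 ([batch, seqlen, 1])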
jdehesa
  • I did that, but now, I have a new problem: `ValueError: Cannot feed value of shape (200, 23) for Tensor 'Placeholder:0', which has shape '(?, 23, 1)'` – afagarap Aug 01 '17 at 12:31
  • @AbienFredAgarap That's a different error (the first one was during the _construction_ of the graph and this one during its _execution_). Try passing `x : example_batch[..., np.newaxis]` in `feed_dict`. – jdehesa Aug 01 '17 at 12:35
  • Should I put that `x : example_batch[..., np.newaxis]` as it is? Because when I did, I got the same error. Sorry, just new to this. – afagarap Aug 01 '17 at 12:39
  • @AbienFredAgarap Yeah I mean when you do `feed_dict = ...`, inside that `dict`, replace `x : example_batch` with `x : example_batch[..., np.newaxis]` (and leave the rest as it is). You may get more errors, but you shouldn't get the same error. – jdehesa Aug 01 '17 at 12:41
  • Sorry, my bad. I was editing the wrong source. This is the new error I've got: `ValueError: Cannot feed value of shape (200,) for Tensor 'Placeholder_1:0', which has shape '(?, 2)'`. I'm really sorry. – afagarap Aug 01 '17 at 12:50
  • @AbienFredAgarap Okay but I cannot help you with that. Your input pipeline is providing a `label_batch` that is a vector of 200 elements but your model expects labels `y` that are matrices with X rows and two columns. I don't know if you have to [one-hot encode it](https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science) or something else, that depends on your data and application. – jdehesa Aug 01 '17 at 12:54
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/150723/discussion-between-abien-fred-agarap-and-jdehesa). – afagarap Aug 01 '17 at 12:55