I need to build an autoencoder using TensorFlow. When I check the documentation and tutorials, I see many examples with image data and MNIST_data, which is pre-processed numerical data.
In my case, however, the data is in text format, like this:
uid orig_h orig_p trans_depth method host
======================================================================
5fg288 192.168.1.4 80 1 POST ex1.com
2fg888 192.168.1.3 80 2 GET ex2.com
So how can I convert this data to a numerical format that TensorFlow accepts? I couldn't find any example in the TensorFlow tutorials.
I am a beginner in TensorFlow, please help.
Update
Based on the instructions below, I have created a word-to-vector mapping by referring to the tutorial here.
The input is a pandas DataFrame:
host method orig_h orig_p trans_depth uid
0 ex1.com POST 192.168.1.4 80 1 5fg288
1 ex2.com GET 192.168.1.3 443 2 2fg888
And
Bag of words ---> ['5fg288', '2fg888', '80', 'GET', '443', '1', 'ex2.com', '192.168.1.4', '192.168.1.3', '2', 'ex1.com', 'POST']
Now for each cell I have an array of values like:
192.168.1.4 ---> [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
ex1.com ---> [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
80 ---> [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
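For reference, the mapping above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the vocabulary order is copied by hand from the bag-of-words list above, and the helper name `one_hot` is my own, not something from the tutorial.

```python
# Vocabulary copied from the bag-of-words list above (order matters).
vocab = ['5fg288', '2fg888', '80', 'GET', '443', '1', 'ex2.com',
         '192.168.1.4', '192.168.1.3', '2', 'ex1.com', 'POST']

# Map each token to its position in the vocabulary.
index = {token: i for i, token in enumerate(vocab)}

def one_hot(token):
    """Return a list of floats with 1.0 at the token's vocabulary index.

    Hypothetical helper for illustration; raises KeyError for unknown tokens.
    """
    vec = [0.0] * len(vocab)
    vec[index[token]] = 1.0
    return vec

print(one_hot('192.168.1.4'))
print(one_hot('ex1.com'))
print(one_hot('80'))
```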
So how can I reshape this data to feed it to TensorFlow?
Should it be like this:
data = array([
[[0.0,...],[0.0,...],[0.0,...],[0.0,...],[0.0,...],[0.0,...]],
[[0.0,...],[0.0,...],[0.0,...],[0.0,...],[0.0,...],[0.0,...]]
])
That is, each feature is an array of floats, and there are 6 features in a single sample. Is that possible?
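To make the layout concrete, here is a rough NumPy sketch of what I mean. It builds the array with shape (samples, features, vocabulary size) from the two rows of the DataFrame above. The flattened 2-D version at the end is only my assumption about what a plain fully connected autoencoder might expect; I don't know whether that is the right input format.

```python
import numpy as np

# Vocabulary copied from the bag-of-words list above (order matters).
vocab = ['5fg288', '2fg888', '80', 'GET', '443', '1', 'ex2.com',
         '192.168.1.4', '192.168.1.3', '2', 'ex1.com', 'POST']
index = {token: i for i, token in enumerate(vocab)}

# The two DataFrame rows, in column order:
# host, method, orig_h, orig_p, trans_depth, uid
rows = [
    ['ex1.com', 'POST', '192.168.1.4', '80', '1', '5fg288'],
    ['ex2.com', 'GET', '192.168.1.3', '443', '2', '2fg888'],
]

n_samples, n_features = len(rows), len(rows[0])

# One one-hot vector per cell: shape (samples, features, vocabulary size).
data = np.zeros((n_samples, n_features, len(vocab)), dtype=np.float32)
for r, row in enumerate(rows):
    for c, token in enumerate(row):
        data[r, c, index[token]] = 1.0

# Assumption: concatenate the 6 one-hot vectors into one vector per sample.
flat = data.reshape(n_samples, -1)

print(data.shape)
print(flat.shape)
```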