
Say I have a 1000x500 table, where 500 is the number of columns and 1000 the number of rows.

The rows represent 1000 samples; each sample is composed of 499 features and 1 label.

If I want to feed this into a TensorFlow model, taking a batch of 20 samples each time:

inputdata   # is filled and has a shape of 499x1000
inputlabel  # is filled and has a shape of 1x1000

y_ = tf.placeholder(tf.float32, [None, batchSize], name='Labels')
for j in range(numberOfRows // BatchSize):
    sess.run(train_step, feed_dict={x: batch_xs[j], y_: np.reshape(inputlabel[j], (batchSize, 1))})

I've been trying to run my code for two days without any success, so I'd be grateful for any help with the y_ and reshaping part. What I don't understand is: when I read a batch of 20 data rows, how should I shape the labels y_?

Engine
1 Answer


First issue: put your batch_size dimension as your first dimension; that's the standard, and a fair number of computations in TensorFlow assume as much.

Second, I don't see a placeholder for your data, X, but you're passing it as a variable to sess.run.

To keep things simple, I suggest you do all this reshaping outside of tensorflow, use numpy. Don't get me wrong, you can absolutely do this in tensorflow, but if slicing and merging are confusing you (they confused everyone the first time), tensorflow will only add to that confusion because you can't simply print the results of a slicing operation as conveniently in tensorflow as you can in numpy to debug your situation.

So to that end, let's do it:

# your data
mydata = np.random.rand(500, 1000)

# tensorflow placeholders
X = tf.placeholder(tf.float32, [batchSize, 499], name='X')
y_ = tf.placeholder(tf.float32, [batchSize, 1], name='y_')

# let's transpose your data so the batch is the first dimension (1000 x 500)
mydata = mydata.T

# Let's split the labels from the data
# (columns 0-498 are the features, column 499 is the label)
data = mydata[:, 0:499]
labels = mydata[:, 499].reshape(-1, 1)  # keep a 2nd dimension to match y_

# Now train (integer division so j is an int)
for j in range(numOfRows // BatchSize):
    row_from = j * BatchSize
    row_to = j * BatchSize + BatchSize
    sess.run(train_step, feed_dict={
        X  : data[row_from:row_to, :],
        y_ : labels[row_from:row_to, :]
    })
  • Don't forget to permute your data; we didn't do it here. I personally like np.random.permutation(1000) to get a random list of indexes: just take the first BatchSize indexes, then np.roll the permutation. It's a super easy way to iterate through data sets without computing indexes by hand or dealing with a trailing batch that isn't an even size.
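The permutation-plus-roll idea above can be sketched in plain numpy (numRows and BatchSize here are stand-ins for the question's variables, and the loop just collects index batches rather than calling sess.run):

```python
import numpy as np

numRows, BatchSize = 1000, 20

# Shuffle the row indexes once per epoch
perm = np.random.permutation(numRows)

batches = []
for _ in range(numRows // BatchSize):
    batch_idx = perm[:BatchSize]       # indexes for this batch
    batches.append(batch_idx)
    perm = np.roll(perm, -BatchSize)   # rotate so the next batch comes first

# Every row is visited exactly once per epoch
assert set(int(i) for i in np.concatenate(batches)) == set(range(numRows))
```

In the training loop you would then feed data[batch_idx] and labels[batch_idx] instead of contiguous slices.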
David Parks
  • thanks for your answer, but I'm still missing the number of classes, say 11 classes. Where do I tell the trainer how many classes there are? – Engine Apr 10 '17 at 21:12
  • Ah! My answer assumed a binary class. If you have more than 2 classes you'll need to change your y_ labels to a one-hot encoding: class `3` would be represented by the vector `[0 0 1 0 0 0 0 0 0 0 0]`, and likewise for the other classes. Then your label data will have dimensionality 11. See this question: http://stackoverflow.com/questions/29831489/numpy-1-hot-array – David Parks Apr 10 '17 at 23:26
  • thanks for your help, but where should I use the number of classes in the placeholder, e.g. y_? – Engine Apr 11 '17 at 05:39
  • Your `y_` placeholder should look like this with 11 classes: `y_ = tf.placeholder(tf.float32, [batchSize, 11], name='y_')`; your inputs will of course match those dimensions. You also need to convert your labels into the one-hot format as described in the link in my previous comment. – David Parks Apr 11 '17 at 16:11
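The one-hot conversion discussed in the comments can be sketched in numpy like this (a minimal sketch assuming 0-indexed integer class labels; `numberOfClasses` and the example labels are made up for illustration):

```python
import numpy as np

numberOfClasses = 11
labels = np.array([3, 0, 10, 3])  # example integer class labels

# One row per sample, one column per class; set the label's column to 1
one_hot = np.zeros((labels.size, numberOfClasses), dtype=np.float32)
one_hot[np.arange(labels.size), labels] = 1.0

print(one_hot[0])  # class 3 -> 1.0 at index 3, zeros elsewhere
```

The resulting array has shape (batchSize, 11) and can be fed directly to the `y_` placeholder above.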