
EDIT3: You cannot do this natively and I marked the answer that said so. However, I posted an example solution in another answer below for those curious.

EDIT2: Simple code with issue replication below.

EDIT: This is not a question about how to queue/batch over multiple epochs in general, which is what the suggested duplicate explains; I'm asking specifically how to get non-perfect batch sizes working properly. That post simply mentions that the "allow_smaller_final_batch=True" argument should account for this scenario, but it does not seem to (as shown in my code below).

In my TF neural network, I am using tf.train.slice_input_producer and tf.train.batch to batch my data over epochs, which works flawlessly when my number of samples is a perfect multiple of my batch size.

Unfortunately, if it is not, the last batch of an epoch spills over into the next epoch (i.e. there is no true "epoch" boundary), which eventually means that every epoch is different. EXAMPLE:

2 epochs * 12 samples = 24 total values, batch_size = 5

WHAT IS CORRECT:

Epoch 1: [5 items], [5 items], [2 items]

Epoch 2: [5 items], [5 items], [2 items]

WHAT IT'S ACTUALLY DOING:

Epoch 1: [5 items], [5 items], [5 items]

Epoch 2: [5 items], [4 items], [0 items: out of bounds]

Code that produces the above example (very similar to my NN implementation):

import tensorflow as tf
import numpy as np

batch_size = 5
epochs = 2
Data = list(range(12))
iterations = int(np.ceil(len(Data)/batch_size)*epochs)
sess = tf.InteractiveSession()

x1 = tf.train.slice_input_producer([Data], num_epochs=epochs)  # yields one sample at a time, limited to `epochs` passes
x2 = tf.train.batch(x1, batch_size=batch_size, allow_smaller_final_batch=True)  # groups samples into batches

sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess,coord=coord)

for i in range(iterations):
    temp_batch = sess.run(x2)
    print('\n' + str(temp_batch))
sess.close()

I know this is likely just a by-product of how tf.train.slice_input_producer works, and I can probably work around it manually in various ways, but is there no way to natively distinguish the "end" of an epoch with slicing?

Wanna-be Coder
  • Possible duplicate of [TensorFlow: does tf.train.batch automatically load the next batch when the batch has finished training?](https://stackoverflow.com/questions/41673889/tensorflow-does-tf-train-batch-automatically-load-the-next-batch-when-the-batch) – YLJ May 31 '17 at 17:07
  • Thanks for the response frankyjuang; the difference here is that I have the general queueing working just fine (which is more what that post is asking about), but I am not getting the overlap behavior working as described/implied in that post. – Wanna-be Coder May 31 '17 at 17:33
  • Can you provide the minimum code that can reproduce this? – YLJ May 31 '17 at 17:35
  • I have now included code in the original post, please take a look, thanks =) – Wanna-be Coder May 31 '17 at 18:37
  • After diving into some TF code, I came to a conclusion; please find it in my answer. – YLJ Jun 01 '17 at 15:06

4 Answers


Unfortunately, there's no way to distinguish the end of every epoch natively. That's because general usage doesn't require separating the training process into epochs; see, for instance, fully_connected_preloaded.py.

If you want to do something at the end of every epoch, you have to take care of it manually. If not, instead of calculating the iterations yourself and worrying about mistakes, you can use coord.should_stop() to handle it:

try:
    while not coord.should_stop():
        temp_batch = sess.run(x2)
        print('\n' + str(temp_batch))
except tf.errors.OutOfRangeError:
    print("Done training, epoch limit reached.")
finally:
    coord.request_stop()    # Ask the threads to stop.

coord.join(threads)    # Wait for threads to stop.
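
If you do need to know where the epoch boundaries fall inside this loop, one manual option (a sketch only, assuming the Data, x2, coord and threads from the question; the counter names are just for illustration) is to count how many samples have been consumed:

seen = 0      # samples consumed so far
epoch = 0     # completed passes over Data
try:
    while not coord.should_stop():
        temp_batch = sess.run(x2)        # 1-D array, so len() is the actual batch size
        seen += len(temp_batch)
        if seen // len(Data) > epoch:    # an epoch boundary fell in (or at the end of) this batch
            epoch = seen // len(Data)
            print('--- epoch %d consumed (boundary may be mid-batch) ---' % epoch)
except tf.errors.OutOfRangeError:
    print("Done training, epoch limit reached.")
finally:
    coord.request_stop()
coord.join(threads)

Note that with allow_smaller_final_batch=True the boundary can land in the middle of a batch (which is exactly what the question is about), so this only tells you that a full pass has been consumed; it does not split the batch at the boundary.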
YLJ
  • Thanks for the response; I figured this was going to end up being the case, unfortunately. Easy enough to accommodate, but sad nonetheless. Thanks for investigating that and the tip with `coord.should_stop()`, much appreciated. – Wanna-be Coder Jun 01 '17 at 15:21
  • using a try/catch instead of just fixing the math isn't a good approach. – Anton Codes Jun 01 '17 at 15:56
  • I went ahead and made a simple example that makes it work as intended if you were curious, posted it as an additional answer. – Wanna-be Coder Jun 01 '17 at 16:53
  • @wontonimo the try/catch method actually is the official example mentioned in [Threading and Queues](https://www.tensorflow.org/programmers_guide/threading_and_queues). – YLJ Jun 01 '17 at 17:22

In case someone wants to know how to do this based on my simple example (not natively):

import tensorflow as tf
import numpy as np

batch_size = 5
epochs = 2
Data = list(range(12))
iter_epoch = int(np.ceil(len(Data)/float(batch_size)))  # batches per epoch (here 3: 5, 5, 2)
iterations = iter_epoch*epochs                           # total batches across all epochs
mini_size = len(Data) % batch_size                       # size of the final, smaller batch (here 2)

def make_nparray(constant):
    return(np.array([np.int32(constant)]))

sess = tf.InteractiveSession()

batch_ph = tf.placeholder(dtype=np.int32,shape=(1,))
x1 = tf.train.slice_input_producer([Data], num_epochs=epochs)
x2 = tf.train.batch(x1, batch_size=batch_ph[0])

sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess,coord=coord)

for i in range(iterations):
    not_mini = (i+1) % iter_epoch != 0
    if not_mini:
        temp_batch = sess.run(x2,feed_dict={batch_ph:make_nparray(batch_size)})
    else:
        temp_batch = sess.run(x2,feed_dict={batch_ph:make_nparray(mini_size)})
    print('\n' + str(temp_batch))
coord.request_stop()
coord.join(threads)
sess.close()
Wanna-be Coder

The iteration calculation is incorrect.

This is what your iteration calculation should be:

iterations = int(np.ceil(1.0*len(Data)/batch_size*epochs))

When I change that line in your code, I get the following output:

[ 2  5  6 10  3]
[ 9  4  7  8 11]
[ 1  0 10  6  2]
[3 8 9 0 5]
[ 1  4 11  7]

The calculation you had contains len(Data)/batch_size, which (under Python 2) is integer math and is already truncated. Multiplying by 1.0 forces floating-point division, and the math works.
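
For reference, a quick check of the two formulas (a sketch assuming Python 2 division semantics, where / between ints truncates; the variable names are just for illustration):

import numpy as np

data_len, batch_size, epochs = 12, 5, 2

# Truncated division: 12 // 5 == 2, so ceil(2) * 2 == 4 -> one iteration short.
truncated = int(np.ceil(data_len // batch_size) * epochs)

# Forced float division: ceil(1.0 * 12 / 5 * 2) == ceil(4.8) == 5,
# which matches the 24 queued values drained in batches of at most 5.
forced_float = int(np.ceil(1.0 * data_len / batch_size * epochs))

print(truncated, forced_float)  # 4 5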

Anton Codes
  • Heya wontonimo, thanks for the reply! The output you are getting is still the same as what I get despite your changes, which is the problem I'm trying to point out. For example, using that same 'distribution' you generated, the output SHOULD HAVE been grouped like so [2 5 6 10 3], [9 4 7 8 11], [1 0] ||||END BATCH 1|||| [10 6 2 3 8], [9 0 5 1 4], [11 7]. Hope that makes sense. – Wanna-be Coder Jun 01 '17 at 14:18
  • ah yes, I thought your issue was the **out of bounds** error. The code change I suggested fixes that and runs the correct number of iterations **without** try/catch statements. – Anton Codes Jun 01 '17 at 15:54

Like @Wanna-be Coder showed, you just need to use an integer placeholder to control the batch size yourself. The relevant parts are:

batch_size = tf.placeholder(tf.int32, [])
x2 = tf.train.batch(x1, batch_size=batch_size)
batch1 = sess.run(x2, feed_dict={batch_size: 5}) # 5 items in batch1
batch2 = sess.run(x2, feed_dict={batch_size: 5}) # 5 items in batch2
batch3 = sess.run(x2, feed_dict={batch_size: 2}) # 2 items in batch3
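
A minimal way to drive this end-to-end (a sketch only, assuming the Data, epochs, sess, x2 and queue-runner setup from the question, with batch_size being the placeholder above) is to precompute the per-epoch batch schedule and feed it in turn:

full, rem = divmod(len(Data), 5)                 # 12 samples, batches of 5 -> 2 full batches + remainder 2
schedule = [5] * full + ([rem] if rem else [])   # per-epoch schedule: [5, 5, 2]
for _ in range(epochs):
    for size in schedule:
        print(sess.run(x2, feed_dict={batch_size: size}))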
barbolo