I wrote a simple program to try out the batch data-reading functions in TensorFlow, but ran into a problem.
I created 6 simple CSV files; each file contains 3 records like:
1.0,1.0,1.0,1.0,1
1.1,1.1,1.1,1.1,1
1.2,1.2,1.2,1.2,1
(The first 4 columns are features and the fifth column is the label.) So the 6 files contain 6*3=18 records in total.
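For anyone who wants to reproduce this, files like the ones described above can be generated with a short script. This is only a minimal sketch: the helper name write_sample_csvs and the file names data_0.csv ... data_5.csv are placeholders I made up for illustration, and it assumes every file holds the same three sample rows shown above.

```python
import os

# The three sample records shown above: 4 feature columns, then the label.
ROWS = [
    "1.0,1.0,1.0,1.0,1",
    "1.1,1.1,1.1,1.1,1",
    "1.2,1.2,1.2,1.2,1",
]


def write_sample_csvs(data_dir="./data", num_files=6):
    """Write num_files CSV files of 3 records each and return their paths."""
    os.makedirs(data_dir, exist_ok=True)
    paths = []
    for i in range(num_files):
        # Hypothetical file name; any name ending in .csv would do.
        path = os.path.join(data_dir, "data_%d.csv" % i)
        # Force "\n" line endings so each record is exactly one text line.
        with open(path, "w", newline="\n") as f:
            f.write("\n".join(ROWS) + "\n")
        paths.append(path)
    return paths
```

Calling write_sample_csvs() with the defaults writes the 6 files into ./data, the directory the program reads from.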
I try to read the files into 3 batches of 6 records each using reader, batch or shuffle_batch. When I don't specify num_epochs in string_input_producer, the code works fine. But when I specify num_epochs, batch or shuffle_batch always throws an OutOfRange error. The current_size is always zero...
Here is the code:
import tensorflow as tf
import os

csvFiles = os.listdir('./data')
csvFiles = [i for i in csvFiles if i[-4:] == '.csv']
csvFiles = ['./data/' + i for i in csvFiles]
print(csvFiles)

fileQ = tf.train.string_input_producer(csvFiles, shuffle=False, num_epochs=3)
reader = tf.TextLineReader()
key, value = reader.read(fileQ)
record_defaults = [[0.0], [0.0], [0.0], [0.0], [0]]
col1, col2, col3, col4, label = tf.decode_csv(value, record_defaults=record_defaults)
feature = tf.stack([col1, col2, col3, col4])
feature_batch, label_batch = tf.train.shuffle_batch(
    [feature, label], batch_size=6, capacity=100, min_after_dequeue=1)  # num_threads=3,

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        for i in range(3):
            featureBatch, labelBatch = sess.run([feature_batch, label_batch])
            print(featureBatch)
            print(labelBatch)
    except tf.errors.OutOfRangeError:
        print("Done reading!")
    finally:
        coord.request_stop()
    coord.join(threads)
print("**END**")
The output of the OutOfRange error is here.
Please note that the error was thrown the first time shuffle_batch was called. I think this means that not a single record could be read.
Even when I changed the code to read just one record, it threw the same error:
l,f=sess.run([label,feature])
This is very simple code, so I wonder what is wrong with it. Thank you very much!