Using TensorFlow's multithreaded image loading, I have this data-loading function which, given a CSV file (e.g. a training CSV file), creates some data nodes:

def loadData(csvPath, shape, batchSize=10, batchCapacity=40, nThreads=16):
    path, label = readCsv(csvPath)
    labelOh = oneHot(idx=label)
    pathRe = tf.reshape(path, [1])

    # Subgraph that takes a filename, reads and decodes the image, and enqueues it
    image_bytes = tf.read_file(path)
    decoded_img = tf.image.decode_jpeg(image_bytes)
    decoded_img = prepImg(decoded_img, shape=shape)
    imageQ = tf.FIFOQueue(128, [tf.float32, tf.float32, tf.string],
                          shapes=[shape, [447], [1]])
    enQ_op = imageQ.enqueue([decoded_img, labelOh, pathRe])

    NUM_THREADS = nThreads
    Q = tf.train.QueueRunner(
            imageQ,
            [enQ_op] * NUM_THREADS,     # run the enqueue op on NUM_THREADS threads
            imageQ.close(),             # close op
            imageQ.close(cancel_pending_enqueues=True))  # cancel op

    tf.train.add_queue_runner(Q)
    dQ = imageQ.dequeue()
    X, Y, Ypaths = tf.train.batch(dQ, batch_size=batchSize, capacity=batchCapacity)
    return X, Y, Ypaths
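
Here readCsv, oneHot and prepImg are helpers that aren't shown. Based on mrry's comments further down, readCsv is assumed to be a queue-based reader along these lines (a hypothetical sketch, not the actual implementation):

import tensorflow as tf

def readCsv(csvPath, num_epochs=None):
    # num_epochs=None cycles over the file forever; a finite value makes the
    # queue raise tf.errors.OutOfRangeError once the file has been read that
    # many times (this matters for the discussion below).
    filenameQ = tf.train.string_input_producer([csvPath], num_epochs=num_epochs)
    reader = tf.TextLineReader(skip_header_lines=1)
    _, row = reader.read(filenameQ)
    # Assumes each row holds two columns: image path, integer class label
    path, label = tf.decode_csv(row, record_defaults=[[""], [0]])
    return path, label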

I then call it and have the standard model, loss, and train subgraphs, such as:

xTr, yTr, yPathsTr = loadData(trainCsvPath, *args)
yPredTr = model1(xTr, *args)
ce = ...  # some loss function
learningRate = tf.placeholder(tf.float32)
trainStep = tf.train.AdamOptimizer(learningRate).minimize(ce)

I then proceed to train the weights in the model. As I understand it, I don't need to feed data via feed_dict, since the input is already defined in the graph.

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    while not coord.should_stop():
        sess.run([trainStep], feed_dict={learningRate: lr})

My question now is:

What's the best way to incorporate a train/test process? I.e. once the threads have finished with the training CSV file, they then read the test CSV file and I run another session where I have something like:

xTe, yTe, yPathsTe = loadData(csvPathTe, *args)
yPredTe = model1(xTe, *args)  # will this use the same weights as the trained network? Or am I defining another separate subgraph?
ce = ...  # redefined for yPredTe
while not coord.should_stop():
    ce.eval()  # return losses

which runs until the test CSV file is exhausted.

How would I then rinse and repeat these steps (possibly shuffling my training set) for a set number of epochs? Should I have a CSV queue as well?


1 Answer

Alas, currently there is no good answer to this question. The typical evaluation workflow involves running a separate process that periodically does the following (e.g. evaluate() in cifar10_eval.py):

  1. Build a graph that includes an input pipeline that knows about the evaluation set, a copy of the model, the evaluation ops (if any), and a tf.train.Saver.
  2. Create a new session.
  3. Restore the latest checkpoint written by the training process in that session.
  4. Run the test op (e.g. ce in your question) and accumulate the results in Python, until you get a tf.errors.OutOfRangeError.
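
A minimal sketch of that workflow, reusing loadData and model1 from the question (with the question's *args placeholders), a softmax cross-entropy loss as an assumed stand-in for ce, and a hypothetical checkpointDir that the training process saves checkpoints into:

import tensorflow as tf

with tf.Graph().as_default():
    # 1. Input pipeline for the evaluation set, plus a fresh copy of the model
    xTe, yTe, yPathsTe = loadData(csvPathTe, *args)
    yPredTe = model1(xTe, *args)
    ce = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=yPredTe, labels=yTe))
    saver = tf.train.Saver()

    # 2. A new session; 3. the latest training checkpoint restored into it
    with tf.Session() as sess:
        saver.restore(sess, tf.train.latest_checkpoint(checkpointDir))
        sess.run(tf.local_variables_initializer())  # needed if num_epochs is set
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        # 4. Accumulate results in Python until the input pipeline runs out
        losses = []
        try:
            while not coord.should_stop():
                losses.append(sess.run(ce))
        except tf.errors.OutOfRangeError:
            pass  # the evaluation set has been fully processed
        finally:
            coord.request_stop()
            coord.join(threads)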

We're currently working on improved input pipelines that will make it easier to iterate over files many times, and reuse the same session.

mrry
  • Okay so I can just loop over two separate sessions (one train and one test) for the amount of epochs desired? – mattdns Dec 13 '16 at 22:02
  • That would work, yes. It might be easier to use a sequence of different sessions for test, so that you can use the `tf.errors.OutOfRangeError` to detect when you have processed the entire test set. – mrry Dec 13 '16 at 23:19
  • Do I have to init the last object? E.g. do I give the `tf.errors.OutOfRangeError` init argument a reader node like `reader = tf.TextLineReader(skip_header_lines=1)`? – mattdns Dec 16 '16 at 14:12
  • I'm not sure what you mean... I was suggesting that you wrap your eval code (in particular, the loop that calls `sess.run()`) in a `try:`/`except tf.errors.OutOfRangeError:` block. TensorFlow will raise this exception when the input has been exhausted. (There are also higher-level wrappers, such as [`tf.train.MonitoredSession`](https://www.tensorflow.org/api_docs/python/train/distributed_execution#MonitoredSession) that catch this exception for you.) – mrry Dec 16 '16 at 15:36
  • I can't seem to get the queue to throw that error it just loops back to the beginning I believe. Is there something you think I might be missing? – mattdns Jan 04 '17 at 15:34
  • The bug might be in the implementation of the `readCsv()` function. I was assuming it used something like `tf.string_input_producer(..., num_epochs=1)` but it might be defaulting to `num_epochs=None` (which means it will loop indefinitely). – mrry Jan 05 '17 at 05:47
  • On a larger CSV file I think my threads are not coordinating and it just breaks as soon as one of them is finished, maybe? E.g. on a CSV of size 100 I get through the first 64 before it breaks (with 16 threads). – mattdns Jan 05 '17 at 11:57
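
Tying the last two comments together: a sketch of a test pass that terminates cleanly, assuming readCsv (and a loadData that forwards the argument to it) is extended with num_epochs as in the hypothetical reader near the top:

import tensorflow as tf

# One pass over the test CSV; afterwards the queue raises OutOfRangeError
xTe, yTe, yPathsTe = loadData(csvPathTe, *args, num_epochs=1)
yPredTe = model1(xTe, *args)
ce = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=yPredTe, labels=yTe))

with tf.Session() as sess:
    # num_epochs is stored as a local variable, so this init is required
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while True:
            print(sess.run(ce))  # one batch of test loss per call
    except tf.errors.OutOfRangeError:
        pass  # raised once the whole test CSV has been consumed
    finally:
        coord.request_stop()
        coord.join(threads)

As for losing the tail of a file: by default tf.train.batch discards a final batch smaller than batch_size, so passing allow_smaller_final_batch=True inside loadData may be worth checking (an assumption about the symptom above, not a confirmed diagnosis).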