Using TensorFlow's multithreaded image loading, I have this data-loading function which, given a csv file (e.g. a training csv file), creates the data nodes:
def loadData(csvPath, shape, batchSize=10, batchCapacity=40, nThreads=16):
    path, label = readCsv(csvPath)
    labelOh = oneHot(idx=label)
    pathRe = tf.reshape(path, [1])

    # Define subgraph to take a filename, read and decode the file, and enqueue it
    image_bytes = tf.read_file(path)
    decoded_img = tf.image.decode_jpeg(image_bytes)
    decoded_img = prepImg(decoded_img, shape=shape)
    imageQ = tf.FIFOQueue(128, [tf.float32, tf.float32, tf.string],
                          shapes=[shape, [447], [1]])
    enQ_op = imageQ.enqueue([decoded_img, labelOh, pathRe])

    NUM_THREADS = nThreads
    Q = tf.train.QueueRunner(
        imageQ,
        [enQ_op] * NUM_THREADS,
        imageQ.close(),
        imageQ.close(cancel_pending_enqueues=True)
    )

    tf.train.add_queue_runner(Q)
    dQ = imageQ.dequeue()
    X, Y, Ypaths = tf.train.batch(dQ, batch_size=batchSize, capacity=batchCapacity)
    return X, Y, Ypaths
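(readCsv, oneHot and prepImg are helpers defined elsewhere; for reference, a minimal pure-Python sketch of what the one-hot step does, assuming 447 classes to match the [447] queue shape above, could look like this. The name and signature are hypothetical.)

```python
def one_hot(idx, depth=447):
    """Return a one-hot float vector of length `depth` with a 1.0 at `idx`.

    Mirrors what the `oneHot` helper is assumed to produce for each label
    before it is enqueued alongside the decoded image and path.
    """
    vec = [0.0] * depth
    vec[idx] = 1.0
    return vec
```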
I then call it and define the standard model, loss and train subgraphs, e.g.:
xTr, yTr, yPathsTr = loadData(trainCsvPath, *args)
yPredTr = model1(xTr,*args)
ce = ... # some loss function
learningRate = tf.placeholder(tf.float32)
trainStep = tf.train.AdamOptimizer(learningRate).minimize(ce)
I then proceed to train the model's weights. As I understand it, I don't need to feed the data via feed_dict, since the input is already wired into the graph:
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    while not coord.should_stop():
        sess.run([trainStep], feed_dict={learningRate: lr})
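(For intuition about how "the threads finish the csv file and the loop stops" works here: the queue-runner threads enqueue until the input is exhausted, the queue is closed, and the coordinator tells the consumer loop to stop. This is a toy stdlib analogue of that shutdown pattern, not TensorFlow's actual implementation; all names are hypothetical.)

```python
import queue
import threading

def run_pipeline(items, num_threads=2):
    """Toy producer/consumer with coordinator-style shutdown:
    producer threads enqueue until the source is exhausted, then an
    event (playing the role of coord.should_stop()) stops the consumer."""
    q = queue.Queue(maxsize=8)
    src = iter(items)
    src_lock = threading.Lock()
    done = threading.Event()
    consumed = []

    def producer():
        while True:
            with src_lock:
                item = next(src, None)
            if item is None:      # source exhausted -> this worker exits
                return
            q.put(item)

    workers = [threading.Thread(target=producer) for _ in range(num_threads)]
    for w in workers:
        w.start()

    def watcher():                # "closes the queue" once all producers exit
        for w in workers:
            w.join()
        done.set()

    threading.Thread(target=watcher).start()

    # Consumer: keep dequeuing until shutdown is signalled and queue is drained
    while not (done.is_set() and q.empty()):
        try:
            consumed.append(q.get(timeout=0.1))
        except queue.Empty:
            pass
    return consumed
```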
My question now is:
What's the best way to incorporate a train/test process? I.e. once the threads have finished with the train csv file, they should read the test csv file, and I run another session with something like:
xTe, yTe, yPathsTe = loadData(csvPathTe, *args)
yPredTe = model1(xTe, *args)  # will this use the same weights as the trained network, or am I defining another separate subgraph?
ce = ...  # redefined for yPredTe

while not coord.should_stop():
    ce.eval()  # return losses
which runs until the test csv file has finished.
How would I then rinse and repeat these steps (possibly shuffling my training set) for a set number of epochs? Should I have a csv queue as well?