Tensorflow - shuffle & split dataset of images and labels

Question

New to TensorFlow, I'm using neural networks to classify images. I have a tensor of images with shape [N, 128, 128, 1] (N images of 128x128 pixels with 1 channel) and a tensor of shape [N] that holds the label of each image.

I want to shuffle everything and split it into training and testing tensors (say 80%/20%). I couldn't find a way to 'zip' my tensors so that each image stays associated with its label (so that images and labels are shuffled the same way). Is it possible? If not, how can I do the shuffling and splitting?

Thanks for any help!

Do you want to feed the images through batched input placeholders? — Abhishek Bansal, Jun 12 '17 at 13:46
I don't think I need placeholders as I've loaded it with `tf.image.decode_png()` and evaluated it in a session (I can post the code if you want). But yes, I need to use batches for training. — MeanStreet, Jun 12 '17 at 13:50
Do you want something like this? https://stackoverflow.com/questions/34340489/tensorflow-read-images-with-labels — Abhishek Bansal, Jun 12 '17 at 13:57
It's close to what I've done so far. I guess that the batches given by the `tf.train.batch` function are shuffled. But where is the split between train & test in that code? Thanks for your help — MeanStreet, Jun 12 '17 at 14:10
I'm actually wrong, the shuffling is done by the `slice_input_producer` — MeanStreet, Jun 12 '17 at 14:17

Oleksandr Khryplyvenko · Accepted Answer · 2017-08-10T17:03:02.977

Just pass the same `seed` keyword argument, say seed=8, to tf.random_shuffle for both the labels and the data.

ipdb> my_data = tf.convert_to_tensor([[1,1], [2,2], [3,3], [4,4], [5,5], [6,6], [7,7], [8,8]])
ipdb> my_labels = tf.convert_to_tensor([1,2,3,4,5,6,7,8])
ipdb> sess.run(tf.random_shuffle(my_data, seed=8))
array([[5, 5],
       [3, 3],
       [1, 1],
       [7, 7],
       [2, 2],
       [8, 8],
       [4, 4],
       [6, 6]], dtype=int32)
ipdb> sess.run(tf.random_shuffle(my_labels, seed=8))
array([5, 3, 1, 7, 2, 8, 4, 6], dtype=int32)

EDIT: if you need shuffling at run time, where, say, each batch is shuffled randomly but differently every time, you can use this trick:

# the shuffling pattern is different on each run

# for now, it works
indices = tf.random_shuffle(tf.range(8))
params = tf.convert_to_tensor([111, 222, 333, 444, 555, 666, 777, 888])
sess.run(tf.add(tf.gather(params, indices), tf.gather(params, indices) * 1000))
> array([555555, 444444, 666666, 222222, 111111, 888888, 333333, 777777], dtype=int32)

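Each resulting number consists of a single repeated digit, which shows that both tf.gather calls used the same permutation: the shuffled indices are evaluated once per sess.run call, so the two gathers stay aligned.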
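The answer above covers the shuffling but not the 80%/20% split asked about in the question. As a rough sketch of how the shared-index idea extends to a split, here is the same pattern in plain NumPy rather than TensorFlow (a deliberate substitution, so it runs without a session). The function name `shuffle_and_split`, its parameters, and the toy data are hypothetical and only for illustration.

```python
import numpy as np

def shuffle_and_split(images, labels, train_fraction=0.8, seed=None):
    """Shuffle images and labels with one shared permutation, then split.

    Hypothetical helper for illustration; not part of TensorFlow or NumPy.
    """
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    n = len(images)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)          # one permutation, reused for both arrays
    n_train = int(n * train_fraction)  # size of the training part
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return (images[train_idx], labels[train_idx]), (images[test_idx], labels[test_idx])

# Toy data: image i is filled with the value i, so the pairing is easy to check.
images = np.arange(10, dtype=np.float32).reshape(10, 1, 1, 1) * np.ones((10, 4, 4, 1), dtype=np.float32)
labels = np.arange(10)

(train_x, train_y), (test_x, test_y) = shuffle_and_split(images, labels, seed=0)
print(train_x.shape, test_x.shape)
```

The point is the same as with tf.gather: build the permutation once, slice it into a training part and a testing part, and index both the images and the labels with each part. In a TensorFlow graph, the two index slices could be passed to tf.gather in the same way.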
