I am trying to build a tf.data.Dataset that yields batches of TFRecords in which each batch contains 2 random records from one class and the rest from other random classes.
OR
A Dataset whose batches contain 2 random records from each of as many classes as fit into the batch.
I tried to do this with tf.data.Dataset.from_generator
and with tf.data.experimental.choose_from_datasets,
but with no success. Do you have an idea how to do this?
EDIT: Today I think I implemented the second variant. Here is the code I was testing it on:
import tensorflow as tf

def input_fn():
    # One endlessly repeating, lightly shuffled dataset per class.
    partial1 = tf.data.Dataset.from_tensor_slices(tf.range(0, 10)).repeat().shuffle(2)
    partial2 = tf.data.Dataset.from_tensor_slices(tf.range(20, 30)).repeat().shuffle(2)
    partial3 = tf.data.Dataset.from_tensor_slices(tf.range(60, 70)).repeat().shuffle(2)
    l = [partial1, partial2, partial3]

    def gen(x):
        # Emit the class index x twice, so two consecutive records come from class x.
        return tf.data.Dataset.range(x, x + 1).repeat(2)

    # Selector sequence 0, 0, 1, 1, 2, 2, repeated 10 times.
    dataset = tf.data.Dataset.range(3).flat_map(gen).repeat(10)
    choice = tf.data.experimental.choose_from_datasets(l, dataset).batch(4)
    return choice
which, when evaluated, returns:
[ 0 2 21 22]
[60 61 1 4]
[20 23 62 63]
[ 3 5 24 25]
[64 66 6 7]
[26 27 65 68]
[ 8 0 28 29]
[67 69 9 2]
[20 22 60 62]
[ 3 1 23 24]
[63 61 4 6]
[25 26 65 64]
[ 7 5 27 28]
[67 66 9 8]
[21 20 69 68]
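
To sanity-check that these batches really follow the second variant, here is a minimal plain-Python sketch (no TensorFlow needed). It copies the batches printed above and uses the class ranges from input_fn (0-9, 20-29, 60-69). The helper names class_of and class_counts are made up for this check and are not part of any library.

```python
from collections import Counter

# Batches copied from the printed output above.
batches = [
    [0, 2, 21, 22], [60, 61, 1, 4], [20, 23, 62, 63],
    [3, 5, 24, 25], [64, 66, 6, 7], [26, 27, 65, 68],
    [8, 0, 28, 29], [67, 69, 9, 2], [20, 22, 60, 62],
    [3, 1, 23, 24], [63, 61, 4, 6], [25, 26, 65, 64],
    [7, 5, 27, 28], [67, 66, 9, 8], [21, 20, 69, 68],
]

# Class index -> value range, matching partial1, partial2, partial3 in input_fn.
CLASS_RANGES = {0: range(0, 10), 1: range(20, 30), 2: range(60, 70)}


def class_of(value):
    """Return the class index whose range contains value (hypothetical helper)."""
    for label, values in CLASS_RANGES.items():
        if value in values:
            return label
    raise ValueError(f"{value} does not belong to any class")


def class_counts(batch):
    """Count how many records of each class appear in one batch."""
    return Counter(class_of(v) for v in batch)


# Every class that appears in a batch should appear exactly twice.
all_pairs = all(set(class_counts(b).values()) == {2} for b in batches)
print(all_pairs)
```

For the output above this prints True: every batch holds exactly two records from each of two classes. Note that the selector cycle 0, 0, 1, 1, 2, 2 has length 6 while the batch size is 4, so each batch covers two of the three classes, and which two rotates from batch to batch.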