Similar to this question, I want to build a TF dataset from a list whose elements have different sizes. However, unlike the linked question, I would like to generate the dataset from the output of tf.dynamic_partition, which returns a list of tensors.

My setup:

import tensorflow as tf
D = tf.data.Dataset # shorthand notation

x = tf.range(9) # Array to be partitioned
p = tf.constant([1,0,2,0,0,0,2,2,1]) # Defines partitions

The dataset should thus have three elements, containing [1 3 4 5], [0 8], and [2 6 7], respectively.
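
For reference, evaluating the partition directly confirms this; a quick sanity check using the setup above:

parts = tf.dynamic_partition(x, p, 3)  # list of three variable-length tensors
with tf.Session() as sess:
    print(sess.run(parts))  # three arrays: [1 3 4 5], [0 8], [2 6 7]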

The direct approach fails, as expected, because the list of partitions is implicitly stacked into a single tensor, which requires all of them to have the same shape:

dataset = D.from_tensor_slices(tf.dynamic_partition(x,p,3))
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
with tf.Session() as sess:
    nl = sess.run(next_element)

tensorflow.python.framework.errors_impl.InvalidArgumentError: Shapes of all inputs must match: values[0].shape = [4] != values[1].shape = [2]

Next, I tried to apply the solution from the linked question, using from_generator:

dataset = D.from_generator(lambda: tf.dynamic_partition(x,p,3), tf.int32, output_shapes=[None])
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
with tf.Session() as sess:
    nl = sess.run(next_element)

tensorflow.python.framework.errors_impl.InvalidArgumentError: exceptions.ValueError: setting an array element with a sequence.

How can I create a dataset with variable-sized items from the output of tf.dynamic_partition?

– mikkola

1 Answer

The from_generator doesn't work here because it expects the generator function to yield NumPy arrays (or values convertible to them), not TensorFlow tensors.
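
To illustrate the constraint, one workaround along those lines is to evaluate the partitions into NumPy arrays first and have the generator yield those; a minimal sketch, reusing x and p from the question:

# Materialize the partitions as NumPy arrays in a throwaway session...
with tf.Session() as sess:
    parts_np = sess.run(tf.dynamic_partition(x, p, 3))

# ...then let the generator yield plain NumPy arrays, which from_generator accepts.
dataset = tf.data.Dataset.from_generator(
    lambda: iter(parts_np), tf.int32, output_shapes=[None])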

One way to solve this is to create one dataset per element of the partition. Since the data is partitioned into three groups here, you would create three datasets and combine them with tf.data.Dataset.concatenate():

x = tf.range(9)  # Array to be partitioned
p = tf.constant([1, 0, 2, 0, 0, 0, 2, 2, 1])  # Defines partitions

partition = tf.dynamic_partition(x, p, 3)  # list of three variable-length tensors

# Wrap each partition in its own one-element dataset, then chain them together.
dataset = tf.data.Dataset.from_tensors(partition[0])
for i in range(1, 3):
    dataset_bis = tf.data.Dataset.from_tensors(partition[i])
    dataset = dataset.concatenate(dataset_bis)

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    for i in range(3):
        nl = sess.run(next_element)
        print(nl)
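
Running this prints the three partitions in turn: [1 3 4 5], [0 8], and [2 6 7].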
– Olivier Moindrot
  • This works! Slightly beside the point, but I want to mention that this does seem to slow down severely once there are more than a few calls to `concatenate`. I tried increasing the first loop's range to `range(1, 20)` and concatenated with `dataset_bis = tf.data.Dataset.from_tensors(partition[i%3])`. Feeding the data is then very, very slow indeed. Depending on how many partitions you have, this might or might not be feasible. – mikkola Feb 15 '18 at 06:38
  • @mikkola: yes, I guess it really depends on the details of your project. There might be a better solution specific to your dataset. – Olivier Moindrot Feb 20 '18 at 18:22
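
If the number of partitions is large enough that the chained concatenate() calls become a bottleneck, as the comment above observes, one possible alternative is to skip dynamic_partition and build each partition inside a single map() call with tf.boolean_mask; a minimal sketch, assuming the same x and p as in the question:

num_partitions = 3

# One dataset element per partition id; each element selects the matching
# entries of x, so rows naturally have different lengths.
dataset = tf.data.Dataset.range(num_partitions).map(
    lambda i: tf.boolean_mask(x, tf.equal(p, tf.cast(i, tf.int32))))

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
with tf.Session() as sess:
    for _ in range(num_partitions):
        print(sess.run(next_element))  # [1 3 4 5], then [0 8], then [2 6 7]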