I think the best solution uses tf.sparse_to_dense. For example, if we want ones at the (x, y) positions (6,2), (3,4), (4,5) of a 10x8 matrix:
import tensorflow as tf

indices = sorted([[6, 2], [3, 4], [4, 5]])
# tf.sparse_to_dense indexes the output as (row, column) and needs its indices
# in row-major sorted order, so swap each (x, y) pair and sort the result
row_col_indices = sorted([[y, x] for x, y in indices])
one_hot_encoded = tf.sparse_to_dense(sparse_indices=row_col_indices, output_shape=[10, 8], sparse_values=1.)
with tf.Session() as session:
    tf.global_variables_initializer().run()
    print(one_hot_encoded.eval())
This returns the following:
[[0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0.]]
Furthermore, the inputs (e.g. indices) can be a tf.Variable object; there is no need for them to be constant.
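For instance, a minimal sketch of that (var_indices is just an illustrative name; it holds the same swapped, sorted (row, column) pairs as above):
import tensorflow as tf

# Same (row, column) pairs as above, stored in a variable instead of a Python list
var_indices = tf.Variable([[2, 6], [4, 3], [5, 4]])
one_hot_var = tf.sparse_to_dense(sparse_indices=var_indices, output_shape=[10, 8], sparse_values=1.)
with tf.Session() as session:
    tf.global_variables_initializer().run()
    print(one_hot_var.eval())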
This approach has a couple of restrictions, namely that the indices must be sorted (hence the sorted above) and must not be repeated. You can also use tf.one_hot directly. In that case, you need the indices as two vectors, all the x coordinates first and all the y coordinates after, i.e. list(zip(*indices)). Then one can do:
import numpy as np
import tensorflow as tf

im_size = [10, 8]
indices = [[6, 2], [3, 4], [4, 5]]
# Split the (x, y) pairs into one vector of x coordinates and one of y coordinates
new_indices = np.array(list(zip(*indices)))
# one of the following: the first one is for the xy index convention:
flat_indices = new_indices[1] * im_size[1] + new_indices[0]
# this other one for the ij convention:
# flat_indices = new_indices[0] * im_size[1] + new_indices[1]
# Apply tf.one_hot to the flattened indices, then sum along the newly created dimension
one_hot_flat = tf.reduce_sum(tf.one_hot(flat_indices, depth=np.prod(im_size)), axis=0)
# Finally reshape back to the 2D image shape
one_hot_encoded = tf.reshape(one_hot_flat, im_size)
with tf.Session() as session:
    tf.global_variables_initializer().run()
    print(one_hot_encoded.eval())
This returns the same as the above. However, here the indices don't need to be sorted and can be repeated (in which case the corresponding entry will be the number of occurrences; for a plain "1" everywhere, replace tf.reduce_sum with tf.reduce_max, as in the sketch below). Also, this supports variables.
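For instance, a minimal sketch of the reduce_max variant, with unsorted and repeated (x, y) pairs (im_size as above):
import numpy as np
import tensorflow as tf

im_size = [10, 8]
# Unsorted (x, y) pairs with (6, 2) repeated; reduce_max keeps plain ones
# instead of counting occurrences
repeated = np.array(list(zip(*[[6, 2], [6, 2], [3, 4], [4, 5]])))
flat_repeated = repeated[1] * im_size[1] + repeated[0]
ones_only = tf.reduce_max(tf.one_hot(flat_repeated, depth=np.prod(im_size)), axis=0)
ones_only = tf.reshape(ones_only, im_size)
with tf.Session() as session:
    print(ones_only.eval())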
However, for large indices / depths, memory consumption may be a problem: it creates a temporary N x W x H tensor, where N is the number of index tuples, and that might become problematic. Therefore, the first solution is probably preferable, when possible.
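To get a feeling for the scale of that intermediate tensor, a rough sizing sketch (the numbers are purely illustrative):
import numpy as np

# The intermediate tf.one_hot output has shape [N, W*H] in float32 (4 bytes per element)
n_indices = 10000         # assumed number of (x, y) pairs
im_size = [1000, 1000]    # assumed image size
print(n_indices * np.prod(im_size) * 4 / 1e9, "GB")  # -> 40.0 GB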
Actually, if one is okay with using a sparse tensor, the most memory-efficient way is probably just:
sparse = tf.SparseTensor(indices=indices, values=[1]*len(indices), dense_shape=[10, 8])
When run, this returns a more cryptic result:
SparseTensorValue(indices=array([[3, 4],
[4, 5],
[6, 2]]), values=array([1, 1, 1], dtype=int32), dense_shape=array([10, 8]))
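If the dense form is needed later, it can be recovered with tf.sparse_tensor_to_dense; a minimal sketch, assuming (as in the first snippet) that the (x, y) pairs are first swapped into sorted (row, column) order:
import tensorflow as tf

indices = [[6, 2], [3, 4], [4, 5]]
# SparseTensor, like tf.sparse_to_dense, indexes as (row, column), so swap and sort
row_col_indices = sorted([[y, x] for x, y in indices])
sparse = tf.SparseTensor(indices=row_col_indices, values=[1] * len(indices), dense_shape=[10, 8])
dense = tf.sparse_tensor_to_dense(sparse)
with tf.Session() as session:
    print(dense.eval())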