1

I want to be able to call TensorFlow's tf.math.unsorted_segment_max on a data tensor of size [N, s, K]. N is the number of channels and K is the number of filters/feature maps; s is the size of a one-channel data sample. I have segment_ids of size s. For example, let's say my sample size is s=6 and I want to take a max over every two elements (as in the usual max pooling, so along the second, s, dimension of the whole data tensor). Then my segment_ids equals [0,0,1,1,2,2].

I tried running

tf.math.unsorted_segment_max(data, segment_ids, num_segments)

with the segment_ids extended along dimensions 0 and 2, but since the segment ids are then repeated, the result is of course of size [3] instead of [N, 3, K] as I would like.

So my question is: how do I construct a proper segment_ids tensor to achieve what I want? That is, how do I get the segment max computed based on the original s-sized segment_ids tensor, but applied to each of the other dimensions separately?

Basically, going back to the example, given the 1D segment id list seg_id=[0,0,1,1,2,2], I would like to construct something like a segment_ids tensor for which:

segment_ids[i,:,j] = seg_id + num_segments*(i*K + j) 

So that when calling tf.math.unsorted_segment_max with this tensor as segment_ids, I get a result of size [N, 3, K], with the same effect as if one had run segment_max on each data[x,:,y] separately and stacked the results appropriately.
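For concreteness, here is a small NumPy sketch of the segment_ids tensor that formula describes, using the toy sizes N=2, s=6, K=3 and num_segments=3 (these sizes are just for illustration):

import numpy as np

N, s, K = 2, 6, 3
num_segments = 3
seg_id = np.array([0, 0, 1, 1, 2, 2])

# segment_ids[i, :, j] = seg_id + num_segments * (i*K + j)
i = np.arange(N)[:, None, None]   # shape [N, 1, 1]
j = np.arange(K)[None, None, :]   # shape [1, 1, K]
segment_ids = seg_id[None, :, None] + num_segments * (i * K + j)
print(segment_ids.shape)  # (2, 6, 3)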

Any way of doing this is okay, as long as it works with TensorFlow. I would guess a combination of tf.tile, tf.reshape, or tf.concat should do the trick, but I can't figure out how or in what order. Also, is there a more straightforward way to do it, without the need to adjust the segment_ids at each "pooling" step?

Polhek
  • Can't you do this with the standard [`tf.nn.pool`](https://www.tensorflow.org/api_docs/python/tf/nn/pool) operation? (e.g. `tf.nn.pool(data, [2], 'MAX', 'VALID', strides=[2])`) Or the pooling was just an example, but in your problem you want to use unsorted segment ids? – jdehesa Mar 08 '19 at 12:34
  • Yes, the pooling was just an example @jdehesa. I would need it for the sort of 'pooling' where the segments over which I take the max/min or sum are of unequal sizes and so on; that's why I don't really see a way to avoid using tf.unsorted_segment_max/min/sum... – Polhek Mar 11 '19 at 15:03

2 Answers

2

I think you can achieve what you want with tf.nn.pool:

import tensorflow as tf

with tf.Graph().as_default(), tf.Session() as sess:
    data = tf.constant([
        [
            [ 1, 12, 13],
            [ 2, 11, 14],
            [ 3, 10, 15],
            [ 4,  9, 16],
            [ 5,  8, 17],
            [ 6,  7, 18],
        ],
        [
            [19, 30, 31],
            [20, 29, 32],
            [21, 28, 33],
            [22, 27, 34],
            [23, 26, 35],
            [24, 25, 36],
        ]], dtype=tf.int32)
    # The segment ids from the question; tf.nn.pool does not use them, but a
    # window of 2 with stride 2 produces the same grouping [0, 0, 1, 1, 2, 2].
    segments = tf.constant([0, 0, 1, 1, 2, 2], dtype=tf.int32)
    pool = tf.nn.pool(data, [2], 'MAX', 'VALID', strides=[2])
    print(sess.run(pool))

Output:

[[[ 2 12 14]
  [ 4 10 16]
  [ 6  8 18]]

 [[20 30 32]
  [22 28 34]
  [24 26 36]]]

If you really want to use tf.unsorted_segment_max, you can do it as you suggest in your own answer. Here is an equivalent formulation that avoids transposing and includes the final reshaping:

import tensorflow as tf

with tf.Graph().as_default(), tf.Session() as sess:
    data = ...
    segments = ...
    shape = tf.shape(data)
    n, k = shape[0], shape[2]
    m = tf.reduce_max(segments) + 1  # number of segments
    # Build an [n, s, k] tensor of segment ids where entry (i, x, j) equals
    # i*m*k + segments[x]*k + j, so every (channel, filter) pair gets its own
    # disjoint range of ids.
    grid = tf.meshgrid(tf.range(n) * m * k,
                       segments * k,
                       tf.range(k), indexing='ij')
    segment_nd = tf.add_n(grid)
    segmented = tf.unsorted_segment_max(data, segment_nd, n * m * k)
    # The flat result is already in (n, m, k) order, so a single reshape suffices.
    result = tf.reshape(segmented, [n, m, k])
    print(sess.run(result))
    # Same output

Both methods should work fine in a neural network in terms of back-propagation.
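For example, a minimal gradient check in the same TF 1.x style (a sketch with a made-up input; the same check can be run on the segment-based result):

with tf.Graph().as_default(), tf.Session() as sess:
    x = tf.constant([[[1., 2.], [3., 4.], [5., 6.], [7., 8.]]])  # shape [1, 4, 2]
    pooled = tf.nn.pool(x, [2], 'MAX', 'VALID', strides=[2])
    grad, = tf.gradients(tf.reduce_sum(pooled), x)
    print(sess.run(grad))
    # 1 at the positions that achieved each max, 0 elsewhere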

EDIT: In terms of performance, pooling seems to be more scalable than the segmented max (as one would expect):

import tensorflow as tf
import numpy as np

def method_pool(data, window):
    return tf.nn.pool(data, [window], 'MAX', 'VALID', strides=[window])

def method_segment(data, window):
    shape = tf.shape(data)
    n, s, k = shape[0], shape[1], shape[2]
    segments = tf.range(s) // window
    m = tf.reduce_max(segments) + 1
    grid = tf.meshgrid(tf.range(n) * m * k,
                       segments * k,
                       tf.range(k), indexing='ij')
    segment_nd = tf.add_n(grid)
    segmented = tf.unsorted_segment_max(data, segment_nd, n * m * k)
    return tf.reshape(segmented, [n, m, k])

np.random.seed(100)
rand_data = np.random.rand(300, 500, 100)
window = 10
with tf.Graph().as_default(), tf.Session() as sess:
    data = tf.constant(rand_data, dtype=tf.float32)
    res_pool = method_pool(data, window)
    res_segment = method_segment(data, window)
    print(np.allclose(*sess.run([res_pool, res_segment])))
    # True
    %timeit sess.run(res_pool)
    # 2.56 ms ± 80.8 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
    %timeit sess.run(res_segment)
    # 514 ms ± 6.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
jdehesa
  • Ahh, thank you for this! I think this tf.meshgrid version was exactly what I needed! I tried running the custom "pooling" layer that I implemented with the code in my answer (so a lot of reshaping and tiling), and it got stuck while building the computation graph, I think while building the constant vector to sum things with. I'll try your meshgrid version instead, it seems much better! Thanks! – Polhek Mar 12 '19 at 09:01
0

I haven't come up with any more elegant solution, but at least I figured out how to do it with a combination of tile, reshape, and transpose. Using those three operations (see the code below), I first construct a tensor of the same size as data, filled with repeated (but shifted) copies of the original seg_id vector:

m = tf.reduce_max(seg_id) + 1  # number of segments
# Offset i*m for each of the N*K (channel, filter) slices, each repeated s times
a = tf.reshape(tf.tile(tf.expand_dims(tf.range(N * K) * m, 1), [1, s]), [-1])
b = tf.tile(seg_id, [N * K])
# now reshape it:
segment_ids = tf.transpose(tf.reshape(a + b, shape=[N, K, s]), perm=[0, 2, 1])

With this it's possible to call the segment_max function directly:

result = tf.unsorted_segment_max(data=data, segment_ids=segment_ids, num_segments=m*N*K)

And it does what I want it to, except that the result is flattened and needs to be reshaped once again if needed. Equivalently, you can reshape the original data tensor to a 1-D one and call segment_max on it with a+b as segment_ids, and again reshape the final result if needed.
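A minimal sketch of that flattened variant (variable names are illustrative; note that, for the orders to line up, data has to be transposed to [N, K, s] first so its flattened layout matches how a + b was built):

# Flatten data in the same (N, K, s) order as the segment ids a + b
data_flat = tf.reshape(tf.transpose(data, perm=[0, 2, 1]), [-1])
flat_result = tf.unsorted_segment_max(data=data_flat, segment_ids=a + b,
                                      num_segments=m * N * K)
# Undo the flattening: [N*K*m] -> [N, K, m] -> [N, m, K]
result = tf.transpose(tf.reshape(flat_result, [N, K, m]), perm=[0, 2, 1])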

This feels like a long way round to the result... Is there any better way? I also don't know whether the described way is suitable for use inside a NN during backprop: could there be issues with the derivatives or the computation graph? Does anyone have a better idea on how to solve this?

Polhek