
I have a similar question to this one: TensorFlow - Pad unknown size tensor to a specific size?. My question is more difficult, though, and I couldn't find any solution that handles it. What if the given tensor of unknown size has different lengths in the last dimension and I want to pad them all to the same fixed length? How can I do that? For example, suppose the given tensor is

[[1],
 [1, 2],
 [1, 2, 3]]

I want to pad them such that I can get

[[1, 0, 0, 0],
 [1, 2, 0, 0],
 [1, 2, 3, 0]]

The solutions in the original post all assume the last dimension has the same length. Any ideas on how to solve this problem? I am not even sure if tf.pad() is the right function to achieve this...


3 Answers


Have a look at pad_sequences

It works as follows:

import tensorflow as tf

sequence = [
 [1],
 [1, 2],
 [1, 2, 3]
]

tf.keras.preprocessing.sequence.pad_sequences(sequence, padding='post')

Should give you:

array([
  [1, 0, 0],
  [1, 2, 0],
  [1, 2, 3]
])
  • Is it possible to pad to a fixed length instead of the length of the longest sequence? – Francis Jul 31 '22 at 01:45
  • @Francis I have not tried it, but according to the documentation there is a `maxlen` parameter and the return shape is supposed to be `(len(sequences), maxlen)`. – V. Guichard Aug 01 '22 at 07:10
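
Following up on the comments: a minimal sketch (untested, assuming `maxlen` and `padding` behave as documented) that pads to the fixed length of 4 from the question:

import tensorflow as tf

sequence = [
 [1],
 [1, 2],
 [1, 2, 3]
]

# maxlen fixes the width of every row; padding='post' puts the zeros at the end
padded = tf.keras.preprocessing.sequence.pad_sequences(sequence, maxlen=4, padding='post')
print(padded)
# [[1 0 0 0]
#  [1 2 0 0]
#  [1 2 3 0]]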

Try combining tf.slice, tf.pad and tf.map_fn.

  • For TF1
"""
[
    [1],
    [1, 2],
    [1, 2, 3]
]
"""
a = tf.sparse.SparseTensor(
    indices=[[0,0], [1,0], [1,1], [2,0], [2,1], [2,2]],
    values=[1, 1, 2, 1, 2, 3],
    dense_shape=[3, 3],
)

def cut_or_pad_1d(lst, max_len):
    origin_len = tf.shape(lst)[0]
    # cut
    lst = tf.cond(origin_len > max_len,
                  true_fn=lambda: lst[:max_len],
                  false_fn=lambda: lst)
    # pad
    lst = tf.cond(origin_len < max_len,
                  true_fn=lambda: tf.pad(lst, [[0, max_len-origin_len]]),
                  false_fn=lambda: lst)
    return lst
    
sess = tf.Session()
a_dense = tf.sparse.to_dense(a)
import functools
for MAX_LEN in (2, 5):
    a_regularized = tf.map_fn(functools.partial(cut_or_pad_1d, max_len=MAX_LEN), a_dense)
    a_regularized_val = sess.run(a_regularized)
    print(f'max_len={MAX_LEN}, a_regularized_val=')
    print(a_regularized_val)
  • For TF2
"""
[
    [1],
    [1, 2],
    [1, 2, 3]
]
"""
a = tf.sparse.SparseTensor(
    indices=[[0,0], [1,0], [1,1], [2,0], [2,1], [2,2]],
    values=[1, 1, 2, 1, 2, 3],
    dense_shape=[3, 3],
)

def cut_or_pad_1d(lst, max_len):
    origin_len = tf.shape(lst)[0]
    if origin_len > max_len:
        # cut
        lst = lst[:max_len]
    elif origin_len < max_len:
        # pad
        lst = tf.pad(lst, [[0, max_len-origin_len]])
    return lst
    
a_dense = tf.sparse.to_dense(a)
import functools
for MAX_LEN in (2, 5):
    a_regularized = tf.map_fn(functools.partial(cut_or_pad_1d, max_len=MAX_LEN), a_dense)    
    print(f'max_len={MAX_LEN}, a_regularized_val=')
    print(a_regularized.numpy())
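
If I am reading the loop correctly, both the TF1 and TF2 versions should print roughly:

max_len=2, a_regularized_val=
[[1 0]
 [1 2]
 [1 2]]
max_len=5, a_regularized_val=
[[1 0 0 0 0]
 [1 2 0 0 0]
 [1 2 3 0 0]]

Note that for max_len=2 the first row comes out as [1 0], because tf.sparse.to_dense already pads every row to length 3 before the per-row cut is applied.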

The simplest solution would be to call to_tensor() on your ragged tensor. It will automatically add padding:

import tensorflow as tf

x = tf.ragged.constant([[1], [1, 2], [1, 2, 3]])
x = x.to_tensor()
print(x)
tf.Tensor(
[[1 0 0]
 [1 2 0]
 [1 2 3]], shape=(3, 3), dtype=int32)

If you want, for example, to pad to a length of 10 instead of the default 3, try:

import tensorflow as tf

x = tf.ragged.constant([[1], [1, 2], [1, 2, 3]])
sequence_length = 10
x = x.to_tensor(shape=(x.bounding_shape()[0], sequence_length))
print(x)
tf.Tensor(
[[1 0 0 0 0 0 0 0 0 0]
 [1 2 0 0 0 0 0 0 0 0]
 [1 2 3 0 0 0 0 0 0 0]], shape=(3, 10), dtype=int32)
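
to_tensor() also accepts a default_value if you want to pad with something other than 0, and the shape can leave the row dimension as None. A minimal sketch (the -1 fill value and length 5 are arbitrary choices for illustration; output shown as a comment is what I would expect):

import tensorflow as tf

x = tf.ragged.constant([[1], [1, 2], [1, 2, 3]])
# None keeps the original number of rows; -1 replaces the default padding value of 0
x = x.to_tensor(default_value=-1, shape=[None, 5])
print(x)
# tf.Tensor(
# [[ 1 -1 -1 -1 -1]
#  [ 1  2 -1 -1 -1]
#  [ 1  2  3 -1 -1]], shape=(3, 5), dtype=int32)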