
As the following piece of code shows, TensorFlow's tf.nn.dilation2d function doesn't behave like a conventional dilation operator.

import tensorflow as tf
tf.InteractiveSession()
A = [[0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]]
kernel = tf.ones((3,3,1))
input4D = tf.cast(tf.expand_dims(tf.expand_dims(A, -1), 0), tf.float32)
output4D = tf.nn.dilation2d(input4D, filter=kernel, strides=(1,1,1,1), rates=(1,1,1,1), padding="SAME")
print(tf.cast(output4D[0,:,:,0], tf.int32).eval())

This returns the following tensor:

array([[1, 1, 1, 2, 2, 2, 1],
       [1, 1, 2, 2, 2, 2, 2],
       [1, 1, 2, 2, 2, 2, 2],
       [1, 1, 2, 2, 2, 2, 2],
       [1, 1, 1, 2, 2, 2, 1],
       [1, 1, 1, 1, 1, 1, 1]], dtype=int32)

I understand neither why it behaves like that nor how I should use tf.nn.dilation2d to obtain the expected output:

array([[0, 0, 0, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 1, 1, 1],
       [0, 0, 0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0, 0]], dtype=int32)

Can someone shed light on TensorFlow's terse documentation and explain what the tf.nn.dilation2d function does?

Cris Luengo
Jav
  • @CrisLuengo, I tried reproducing the behavior by executing the first code box in this posting. I get "AttributeError: module 'tensorflow' has no attribute 'InteractiveSession'". When I remove the "tf.InteractiveSession", I get "TypeError: dilation2d_v2() got an unexpected keyword argument 'filter'". Could you please post a version of this code that works for tensorflow v2.4.1 ? – Mark Lavin Feb 15 '21 at 15:23
  • @Mark, this is not my code. I’ve never used tensorflow for image processing. – Cris Luengo Feb 15 '21 at 15:24
  • Thanks, Cris. @Jav, any comments? – Mark Lavin Feb 15 '21 at 15:52
  • This question was asked 2 years ago, so it targets tf 1.14. Read the tf2 documentation if you want to convert tf1 code to tf2. – Jav Feb 15 '21 at 16:47
  • It's not relevant to update the question to tensorflow2 since the behavior might have changed since then. – Jav Feb 15 '21 at 16:48

2 Answers


As mentioned in the documentation page linked,

Computes the grayscale dilation of 4-D input and 3-D filter tensors.

and

In detail, the grayscale morphological 2-D dilation is the max-sum correlation [...]

What this means is that the kernel's values are added to the image's values at each position, then the maximum value is taken as the output value.

Compare this to correlation, replacing the multiplication with an addition, and the integral (or sum) with the maximum:

      correlation: g(t) = ∫ f(τ) h(τ-t) dτ

      dilation: g(t) = max_τ { f(τ) + h(τ-t) }

Or in the discrete world:

      correlation: g[n] = ∑_k f[k] h[k-n]

      dilation: g[n] = max_k { f[k] + h[k-n] }
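
As a minimal sketch of the two discrete formulas above, the following pure-Python functions compute the sum-product version and the max-sum version over the same window. The function names, the centred kernel indexing, and the choice to skip samples outside the signal are assumptions made for this illustration; this is not TensorFlow's implementation.

```python
def correlate(f, h):
    """Sum-product correlation: g[n] = sum_k f[k] * h[k - n] (kernel centred)."""
    r = len(h) // 2  # kernel radius; h[r] is the kernel origin
    return [
        sum(f[k] * h[k - n + r]
            for k in range(max(0, n - r), min(len(f), n + r + 1)))
        for n in range(len(f))
    ]


def dilate(f, h):
    """Max-sum correlation: g[n] = max_k { f[k] + h[k - n] } (kernel centred)."""
    r = len(h) // 2
    return [
        max(f[k] + h[k - n + r]
            for k in range(max(0, n - r), min(len(f), n + r + 1)))
        for n in range(len(f))
    ]


f = [0, 2, 1, 0, 0, 3, 0]

print(correlate(f, [1, 1, 1]))  # [2, 3, 3, 1, 3, 3, 3]
print(dilate(f, [1, 1, 1]))     # [3, 3, 3, 2, 4, 4, 4]  (local maximum + 1)
print(dilate(f, [0, 0, 0]))     # [2, 2, 2, 1, 3, 3, 3]  (plain local maximum)
```

The only difference between the two functions is that the multiplication becomes an addition and the sum becomes a maximum, which is why a kernel of ones shifts every output value up by 1.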


Dilation with a binary structuring element (what the question refers to as a “conventional dilation”) uses a kernel that contains only 1s and 0s. These indicate “included” and “excluded” pixels; that is, the 1s determine the domain of the structuring element.

To recreate the same behavior with a grey-value dilation, set the “included” pixels to 0 and the “excluded” pixels to minus infinity.

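For example, the 3x3 square structuring element used in the question should be a 3x3 matrix of zeros. With a matrix of ones, as in the question, 1 is added to every output value, which is exactly the offset seen in the returned tensor.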
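
To make this concrete, here is a small pure-Python sketch of a 2-D grey-value dilation applied to the image from the question. The helper `grey_dilation` and the rule that pixels outside the image are ignored are assumptions of this sketch; it mimics the max-sum definition quoted above and does not call TensorFlow.

```python
NEG_INF = float("-inf")


def grey_dilation(image, kernel):
    """Max-sum correlation of a 2-D image with a 2-D kernel.

    Pixels that fall outside the image are ignored (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    krows, kcols = len(kernel), len(kernel[0])
    cr, cc = krows // 2, kcols // 2  # kernel origin
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            best = NEG_INF
            for i in range(krows):
                for j in range(kcols):
                    yy, xx = y + i - cr, x + j - cc
                    if 0 <= yy < rows and 0 <= xx < cols:
                        best = max(best, image[yy][xx] + kernel[i][j])
            out_row.append(best)
        out.append(out_row)
    return out


A = [[0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]]

# Output reported in the question (kernel of ones).
observed = [[1, 1, 1, 2, 2, 2, 1],
            [1, 1, 2, 2, 2, 2, 2],
            [1, 1, 2, 2, 2, 2, 2],
            [1, 1, 2, 2, 2, 2, 2],
            [1, 1, 1, 2, 2, 2, 1],
            [1, 1, 1, 1, 1, 1, 1]]

# Output the question expected (binary dilation with a 3x3 square).
expected = [[v - 1 for v in row] for row in observed]

ones = [[1] * 3 for _ in range(3)]
zeros = [[0] * 3 for _ in range(3)]
# Cross-shaped structuring element: 0 = included, -inf = excluded.
cross = [[NEG_INF, 0, NEG_INF],
         [0, 0, 0],
         [NEG_INF, 0, NEG_INF]]

print(grey_dilation(A, ones) == observed)   # True
print(grey_dilation(A, zeros) == expected)  # True
for row in grey_dilation(A, cross):
    print(row)
```

The cross-shaped kernel shows the general recipe: because its centre is 0, every output pixel stays finite even though the excluded positions are minus infinity.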

Cris Luengo
  • Thanks, can you maybe edit to answer the question "how it should be used [...]"? I think it depends on the kernel then but I wouldn't want to mess up with your answer ;-) – Jav Mar 18 '19 at 09:15
  • @Jav: I forgot that part of the question. Added. – Cris Luengo Mar 18 '19 at 12:52
  • @CrisLuengo, in addition to "To recreate the same behavior with a grey-value dilation, set the “included” pixels to 0 and the “excluded” pixels to minus infinity.", don't you also have to "max" the result with 0, to prevent the result containing -infinity's? – Mark Lavin Feb 14 '21 at 16:34
  • @MarkLavin If the kernel has at least one value that is not -infinity, then the operation will always return a finite value. Of course assuming the input image contains only finite values. If the kernel is all -infinity, the operation would make no sense. – Cris Luengo Feb 14 '21 at 17:05
  • max(res,0) would be useful to avoid any negative values, not to avoid non-finite values. – Cris Luengo Feb 14 '21 at 17:07
  • This post has an implementation using this method: https://stackoverflow.com/questions/72733907/efficient-image-dilation-in-tensorflow – Peter Jun 23 '22 at 18:35

You can do it like this:

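import tensorflow as tf  # TensorFlow 1.x API, as in the question

def dilation2d(self, img4D):
    '''Dilate a 4-D (batch, height, width, channels) tensor with a 3x3 square.'''
    with tf.variable_scope('dilation2d'):
        # One 3x3 plane of ones per input channel.
        kernel = tf.ones((3, 3, img4D.get_shape()[3]))
        output4D = tf.nn.dilation2d(img4D, filter=kernel, strides=(1,1,1,1), rates=(1,1,1,1), padding="SAME")
        # The kernel of ones adds 1 to every output value; subtract it again.
        output4D = output4D - tf.ones_like(output4D)

        return output4D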
CHEN
  • Sad that my answer wasn't understood. If you do `kernel = tf.zeros(...)` (instead of `ones`), then you don't need to subtract 1 after the dilation. – Cris Luengo Jan 10 '20 at 20:04