I converted a TensorFlow model to TensorFlow JS and tried running it in the browser. Some preprocessing steps have to be applied to the input before it is fed to the model for inference, and I implemented them the same way as in the TensorFlow version. The problem is that the inference results from TF JS do not match those from TensorFlow. While debugging, I found that the floating-point arithmetic in the preprocessing gives different results in TF JS than in TensorFlow, which runs in a Docker container with a GPU. The TF JS code is below.

    var tensor3d = tf.tensor3d(image, [height, width, 1], 'float32')

    var pi = PI.toString();
    if (bs == 14 && pi.indexOf('1') != -1) {
      tensor3d = tensor3d.sub(-9798.6993999999995).div(7104.607118190255)
    }
    else if (bs == 12 && pi.indexOf('1') != -1) {
      tensor3d = tensor3d.sub(-3384.9893000000002).div(1190.0708513300835)
    }
    else if (bs == 12 && pi.indexOf('2') != -1) {
      tensor3d = tensor3d.sub(978.31200000000001).div(1092.2426342420442)
    }
    var resizedTensor = tensor3d.resizeNearestNeighbor([224, 224]).toFloat()
    var copiedTens = tf.tile(resizedTensor, [1, 1, 3])
    return copiedTens.expandDims();

Python code blocks used

ds = pydicom.dcmread(input_filename, stop_before_pixels=True)
if (ds.BitsStored == 12) and '1' in ds.PhotometricInterpretation:
    normalize_mean = -3384.9893000000002
    normalize_std = 1190.0708513300835
elif (ds.BitsStored == 12) and '2' in ds.PhotometricInterpretation:
    normalize_mean = 978.31200000000001
    normalize_std = 1092.2426342420442
elif (ds.BitsStored == 14) and '1' in ds.PhotometricInterpretation:
    normalize_mean = -9798.6993999999995
    normalize_std = 7104.607118190255
else:
    error_response = ("Unable to read required metadata, or metadata invalid. "
                      "BitsStored: {}. PhotometricInterpretation: {}").format(
                          ds.BitsStored, ds.PhotometricInterpretation)
    error_json = {'code': 500, 'message': error_response}
    self._set_headers(500)
    self.wfile.write(json.dumps(error_json).encode())
    return

normalization = Normalization(mean=normalize_mean, std=normalize_std)
resize = ResizeImage()
copy_channels = CopyChannels()
inference_data_collection.append_preprocessor([normalization, resize,
                                               copy_channels])

Normalization code

    def normalize(self, normalize_numpy, mask_numpy=None):

        normalize_numpy = normalize_numpy.astype(float)

        if mask_numpy is not None:
            mask = mask_numpy > 0
        elif self.mask_zeros:
            mask = np.nonzero(normalize_numpy)
        else:
            mask = None

        if mask is None:
            normalize_numpy = (normalize_numpy - self.mean) / self.std
        else:
            raise NotImplementedError

        return normalize_numpy

ResizeImage code

    from skimage.transform import resize

    def Resize(self, data_group):

        input_data = data_group.preprocessed_case

        output_data = resize(input_data, self.output_dim)

        data_group.preprocessed_case = output_data
        self.output_data = output_data

CopyChannels code

    def CopyChannels(self, data_group):

        input_data = data_group.preprocessed_case

        if self.new_channel_dim:
            output_data = np.stack([input_data] * self.channel_multiplier, -1)
        else:
            output_data = np.tile(input_data, (1, 1, self.channel_multiplier))

        data_group.preprocessed_case = output_data
        self.output_data = output_data

Sample outputs (left: TensorFlow on Docker with GPU; right: TF JS): [screenshot comparing the tensor values]

The results are actually different after every step.

  • What are the operations you're using in Python, and what results are you comparing? – edkeveked Jun 19 '19 at 07:53
  • the same operations specified in the above code block are being done on python using numpy and other python libraries. So I am trying to compare the results after each step for example after the operation tensor3d = tensor3d.sub(-9798.6993999999995).div(7104.607118190255) – Sai Raghuram Kaligotla Jun 19 '19 at 13:54

1 Answer

Several things could lead to this issue.

1- The ops are not used in the same way in JS and in Python. If that is the case, using exactly the same ops on both sides will get rid of the issue.
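As one illustration of this first point, and assuming the Python `resize` is `skimage.transform.resize` left at its default settings (which interpolate between neighbouring pixels rather than picking the nearest one), the question's `resizeNearestNeighbor` call would not be the same op. The sketch below uses two hypothetical plain-JavaScript helpers, `resizeNearest` and `resizeLinear`, on a made-up 1-D row to show how the two schemes diverge. It is not the actual implementation of either library.

```javascript
// Hypothetical helpers (not part of TF JS or skimage): resample a 1-D row
// to `outLen` samples with two different interpolation schemes.
function resizeNearest(src, outLen) {
  const scale = src.length / outLen;
  return Array.from({ length: outLen }, (_, i) =>
    src[Math.min(src.length - 1, Math.floor(i * scale))]);
}

function resizeLinear(src, outLen) {
  const scale = (src.length - 1) / (outLen - 1);
  return Array.from({ length: outLen }, (_, i) => {
    const pos = i * scale;                        // position in source coordinates
    const lo = Math.floor(pos);
    const hi = Math.min(src.length - 1, lo + 1);
    const t = pos - lo;                           // blend weight between lo and hi
    return src[lo] * (1 - t) + src[hi] * t;
  });
}

const row = [0, 10, 20, 30];                      // made-up pixel values
const nearest = resizeNearest(row, 7);
const linear = resizeLinear(row, 7);
console.log(nearest); // [ 0, 0, 10, 10, 20, 20, 30 ]
console.log(linear);  // [ 0, 5, 10, 15, 20, 25, 30 ]
```

Differences of this kind are far larger than rounding error, which would be consistent with some values being "totally different" rather than off in the last digits.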

2- The image tensor might be read differently by the Python library and by the browser canvas. Across browsers, canvas pixels don't always have the same values because of operations such as anti-aliasing, as explained in this answer, so the results of the operations may differ slightly. To check whether this is the root cause, first print the image array in both Python and JS and see whether they match. It is likely that the 3D tensor differs between JS and Python.

tensor3d = tf.tensor3d(image,[height,width,1],'float32')

In that case, instead of reading the image directly in the browser, one can use the Python library to convert the image to an array and have tfjs read that array instead of the image. That way, the input tensors will be the same in JS and in Python.
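A minimal sketch of such a comparison, assuming both sides have been flattened to plain arrays of numbers. `compareArrays`, `fromPython`, and `fromBrowser` are hypothetical names, and the values are made up for illustration:

```javascript
// Hypothetical helper: summarise how two flattened arrays of equal length differ.
function compareArrays(a, b, tolerance = 1e-6) {
  if (a.length !== b.length) {
    throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  }
  let maxAbsDiff = 0;
  let firstMismatch = -1; // index of the first element differing by more than `tolerance`
  for (let i = 0; i < a.length; i++) {
    const diff = Math.abs(a[i] - b[i]);
    if (diff > maxAbsDiff) maxAbsDiff = diff;
    if (diff > tolerance && firstMismatch === -1) firstMismatch = i;
  }
  return { maxAbsDiff, firstMismatch };
}

// Made-up values standing in for the Python dump and the browser pixel data.
const fromPython = [100, 200, 300, 400];
const fromBrowser = [100, 200, 301, 400];
const report = compareArrays(fromPython, fromBrowser);
console.log(report); // { maxAbsDiff: 1, firstMismatch: 2 }
```

On the TF JS side, the values of a tensor can be pulled out with `tensor.dataSync()` before passing them to a helper like this, so the same check can be repeated after each preprocessing step.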

3- It is a float32 precision issue. tensor3d is created with the dtype float32, and depending on the operations used, precision may be lost. Consider this operation:

tf.scalar(12045, 'int32').mul(tf.scalar(12045, 'int32')).print(); // 145082032 instead of 145082025

The same precision issue will be encountered in python with the following:

import tensorflow as tf

a = tf.constant([12045], dtype='float32') * tf.constant([12045], dtype='float32')
tf.print(a)  # 145082032

In Python this can be solved by using the int32 dtype. However, because of the WebGL float32 limitation, the same thing can't be done with the WebGL backend in tfjs. In neural networks, this precision issue is not a big deal. To get rid of it, one can switch backends, for instance with tf.setBackend('cpu'), which is much slower.
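The size of the float32 effect can be estimated without TF JS at all. The sketch below is only an approximation of what a float32 pipeline does: it uses the standard `Math.fround` to round every intermediate value to float32 and compares the result with ordinary float64 arithmetic. The constants are the ones from the question; `pixel`, `normalize64`, and `normalize32` are hypothetical names introduced for illustration.

```javascript
// Math.fround rounds a JavaScript number (float64) to the nearest float32.
const f32 = Math.fround;

// The multiplication from the answer: exact in float64, rounded in float32.
const exact = 12045 * 12045;
const rounded = f32(exact);
console.log(exact, rounded); // 145082025 145082032

// Normalization constants from the question
// (BitsStored == 12, '2' in PhotometricInterpretation).
const mean = 978.31200000000001;
const std = 1092.2426342420442;

// float64 arithmetic throughout.
function normalize64(x) {
  return (x - mean) / std;
}

// Same formula, with inputs and every intermediate rounded to float32.
function normalize32(x) {
  return f32(f32(f32(x) - f32(mean)) / f32(std));
}

const pixel = 2047; // hypothetical pixel value
const n64 = normalize64(pixel);
const n32 = normalize32(pixel);
console.log(n64, n32, Math.abs(n64 - n32));
```

For the normalization step the two results agree to roughly six or seven significant digits, so float32 rounding on its own accounts for differences "at the precision level" but not for values that are completely different.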

edkeveked
  • Thank you for your reply. Actually, the screenshot above is the comparison of the tensors on Docker and TF JS (in the browser). The values are not the same: some differ at the precision level and some are totally different. My questions are: 1) What is the cause of this variation? 2) Is it possible to get the same tensor values as on Docker by making changes to the code? 3) Is this difference due to their architectures (GPU vs CPU)? – Sai Raghuram Kaligotla Jun 19 '19 at 14:51
  • What are you calling tensorflow on docker ? Is it still the tfjs running on docker or is it a python code ? – edkeveked Jun 19 '19 at 14:55
  • It is Python code. I am trying to compare the output of TensorFlow in Python, running in a Docker container, with TensorFlow JS. FYI, I am not using canvas for the image: as my input is a DICOM image, I use a parser in JS to get the pixel data and carry out the rest of the process. – Sai Raghuram Kaligotla Jun 19 '19 at 14:58
  • The answer explains what the issue might possibly be. Even if you're not using canvas for your image, tfjs uses canvas under the hood. Unless you try to compare the array `image` in both python and js, there is nothing more I can do to help – edkeveked Jun 19 '19 at 15:03
  • Yes I have compared the **image array** they were actually same. The results are diverging after I perform the computations on them. – Sai Raghuram Kaligotla Jun 19 '19 at 15:06
  • Can you add the python code as well to your question ? – edkeveked Jun 19 '19 at 15:11
  • I have added the python code to my question, please check. – Sai Raghuram Kaligotla Jun 19 '19 at 15:36
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/195286/discussion-between-sai-raghuram-kaligotla-and-edkeveked). – Sai Raghuram Kaligotla Jun 20 '19 at 14:33