
I am working with an image of size 512x512. The image is divided into patches using einops with a patch size of 32. There are 256 patches overall; in other words, we get a new "image" of size 256x1024.

Since this image is actually a mask for a segmentation problem, it consists of only 4 values (4 classes): 0 for the background, 1 for the first class, 2 for the second class, and 3 for the third class.

My goal is to take every patch and compute, for every class C, the following:

Number of pixels labeled C / Number of pixels in this patch.

This should give me an array of size 4, where the first entry is the number of background pixels (labeled 0) divided by the total number of pixels in the patch (1024); the second, third, and fourth entries are the same but for the corresponding class.

In theory, I know that I need to iterate over every single patch, count how many pixels of each class exist in the current patch, and then divide by 1024. Doing this for all 256 patches yields exactly what I want. The problem is that I have a (very) large number of images that I need to do this for, and the size of 512 is just an example to keep the question simple, so a for loop is out of the question.

I know that I can get the result I want using numpy. I tried both numpy.apply_over_axes and numpy.apply_along_axis, but I don't know which one is better suited for this task; there is also numpy.where, but I don't know how it applies here.

Here is what I did:

from einops import rearrange
import numpy as np

labn = np.random.randint(4, size=(512, 512))      # Every pixel in this image has a value of 0, 1, 2, or 3
to_patch = rearrange(labn, "(h p1) (w p2) -> (h w) (p1 p2)", p1=32, p2=32)
print(to_patch.shape) # (256,1024)

c0 = np.full(1024, 0)
c1 = np.full(1024, 1)
c2 = np.full(1024, 2)
c3 = np.full(1024, 3)

def f(a):
    # boolean mask per class, then the fraction of pixels belonging to each class
    _c0 = a == c0
    _c1 = a == c1
    _c2 = a == c2
    _c3 = a == c3
    pr = np.array([np.sum(_c0), np.sum(_c1), np.sum(_c2), np.sum(_c3)]) / 1024
    return pr

resf = np.apply_along_axis(f, 1, to_patch)
print(resf.shape) # (256, 4)

Two things:

  1. I want the output to be 256x4, where every array along the second axis sums to one.
  2. Is there a faster/better/more Pythonic way to do this, preferably vectorized? (A broadcasting idea I have in mind is sketched after the edit below.)

EDIT: I forgot to add the sum, so now I do get 256x4.
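
As mentioned in point 2, here is a rough sketch of the broadcasting idea I have in mind, reusing to_patch from above; I am not sure whether it is the right approach or how well it scales:

# compare each patch against all class labels at once via broadcasting
classes = np.arange(4)                                      # [0, 1, 2, 3]
fractions = (to_patch[:, :, None] == classes).mean(axis=1)  # True/False averaged per patch and class
print(fractions.shape)   # (256, 4), and every row sums to one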

user574362

1 Answer


There is a built-in PyTorch function to count occurrences, torch.histc; it is similar in spirit to Python's collections.Counter.

torch.histc(input, bins=100, min=0, max=0, *, out=None) → Tensor
Computes the histogram of a tensor.

The elements are sorted into equal width bins between min and max. If min and max are both zero, the minimum and maximum values of the data are used.

Elements lower than min and higher than max are ignored.

You need to specify the number of bins, here the number of classes C, as well as the min and max values so that bin i always corresponds to class i. Also, it does not operate per row on multi-dimensional tensors: the resulting tensor contains global statistics of the whole input, regardless of its shape. As a possible workaround, you can iterate through your patches, calling torch.histc each time, then stack the results and normalize:

resf = torch.stack([torch.histc(patch, C, min=0, max=C-1) for patch in x]) / x.size(1)
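
For reference, a minimal self-contained version of the line above might look like this; it assumes x is the to_patch array from the question converted to a tensor, C = 4, and (as discussed in the comments below) a cast to floating point, since histc does not accept integer tensors on the CPU:

import torch
import numpy as np
from einops import rearrange

C = 4                                              # number of classes
labn = np.random.randint(C, size=(512, 512))       # toy 512x512 mask with values 0..3
to_patch = rearrange(labn, "(h p1) (w p2) -> (h w) (p1 p2)", p1=32, p2=32)

x = torch.from_numpy(to_patch).float()             # histc needs a floating-point tensor
# one histogram per patch; min=0, max=C-1 keeps bin i aligned with class i
resf = torch.stack([torch.histc(patch, C, min=0, max=C - 1) for patch in x]) / x.size(1)

print(resf.shape)        # torch.Size([256, 4])
print(resf.sum(dim=1))   # every row sums to 1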
Ivan
  • So I just need to iterate through each row using this method? I will see if that is what I seek once I am near my PC :) – user574362 Sep 08 '21 at 20:47
  • @user574362 The line above will return the desired result, i.e. the patch-wise statistics for each of the `C` classes: a tensor shaped `256x4`. – Ivan Sep 08 '21 at 21:04
  • Hey Ivan, the line above returns the following error: `RuntimeError: _th_histc not supported on CPUType for Long`, which is weird because the `dtype` of `to_patch` is `int64`. This step is done as preprocessing and not when I load the data, so it should run on the CPU, not the GPU, so I was wondering if it can be done using `numpy` (I'm OK with torch, but it doesn't work). – user574362 Sep 09 '21 at 09:55
  • Actually, you need to convert to floating-point numbers: just cast your input array `x` with `x.float()`, and it should work. Also, it won't matter if you're on CPU or GPU. – Ivan Sep 09 '21 at 10:09
  • I noticed that this method doesn't give consistent indices. For example, in some patches the background gets index 2: `tensor([0., 0., 1., 0.])`, but in others it gets index 0: `tensor([0.9912, 0.0000, 0.0000, 0.0088])`. Is there a way to make every index correspond to a class? – user574362 Sep 10 '21 at 10:47
  • Indeed, I have edited my answer above: you need to specify `min` and `max` in the call to `histc`. – Ivan Sep 11 '21 at 09:43
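
To illustrate the last two comments with a minimal, hypothetical example: without explicit min and max, histc infers the range from each patch's own values, so the bin a class lands in depends on which classes happen to be present in that patch; pinning min=0 and max=C-1 keeps bin i aligned with class i.

import torch

patch = torch.tensor([0., 0., 2., 2.])           # a toy patch containing only classes 0 and 2

print(torch.histc(patch, bins=4))                # tensor([2., 0., 0., 2.]) -- range inferred as [0, 2], so class 2 lands in bin 3
print(torch.histc(patch, bins=4, min=0, max=3))  # tensor([2., 0., 2., 0.]) -- class 2 lands in bin 2, as intended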