
I am working on an image segmentation task and need to convert images to labels. The problem is that, because of JPEG compression artifacts, the images contain intermediate colors. An image that should have only 4 colors (in my case) has many more: for instance, the example image I posted has 338 distinct colors.

I counted them with the following code:

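from PIL import Image
import numpy as np
import torch

image = Image.open("Image_Path")
image = np.array(image)                      # (h, w, channels) uint8
target = torch.from_numpy(image)
h, w = target.shape[0], target.shape[1]
masks = torch.empty(h, w, dtype=torch.long)  # placeholder for the label map
# One row per pixel, then keep only the distinct color tuples
colors = torch.unique(target.view(-1, target.size(2)), dim=0).numpy()
print(len(colors))                           # number of distinct colors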
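The same count can be obtained with NumPy alone, without going through torch. The sketch below assumes an (h, w, channels) uint8 array; the helper name `count_colors` and the tiny synthetic image are made up for illustration. It shows how a single in-between pixel, like those JPEG compression leaves along color boundaries, already adds a new color to the count.

```python
import numpy as np

def count_colors(rgb):
    """Return the number of distinct pixel values in an (h, w, c) uint8 array."""
    # One row per pixel, then count unique rows (i.e. unique color tuples)
    return len(np.unique(rgb.reshape(-1, rgb.shape[-1]), axis=0))

# Hypothetical 4x4 image: left half black, right half pure red
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, 2:] = [255, 0, 0]
print(count_colors(img))   # 2

# One JPEG-like intermediate pixel on the boundary adds a third "color"
img[0, 2] = [200, 30, 10]
print(count_colors(img))   # 3
```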

To resolve this, I tried this approach (k-means color quantization), but it does not convert the image to predetermined pixel values. It converts the above image into the following pixel values:

array([[  0,   0,   0],
       [  0,   0, 254],
       [  0, 254,   0],
       [254,   0,   0]])

This is a problem for me because I have many images, and each color must map to the same pixel value in every one of them. With the method above that does not happen: for another image it may convert red to [255, 0, 0] instead of [254, 0, 0], and similarly for the other colors. How can I do this consistently?
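If the intended colors are known in advance, one alternative (not the method used in the thread) is to skip clustering and snap every pixel to the nearest color of a fixed palette, producing integer labels directly. The sketch below assumes the four classes are black, blue, green and red at full intensity, guessed from the array above; `PALETTE`, `to_labels` and `to_rgb` are hypothetical names. Because the palette is fixed, the same color gets the same label in every image.

```python
import numpy as np

# Hypothetical fixed palette (assumption): black, blue, green, red.
# Replace with the colors the dataset is actually meant to contain.
PALETTE = np.array([[0, 0, 0],
                    [0, 0, 255],
                    [0, 255, 0],
                    [255, 0, 0]], dtype=np.int32)

def to_labels(rgb, palette=PALETTE):
    """Map each pixel to the index of the nearest palette color (squared RGB distance)."""
    pixels = rgb[..., :3].reshape(-1, 1, 3).astype(np.int32)   # (N, 1, 3), alpha dropped
    dist = ((pixels - palette[None, :, :]) ** 2).sum(axis=2)  # (N, K) distances
    return dist.argmin(axis=1).reshape(rgb.shape[:2])         # (h, w) label map

def to_rgb(labels, palette=PALETTE):
    """Turn a label map back into an image that uses only the palette colors."""
    return palette[labels].astype(np.uint8)

noisy = np.array([[[3, 1, 250], [254, 0, 2]],
                  [[0, 2, 0],   [10, 240, 5]]], dtype=np.uint8)
print(to_labels(noisy).tolist())   # [[1, 3], [0, 2]]
```

The distances are computed in `int32` so that subtracting and squaring uint8 values cannot wrap around.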

  • Don't use JPEG to store images like that. Use an indexed format; your image can essentially be saved using only 2 bits per pixel. But even as a standard full-color PNG file it will be saved very efficiently and without artifacts. – Cris Luengo Sep 19 '19 at 15:07
  • The thing is, the images are already stored: I received this dataset and I have to work with it. I am also seeing the artifacts in the PNG images, so I am kind of stuck with them. – Beginner Sep 19 '19 at 15:10
  • If you observe artifacts in PNG images it means someone took a JPEG image and converted it. Whoever sent you the images should be told to send you correct data, not data with artificial artifacts that are easy to avoid. Of course you can get around it, but by doing so you're wasting your time because someone else doesn't know what they're doing and nobody dared explain to them how to do their job. – Cris Luengo Sep 19 '19 at 15:17
  • @CrisLuengo I completely agree. This is an academic dataset, and academics are generally concerned only with publishing, without thinking about long-term aims. – Beginner Sep 19 '19 at 15:23

1 Answer


Starting with the approach you already tried (copying code from that answer here for reference, modified to read OP's image):

import numpy as np
from skimage import io
from sklearn.cluster import KMeans

original = io.imread('https://i.stack.imgur.com/9XMfw.png')
original = original[:,:,0:3]  # remove alpha channel
n_colors = 4

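arr = original.reshape((-1, 3))    # one row per pixel
kmeans = KMeans(n_clusters=n_colors, random_state=42).fit(arr)
labels = kmeans.labels_            # cluster index of each pixel
centers = kmeans.cluster_centers_  # float RGB value of each cluster
# Replace each pixel by its cluster center; astype('uint8') truncates the floats
less_colors = centers[labels].reshape(original.shape).astype('uint8')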

io.imshow(less_colors)
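The reconstruction line relies on NumPy fancy indexing: `centers[labels]` picks one center row per pixel, and `reshape` restores the image shape. Note that `astype('uint8')` truncates rather than rounds, so a center such as 254.9 becomes 254; this is one plausible source of the 254 values in the question. The sketch below uses made-up center values instead of a real k-means fit, so it needs only NumPy.

```python
import numpy as np

# Made-up k-means output for a 2x2 RGB image with 2 clusters (not real values)
centers = np.array([[0.2, 0.1, 254.6],    # a "blue" center
                    [254.9, 0.3, 0.0]])   # a "red" center
labels = np.array([0, 1, 1, 0])           # cluster index of each pixel, row-major

# One center per pixel, back to (h, w, 3); astype truncates 254.9 -> 254
less_colors = centers[labels].reshape((2, 2, 3)).astype('uint8')
print(less_colors[0, 0].tolist(), less_colors[0, 1].tolist())
# [0, 0, 254] [254, 0, 0]
```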

The goal is to get centers to be consistent from image to image. We can do this by forcing each element in this array to be either 0 or 255. This choice will only work if the input images all have similar colors that are distinguishable after this change. Assuming they all have the same 4 colors used in the one image posted in the question, this should not be a concern.

One simple way of accomplishing this is to divide the values by 255, round them, then multiply by 255 again:

centers = np.round(centers/255)*255

This line goes right after centers = kmeans.cluster_centers_, so that it runs before less_colors is created.
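A worked example of the rounding step, using two hypothetical sets of centers that stand in for k-means results from two different images (the values are invented). After snapping, both give exactly the same four colors. The cluster order still differs between the two sets, because k-means numbers its clusters arbitrarily, so the raw `kmeans.labels_` indices are not comparable across images; only the snapped colors are. The helper name `snap` is made up.

```python
import numpy as np

# Hypothetical centers from two images: same four colors,
# slightly different estimates and a different cluster order
centers_a = np.array([[1.7, 0.4, 0.9], [0.2, 0.1, 254.6],
                      [0.8, 253.1, 2.0], [254.9, 0.3, 0.0]])
centers_b = np.array([[252.4, 3.0, 1.2], [0.0, 2.2, 0.5],
                      [1.1, 0.6, 250.3], [2.5, 254.0, 0.1]])

def snap(centers):
    """Force every channel to exactly 0 or 255 (the rounding step from the answer)."""
    return np.round(centers / 255) * 255

print(snap(centers_a).astype('uint8').tolist())
# [[0, 0, 0], [0, 0, 255], [0, 255, 0], [255, 0, 0]]
print(snap(centers_b).astype('uint8').tolist())
# [[255, 0, 0], [0, 0, 0], [0, 0, 255], [0, 255, 0]]
```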

  • The number of colors shot up after I used this. Now I have 83392 colors. – Beginner Sep 19 '19 at 15:56
  • Did you copy-paste the code as I wrote it? The `np.round` statement goes after `centers = kmeans.cluster_centers_`. I get exactly 4 colors, and I don't see how it's possible to get any other number of them. – Cris Luengo Sep 19 '19 at 16:06
  • Sorry, I am an idiot, I was testing the code on the old file. Thanks a lot. – Beginner Sep 19 '19 at 16:07