
I am working on an image segmentation project. While processing the ground truth masks (labels), which are in PNG format, I encountered a strange problem.

Here are some code snippets (IPython) and their output to illustrate my problem:

import numpy as np
import cv2
from PIL import Image

img_p = Image.open("test.png")
print(img_p.mode)                    # P
print(np.unique(np.array(img_p)))    # [0 1]

img_cv = cv2.imread("test.png", cv2.IMREAD_UNCHANGED)
print(np.unique(img_cv))             # [  0 255]

img_p = Image.open("test.png").convert("L")
print(np.unique(np.array(img_p)))    # [  0 255]

Why does OpenCV convert the label map, which should be [0, 1] (background, foreground), to [0, 255]?

Zoro

Beanocean

2 Answers


Check this: https://pillow.readthedocs.io/en/stable/handbook/concepts.html#concept-modes. In your code:

  • OpenCV reads the image in "BGR" mode ==> values in [0, 255]
  • Pillow reads your image in "P" (palette) mode; see the link above to understand it a bit. If you want pixel values in [0, 255], you have to convert your image to "RGBA", "RGB" or "L" mode.

Try this:

import numpy as np
from PIL import Image

img_p = Image.open("img.png")
print("mode =", img_p.mode)              # RGBA
print(np.unique(np.array(img_p)))

img_rgb = img_p.convert(mode="RGB")
print("mode =", img_rgb.mode)            # RGB
print(np.unique(np.array(img_rgb)))

img_pal = img_p.convert(mode="P")
print("mode =", img_pal.mode)            # P
print(np.unique(np.array(img_pal)))

img_l = img_p.convert(mode="L")
print("mode =", img_l.mode)              # L
print(np.unique(np.array(img_l)))

If you have 3 colors, "P" mode returns [0, 1, 2]; if you have 4 colors, it returns [0, 1, 2, 3], and so on.

Palette ("P") mode works by creating a mapping table, which maps an index (between 0 and 255) to a discrete color in a larger color space (like RGB). For example, the RGB color value (0, 0, 255) (pure blue) in an image gets an index of 1 (just a hypothetical example). This same process goes through every single pixel value in the original image (but the table size must not exceed 256 entries).

Therefore each pixel stores an index into the mapping table, rather than the actual color value itself. You can interpret them as indices which, when the image is displayed, are converted to the actual color values stored at those indices. To answer your question: when you switch from "P" mode to "L" mode, you just retrieve the real values stored at each index.
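The index-vs-color distinction above can be sketched with a tiny hand-built palette image (the 2×2 `mask` array and black/white palette here are illustrative assumptions, not from the question's actual file):

```python
import numpy as np
from PIL import Image

# Index array: 0 = background, 1 = foreground
mask = np.array([[0, 1],
                 [1, 0]], dtype=np.uint8)

# Build a "P" mode image whose pixels are palette indices
img_p = Image.fromarray(mask, mode="P")
# Palette: index 0 -> black (0, 0, 0), index 1 -> white (255, 255, 255);
# pad the rest of the 256-entry table with black
img_p.putpalette([0, 0, 0, 255, 255, 255] + [0, 0, 0] * 254)

print(np.unique(np.array(img_p)))               # [0 1]   (the indices)
print(np.unique(np.array(img_p.convert("L"))))  # [  0 255] (the palette colors)
```

Reading the "P" image as an array yields the stored indices, while `convert("L")` resolves each index through the palette first, which is why the value range jumps from [0, 1] to [0, 255].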

RashidLadj_Winux
  • Please see Line 106 and 111: why is the value range [0, 1] if the image mode is 'P', while if converted to 'L' mode the value range is [0, 255]? – Beanocean Oct 04 '20 at 15:45
  • Yes, it's true, because "P" mode (check the site) is mapped to any other mode using a color palette (8-bit pixels). As you have two colors, either black or white, "P" mode behaves like mode "1", so your values will be either 0 (black) or 1 (white). – RashidLadj_Winux Oct 04 '20 at 17:23
  • So what's the mechanism behind the conversion from 'P' mode to 'L' mode? Will the target value in 'L' mode be affected by the palette I used in 'P' mode? E.g., I have set the palette as {0: (0, 0, 0), 1: (255, 255, 255)} in 'P' mode. – Beanocean Oct 05 '20 at 08:45

From the basics I've learnt about images, I can tell you that each pixel of a typical 8-bit image takes 1 byte. The pixel intensity ranges from 0 = 0b00000000 to 255 = 0b11111111, where in a typical grayscale image 0 is black and 255 is white. However, it is often convenient to normalize the pixel values to the range 0 to 1. So different libraries use different conventions, and I think it actually depends on the preference of the people who wrote the library. Either is fine.
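Whichever convention your library hands you, converting between the two is a one-liner. A minimal sketch (the `mask_cv` array here is a made-up example standing in for what `cv2.imread` returns on the question's mask):

```python
import numpy as np

# Example mask as OpenCV would return it: values in {0, 255}
mask_cv = np.array([[0, 255],
                    [255, 0]], dtype=np.uint8)

labels = (mask_cv > 0).astype(np.uint8)  # {0, 255} -> {0, 1} class labels
back = labels * 255                      # {0, 1} -> {0, 255} for display

print(np.unique(labels))  # [0 1]
print(np.unique(back))    # [  0 255]
```

Thresholding with `> 0` also guards against stray intermediate values (e.g. from interpolation) that would corrupt a label map.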

Afif Al Mamun
  • Pixels don't automatically take one byte; they can be as small as a single bit. Such images aren't as common as they used to be, but they exist. – Mark Ransom Oct 04 '20 at 16:15
  • Thank you for the info. I was just talking about 'in general', I should have mentioned that though. Anyway, thanks again. – Afif Al Mamun Oct 04 '20 at 17:17