
I use cv2.imread to read a PNG file in Python. When I then immediately save the image with cv2.imwrite, I find that the colours in the image have changed slightly. I am trying to perform character recognition on this image, and the OCR performs far worse on the image saved from Python than on the original. The first image is the original, and the second is the one saved with OpenCV.

First

second

We can see that the green has changed slightly. Whilst this does not seem important, it affects the OCR, and I therefore imagine that other changes are happening to the PNG. Does anyone know why this might be and how I can resolve it?

The code is as follows:

import cv2

img = cv2.imread('file.png')
cv2.imwrite('out.png', img)

When I run file.png through Tesseract for character recognition I get great results, but when I run out.png through Tesseract far fewer words are recognised correctly.

eyllanesc
Arthur Le Calvez
  • Are you saving the image as PNG or JPEG? Also take a look at [this post](https://stackoverflow.com/q/33142786/2286337). – zindarod Jul 09 '18 at 11:25
  • I'm loading and saving it as a PNG – Arthur Le Calvez Jul 09 '18 at 11:40
  • Added code to question – Arthur Le Calvez Jul 09 '18 at 12:25
  • OpenCV uses BGR format instead of RGB, that's why you observe that difference in color – m33n Jul 09 '18 at 12:41
  • @m33n that is only for internal use, when saving it doesn't matter – chris Jul 09 '18 at 12:42
  • Possible duplicate of [OpenCV imrite gives washed-out result for jpeg images](https://stackoverflow.com/questions/33142786/opencv-imrite-gives-washed-out-result-for-jpeg-images) – chris Jul 09 '18 at 12:48
  • Could be something to do with png color profiles – Eric Jul 09 '18 at 12:58
  • In Photoshop, when I change the colour profile of the second image to Adobe RGB (1998), the colour changes to the lighter green seen in the first image. However, once I save this and run it through OCR I still don't get good results. This suggests that there are more underlying changes occurring – Arthur Le Calvez Jul 09 '18 at 13:01

1 Answer


When you have a .png image file, you ought to read it as a .png file.

I downloaded your image and did some analysis myself.

  • First, I read the image as you did:

    img = cv2.imread('file.png')
    

    img.shape returns (446, 864, 3), i.e. an image with 3 channels.

  • Next I read the same image using cv2.IMREAD_UNCHANGED:

    img = cv2.imread('file.png', cv2.IMREAD_UNCHANGED)
    

    img.shape returns (446, 864, 4), i.e. an image with 4 channels.

.png files can have an additional transparency (alpha) channel, and the default imread flag drops it. So the next time you come across a .png file, read it using the cv2.IMREAD_UNCHANGED flag.

UPDATE:

Listing the various flags available for reading an image:

import cv2

# Print every attribute of the cv2 module whose name starts with IMREAD
for var in dir(cv2):
    if var.startswith('IMREAD'):
        print(var)

prints:

IMREAD_ANYCOLOR
IMREAD_ANYDEPTH
IMREAD_COLOR
IMREAD_GRAYSCALE
IMREAD_LOAD_GDAL
IMREAD_UNCHANGED
Jeru Luke