Python - Remove Black Pixels originating from the border of an image

Question

I am very new to image processing and I am trying to clean pictures similar to picture 1 by removing the black pixels that originate from the border of the image.

The images are characters clipped from a PDF, which I process with Tesseract to retrieve the character. I already searched Stack Overflow for answers, but only found solutions for getting rid of black borders. I need to overwrite all the black pixels from the corners with white pixels, so Tesseract can recognize the character correctly.

I cannot alter the bounding boxes used to clip the characters, since the characters are centered in different areas of the bounding box, and if I cut the bounding box, I would cut off some characters, as seen below.

My first guess would have been to recursively track down pixels with a certain threshold of black in them, but I am worried about the computing time in that case and wouldn't really know where or how to start, except for using two two-dimensional arrays: one with the pixels, and one indicating whether I have already processed that pixel.

Help would be greatly appreciated.

Edit: some more pictures of cases where black pixels from the edge need to be cleared:

Edit: code snippet to create the border image:

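    # Imports the snippet relies on (they belong at module level)
    import cv2
    import numpy
    from PIL import Image

    @staticmethod
    def __get_border_image(image: Image.Image) -> Image.Image:
        # PIL image -> NumPy array, which is what OpenCV operates on
        data = numpy.asarray(image)

        # Add a 5px constant-colour border; the default fill value is 0 (black)
        border = cv2.copyMakeBorder(data, top=5, bottom=5, left=5, right=5, borderType=cv2.BORDER_CONSTANT)

        return Image.fromarray(border)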

Can you give some more examples of images and of what is desired? The 1st image is not clear. – Abhi25t, Jan 25 '21 at 13:26

Mark Setchell · Accepted Answer · 2021-01-25T13:36:46.893


Try like this:

1. Artificially add a 1px-wide black border all around the edge.
2. Flood-fill with white all black pixels, starting at the top-left corner.
3. Remove the 1px border from the first step (if necessary).

The point of adding the border is to allow the white to "flow" all around all edges of the image and reach any black items touching the edge.
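The three steps above could be sketched with Pillow alone, using `ImageOps.expand()` and `ImageDraw.floodfill()`. This is a minimal sketch, not a tested solution for the asker's data: the function name `clear_border_touching_black` and the `thresh` parameter are hypothetical, and it assumes the image can be treated as 8-bit greyscale with a white background and black ink.

```python
from PIL import Image, ImageDraw, ImageOps


def clear_border_touching_black(image: Image.Image, thresh: int = 0) -> Image.Image:
    """Whiten black regions that are connected to the image edge.

    Hypothetical helper: `thresh` is how far a pixel value may differ
    from pure black and still be treated as black (0 = exact match only).
    """
    # Assumption: greyscale is good enough for OCR input.
    gray = image.convert("L")

    # Step 1: add a 1px black frame so the fill can travel along every edge.
    framed = ImageOps.expand(gray, border=1, fill=0)

    # Step 2: flood-fill with white from the top-left corner, which is now
    # part of the black frame. Pillow's fill is 4-connected, so regions that
    # touch only diagonally are not reached.
    ImageDraw.floodfill(framed, (0, 0), 255, thresh=thresh)

    # Step 3: crop the frame off again to restore the original size.
    width, height = framed.size
    return framed.crop((1, 1, width - 1, height - 1))
```

A character that itself touches the edge of the clipped image is connected to the frame and will be whitened too, so the clip needs some margin around the glyph. Raising `thresh` above 0 also clears dark grey pixels, at the risk of eating into anti-aliased strokes.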

edited Jan 25 '21 at 13:36

answered Jan 25 '21 at 13:26

Mark Setchell


Is there a way to open an OpenCV image from bytes or from a PIL.Image in Python, since I do not have the images stored in files? – josuaschenk Jan 25 '21 at 13:38
Bear in mind that OpenCV stores images as Numpy arrays. So, I think you mean `frame = np.frombuffer(data, dtype=np.uint8).reshape((height, width, 3))` or the `3` becomes a `1` if greyscale. – Mark Setchell Jan 25 '21 at 13:41
I have never worked with OpenCV before. I saw that in the documentation, but I can't afford to store all the images as files. I will just have them as PIL.Image, and if I need to store all of the clipped images as files, then my performance is really screwed and I can't afford that – josuaschenk Jan 25 '21 at 13:46
Not sure what the issue is - if you showed your code like StackOverflow requires, it would be easier to help you. You can use `ImageOps.expand()` in `PIL` see here https://stackoverflow.com/a/60392932/2836621 And you can use `ImageDraw.floodfill()` in `PIL` see https://stackoverflow.com/a/65683106/2836621 – Mark Setchell Jan 25 '21 at 14:05
I got creating the border working now. The code snippet for that is edited into the post. I had no code to show previously, since I just had the PIL.Image class and nothing else. – josuaschenk Jan 25 '21 at 14:15
Yes, I got it working. I am playing around with the settings and the bounding boxes for optimal results, since some characters are very close to the edge, so they get cleared by the flood fill even if the border is just one pixel. I plan to extend the clipped image and then use a border of the same size as the extension, but that's just details – josuaschenk Jan 25 '21 at 14:54
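The in-memory conversions discussed in the comments above might look roughly like the sketch below, which never writes a file. The helper names are hypothetical, and it assumes 8-bit greyscale data; for raw colour bytes the shape would be `(height, width, 3)` as in the comment above.

```python
import io

import numpy as np
from PIL import Image


def pil_to_array(image: Image.Image) -> np.ndarray:
    """PIL.Image -> writable uint8 array of shape (height, width)."""
    # np.array() copies the pixel data, so the result can be modified in place.
    return np.array(image.convert("L"))


def raw_bytes_to_array(data: bytes, width: int, height: int) -> np.ndarray:
    """Raw, unencoded 8-bit greyscale pixel bytes -> (height, width) array."""
    # np.frombuffer() returns a read-only view; call .copy() before editing it.
    return np.frombuffer(data, dtype=np.uint8).reshape((height, width))


def encoded_bytes_to_array(data: bytes) -> np.ndarray:
    """Encoded image bytes (PNG, JPEG, ...) -> (height, width) array."""
    # BytesIO lets Pillow decode from memory instead of from a file on disk.
    return pil_to_array(Image.open(io.BytesIO(data)))


def array_to_pil(array: np.ndarray) -> Image.Image:
    """NumPy array -> PIL.Image, e.g. to hand the result on to Tesseract."""
    return Image.fromarray(array)
```

Because OpenCV's Python bindings operate on NumPy arrays, an array produced this way can be passed straight to functions such as `cv2.copyMakeBorder()`.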
