I am very new to image processing, and I am trying to clean pictures like picture 1 by removing the black pixels that originate from the border of the image.

Clipped Image of a Character using PyMuPDF

The images are characters clipped from a PDF, which I try to process with Tesseract to retrieve the character. I already searched Stack Overflow for answers, but only found solutions for getting rid of black borders. I need to overwrite all the black pixels coming from the corners with white pixels, so that Tesseract can recognize the character correctly.

I cannot alter the bounding boxes used to clip the characters, since the characters are centered in different areas of the bounding box, and if I shrank the bounding box I would cut off some characters, as seen below.

Clipped Image of Character with BoundingBox adjusted to fit before seen Image

My first guess would have been to recursively track down pixels that exceed a certain threshold of blackness, but I am worried about the computing time in that case and wouldn't really know where and how to start, except for using two two-dimensional arrays: one with the pixels, and one indicating whether I have already processed that pixel.

Help would be greatly appreciated.

Edit: some more pictures of cases where black pixels from the edge need to be cleared:

(five example images)

Edit: code snippet to create the bordered image:

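    # Assumes: import cv2, import numpy, from PIL import Image
    @staticmethod
    def __get_border_image(image: Image.Image) -> Image.Image:
        # OpenCV works on NumPy arrays, so convert the PIL image first.
        data = numpy.asarray(image)

        # Pad 5 px on every side; BORDER_CONSTANT with value=0 gives a black frame.
        border = cv2.copyMakeBorder(data, top=5, bottom=5, left=5, right=5,
                                    borderType=cv2.BORDER_CONSTANT, value=0)

        return Image.fromarray(border)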

1 Answer

Try it like this:

  • artificially add a 1px wide black border all around the edge
  • flood-fill with white all black pixels starting at top-left corner
  • remove the 1px border from the first step (if necessary)

The point of adding the border is to allow the white to "flow" all around all edges of the image and reach any black items touching the edge.
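
The three steps above can be sketched with Pillow alone, using the `ImageOps.expand()` and `ImageDraw.floodfill()` calls mentioned in the comments. This is only a minimal sketch under some assumptions: the helper name `clear_edge_pixels`, the conversion to greyscale, and the default `thresh=100` (how far from pure black a pixel may be and still be filled) are illustrative choices, not part of the original suggestion.

```python
from PIL import Image, ImageDraw, ImageOps


def clear_edge_pixels(image, border=1, thresh=100):
    """Whiten dark regions that are connected to the image edge.

    `clear_edge_pixels`, `border` and `thresh` are hypothetical names and
    defaults chosen for this sketch.
    """
    # Work on a greyscale copy so pixel values are plain ints (0 = black).
    gray = image.convert("L")

    # Step 1: add a black frame so the fill can reach every edge.
    framed = ImageOps.expand(gray, border=border, fill=0)

    # Step 2: flood-fill with white from the top-left corner of the frame.
    # Pixels within `thresh` of the seed colour (black) and 4-connected
    # to the frame are overwritten in place.
    ImageDraw.floodfill(framed, (0, 0), 255, thresh=thresh)

    # Step 3: remove the frame again to restore the original size.
    return ImageOps.crop(framed, border=border)
```

A call such as `cleaned = clear_edge_pixels(clipped_image)` works directly on an in-memory `PIL.Image`, so nothing has to be written to disk. Note that a character touching the edge is connected to the frame as well and would be cleared along with the unwanted pixels.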

Mark Setchell
  • Is there a way to open an OpenCV image from bytes or from a PIL.Image in Python, since I do not have the images stored in files? – josuaschenk Jan 25 '21 at 13:38
  • Bear in mind that OpenCV stores images as Numpy arrays. So, I think you mean `frame = np.frombuffer(data, dtype=np.uint8).reshape((height, width, 3))` or the `3` becomes a `1` if greyscale. – Mark Setchell Jan 25 '21 at 13:41
  • I have never worked with OpenCV before. I saw that in the documentation, but I can't afford to store all the images as files. I will only have them as PIL.Image objects, and if I had to store all of the clipped images as files, my performance would suffer badly. – josuaschenk Jan 25 '21 at 13:46
  • Not sure what the issue is - if you showed your code like StackOverflow requires, it would be easier to help you. You can use `ImageOps.expand()` in `PIL` see here https://stackoverflow.com/a/60392932/2836621 And you can use `ImageDraw.floodfill()` in `PIL` see https://stackoverflow.com/a/65683106/2836621 – Mark Setchell Jan 25 '21 at 14:05
  • I got creating the border working now. The code snippet for that is edited into the post. I had no code to show previously, since I just had the PIL.Image class and nothing else. – josuaschenk Jan 25 '21 at 14:15
  • Yes, I got it working. I am playing around with the settings and the bounding boxes for optimal results, since some characters are very close to the edge, so they get cleared by the flood fill even if the border is just one pixel. I plan to extend the clipped image and then use a border of the same size as the extension, but those are just details. – josuaschenk Jan 25 '21 at 14:54
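
For the question raised in the comments about getting from an in-memory `PIL.Image` or raw bytes to the NumPy array that OpenCV expects, a minimal sketch could look like the following. The helper names `pil_to_array` and `bytes_to_array` are made up for illustration, and the raw-bytes variant assumes uncompressed 8-bit pixel data whose width, height, and channel count are already known.

```python
import numpy as np
from PIL import Image


def pil_to_array(image):
    """Return an (height, width, 3) uint8 array in RGB channel order."""
    return np.array(image.convert("RGB"))


def bytes_to_array(data, width, height, channels=3):
    """Interpret raw, uncompressed 8-bit pixel bytes as an image array."""
    # frombuffer does not copy, so the result is a read-only view of `data`;
    # call .copy() on it if the pixels need to be modified.
    return np.frombuffer(data, dtype=np.uint8).reshape((height, width, channels))
```

OpenCV functions generally assume BGR channel order, whereas Pillow uses RGB, so the channel order may need to be reversed (for example with `arr[:, :, ::-1]`) when colour matters.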