
I want to do some image OCR with PyTesseract, and I've seen that OpenCV's erode and dilate functions are very useful for noise removal pre-processing.

Since PyTesseract already requires PIL/Pillow, I'd like to do the noise removal in PIL, rather than get another library. Is there an equivalent to erode/dilate in PIL? (My research seems to suggest that MaxFilter and MinFilter could be used this way, but it's not fully clear to me if that's really true.)

Thanks!

ROldford

1 Answer


The best option is to use the OpenCV Python bindings. However, if you want to stay with PIL/Pillow, the ImageFilter module provides MaxFilter and MinFilter, which act as dilation and erosion: http://pillow.readthedocs.io/en/3.1.x/reference/ImageFilter.html

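from PIL import ImageFilter  # src_img is an already opened PIL Image
dilation_img = src_img.filter(ImageFilter.MaxFilter(3))  # brightest pixel in each 3x3 window
erosion_img = src_img.filter(ImageFilter.MinFilter(3))  # darkest pixel in each 3x3 window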

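The number 3 in the example is the mask size, i.e. the side length in pixels of the square window. Note that "dilation" here grows bright regions: for dark text on a light background, MaxFilter thins the strokes and MinFilter thickens them.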
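As a rough sketch of how these two filters combine for noise removal, the snippet below chains them into a morphological "opening" (erode, then dilate), which removes bright specks smaller than the window. The helper names erode, dilate and opening, and the synthetic 9x9 test image, are made up for illustration and are not part of Pillow; only Image.new, putpixel, filter, MinFilter and MaxFilter are library calls.

```python
from PIL import Image, ImageFilter


def erode(img, size=3):
    """Shrink bright regions: each pixel becomes the minimum of its window."""
    return img.filter(ImageFilter.MinFilter(size))


def dilate(img, size=3):
    """Grow bright regions: each pixel becomes the maximum of its window."""
    return img.filter(ImageFilter.MaxFilter(size))


def opening(img, size=3):
    """Erode, then dilate: drops bright specks smaller than the window."""
    return dilate(erode(img, size), size)


# Hypothetical test image: black background, one isolated white pixel
# (the "noise") and a 3x3 white square (the "signal").
img = Image.new("L", (9, 9), 0)
img.putpixel((1, 1), 255)
for x in range(4, 7):
    for y in range(4, 7):
        img.putpixel((x, y), 255)

cleaned = opening(img)
print(cleaned.getpixel((1, 1)))  # the isolated noise pixel
print(cleaned.getpixel((5, 5)))  # the centre of the square
```

For a typical OCR scan with dark text on a white page, the roles are reversed, so dark specks would be removed by dilating first and eroding second (a "closing"), or by inverting the image before applying the opening.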

Oliver Zendel
  • 2
    Why do you think that OpenCV is the 'best option'? Is it faster than Pillow for this operation? – maxschlepzig Apr 29 '18 at 07:13
  • 2
    OpenCV is better suited for noise removal and more advanced operations, as its focus is computer vision. Regarding speed: it depends on the versions and the OS, but in general just about all operations are faster in OpenCV than in Pillow (there are often dedicated GPU versions in OpenCV as well): https://www.kaggle.com/vfdev5/pil-vs-opencv – Oliver Zendel May 02 '18 at 07:51