I often run large double-sided scan jobs on an unintelligent Canon multifunction, which leaves me with a huge folder of JPEGs. Am I insane to consider using PIL to analyze the folder, detect scans of blank pages, and flag them for deletion?
Leaving the folder-crawling and flagging parts out, I imagine this would look something like:
- Check whether the image is greyscale, since that can't be assumed.
- If so, detect the dominant range of shades (background colour).
- If not, detect the dominant range of shades, restricting to light greys.
- Determine what percentage of the entire image is made up of those shades.
- Find a threshold on that percentage that reliably separates blank pages from pages with type, writing, or imagery.
- Perhaps test the image one fragment at a time to make the threshold more accurate.
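
To make that concrete, here is a rough sketch of the steps above using Pillow (the maintained PIL fork). The function names (`dominant_light_shade`, `shade_fraction`, `looks_blank`) and all the defaults (`floor=128`, `tolerance=24`, `min_fraction=0.99`, `grid=4`) are placeholders I made up, not values tuned against real scans. It sidesteps the greyscale check by converting every image to mode `L` first, and it assumes the background is the most common light shade:

```python
from PIL import Image


def dominant_light_shade(grey, floor=128):
    """Most common grey level at or above `floor` (assumed to be the paper colour)."""
    hist = grey.histogram()  # 256 counts for a mode "L" image
    return max(range(floor, 256), key=lambda level: hist[level])


def shade_fraction(grey, shade, tolerance):
    """Fraction of pixels within `tolerance` grey levels of `shade`."""
    hist = grey.histogram()
    lo, hi = max(0, shade - tolerance), min(255, shade + tolerance)
    return sum(hist[lo:hi + 1]) / (grey.width * grey.height)


def looks_blank(img, tolerance=24, min_fraction=0.99, grid=4):
    """Guess whether a scanned page is blank.

    The page is split into a grid x grid set of tiles, and every tile must be
    at least `min_fraction` background. Testing tiles instead of the whole
    page keeps a small patch of text from being averaged away.
    """
    grey = img.convert("L")  # colour and greyscale scans end up in the same mode
    shade = dominant_light_shade(grey)
    w, h = grey.size
    for row in range(grid):
        for col in range(grid):
            box = (col * w // grid, row * h // grid,
                   (col + 1) * w // grid, (row + 1) * h // grid)
            if box[2] <= box[0] or box[3] <= box[1]:
                continue  # image smaller than the grid; skip the empty tile
            tile = grey.crop(box)
            if shade_fraction(tile, shade, tolerance) < min_fraction:
                return False
    return True
```

Usage would be something like `looks_blank(Image.open("scan_0001.jpg"))` (hypothetical filename). Real scans have JPEG noise, speckle, and show-through from the other side of the sheet, so I expect `tolerance` and `min_fraction` would need adjusting by trial and error.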
I know this is sort of an edge case, but can anyone with PIL experience lend some pointers?