I'm curious how I might more reliably recognise the value and suit of playing card images. Here are two examples:
There may be some noise in the images, but I have a large dataset I could use for training (roughly 10k PNGs, covering all values and suits).
I can reliably recognise images that I've manually classified when there's a known exact match, using a hashing method. But since I'm hashing the images based on their content, the slightest noise changes the hash and the image ends up treated as unknown. That is the problem I'm looking to address with further automation.
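To make the current approach concrete, here is a minimal sketch of the exact-match hashing I'm describing (the file paths and the `known_cards` lookup table are just illustrative):

```python
import hashlib
from PIL import Image

def image_hash(path):
    """Hash the raw pixel content of an image (exact-match only)."""
    img = Image.open(path).convert("RGB")
    return hashlib.md5(img.tobytes()).hexdigest()

# known_cards maps a content hash to a manually assigned label, e.g. "4c"
known_cards = {image_hash("reference/4c.png"): "4c"}

def classify(path):
    # Any noise at all produces a different hash, so the lookup fails
    # and the card is reported as unknown.
    return known_cards.get(image_hash(path), "unknown")
```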
I've been reviewing the Tesseract 3.05 training documentation: https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract#automated-method
Can Tesseract only be trained on text rendered from fonts, or could I use it to recognise the suits on these cards?
I was hoping I could say that all the images in a given folder correspond to 4c (e.g. the example images above), and that Tesseract would then see the similarity in any future instance of that card (regardless of noise) and also read it as 4c, along the lines of the sketch below. Is this possible? Does anyone here have experience with this?
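To be explicit about the workflow I'm after, here is a rough sketch; the training and prediction calls are hypothetical placeholders, since I don't yet know whether Tesseract (or some other tool) supports this directly:

```python
from pathlib import Path

def build_training_set(root="training"):
    """Collect (image path, label) pairs, where the label is the folder name."""
    samples = []
    for label_dir in Path(root).iterdir():          # e.g. training/4c, training/Kd, ...
        for png in label_dir.glob("*.png"):
            samples.append((png, label_dir.name))   # label such as "4c"
    return samples

# train_model() and model.predict() stand in for whatever the training tool
# would actually provide -- this is the behaviour I'm hoping for, not real API.
# model = train_model(build_training_set())
# print(model.predict("unseen_noisy_4c.png"))      # hoped-for output: "4c"
```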