I am attempting to create an app that can read a British-style crossword.
The first step is to be able to recognise a crossword image and parse it into an appropriately sized array.
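To make the target concrete, here is a minimal sketch of the kind of output I am after. It assumes the hard part is already done: the grid has been located, cropped and straightened, and the number of rows and columns is known. The function name `parse_grid`, the 0/1 encoding and the threshold of 128 are my own choices for illustration, not part of any library.

```python
def parse_grid(pixels, rows, cols, threshold=128):
    """Classify each cell of a cropped, axis-aligned crossword grid.

    pixels: 2D list of grayscale values (0 = black, 255 = white).
    Returns a rows x cols list of lists: 1 for a white (letter) cell,
    0 for a black (blocked) cell.
    """
    height = len(pixels)
    width = len(pixels[0])
    cell_h = height / rows
    cell_w = width / cols

    grid = []
    for r in range(rows):
        row_cells = []
        for c in range(cols):
            # Sample only the central half of the cell so that grid lines
            # and small clue numbers in the corner do not skew the average.
            y0 = int((r + 0.25) * cell_h)
            y1 = max(y0 + 1, int((r + 0.75) * cell_h))
            x0 = int((c + 0.25) * cell_w)
            x1 = max(x0 + 1, int((c + 0.75) * cell_w))
            patch = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(patch) / len(patch)
            row_cells.append(1 if mean >= threshold else 0)
        grid.append(row_cells)
    return grid
```

For a standard 15x15 grid this would return a 15x15 array of ones and zeros. Finding and straightening the grid in a real photo, and reading the clue numbers, are the parts I don't know how to do.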
I have been attempting to use Microsoft Azure Custom Object Detection software, whereby I upload images of crosswords, highlight the main grid, the across clues and the down clues, and then train the model.
Unfortunately, this doesn't appear to be working: the results I am getting after 100+ images indicate that identifying clues in this way on a crossword grid is very difficult. Although the model reports a recall of 48%, in practice it is closer to 5% when I insist upon a probability threshold of 50%. I am aware that 100+ images really isn't that many, but the results make me think I would need an image base in the thousands to achieve what I want, which would take a prohibitive amount of time and effort to build.
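To show how recall can drop so sharply once a threshold is enforced, here is a minimal sketch. The probabilities are made up for illustration and are not my actual results. It also assumes each labelled region has already been matched to its best overlapping prediction; I don't know exactly how Azure computes the figure it reports.

```python
def recall_at_threshold(best_match_probs, threshold):
    """Fraction of labelled regions detected with probability >= threshold.

    best_match_probs: one entry per labelled region, holding the probability
    of the best overlapping prediction, or None if nothing overlapped it.
    """
    if not best_match_probs:
        return 0.0
    found = sum(1 for p in best_match_probs if p is not None and p >= threshold)
    return found / len(best_match_probs)


# Hypothetical probabilities for ten labelled regions (illustration only).
example = [0.62, 0.31, 0.22, 0.18, 0.12, None, None, 0.09, None, 0.27]

low = recall_at_threshold(example, 0.10)   # lenient threshold
high = recall_at_threshold(example, 0.50)  # the 50% threshold I want to use
```

With these made-up numbers, six of the ten regions count as found at a 10% threshold but only one does at 50%, which is the same kind of gap I am seeing.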
So, is there an existing model that can be adapted to detect and parse crosswords? I have searched for one and haven't found anything suitable.
I have found resources such as OpenCV, but it isn't obvious how, or indeed whether, they can be adapted to do what I want.
The model itself would need to be usable in TensorFlow.js (my understanding is that there is a way to convert Python models to the TensorFlow.js format).
Any help or guidance would be appreciated.