Most simple approach for digit recognition in Python

Question

I have a simple digit recognition project and have noticed that people generally take one of two approaches to this in Python. My goal is to take a PDF document as input and extract the HANDWRITTEN digits at particular places on the page.

I have seen people use either OpenCV, as in this question, or scikit-learn, as in this example. I am not familiar with either, and am wondering which would be simplest to learn and implement for my intended use. Thanks.

What do you mean by "get the digits"? Generally, you could use any PDF reading tool (pdfminer, etc.), open the file and use regular expressions to find your digits, if that's what you're referring to. I assume, considering that you mentioned scikit-learn, that you didn't intend that. – nir0s Mar 09 '17 at 18:32
The scikit-learn example is not solving the same problem! (Classifying a preprocessed and cropped digit != finding a digit.) – sascha Mar 09 '17 at 18:32
I always recommend scikit-learn; it is much more robust and has many functionalities to help you deal with a large dataset. To get the digits, crop them based on their pixel position and feed them to your machine learning algorithm. What are you planning on using? – JahKnows Mar 09 '17 at 18:33
sklearn has no object detector, so it's not ready for OCR on its own. OP should define his task. What are "particular places"? – sascha Mar 09 '17 at 18:34

score 1 · Accepted Answer · answered Mar 10 '17 at 00:30

I suggest using both OpenCV and scikit-learn. After you convert your PDF into an image, you can use OpenCV for image pre-processing (Gaussian blur, thresholding, erosion/dilation filters) so that the digits become easier to extract. Then you can use contour tracing (again with OpenCV) to detect the individual digits. Once you have extracted the digits (and given that you have a training set), you can use scikit-learn for the classification.

GStav

Thanks, that's useful. I do not have a training set. Is there some place where I can find a generic training set of digits? – splinter Mar 10 '17 at 10:14
As far as I know, the most famous training set of handwritten digits is [MNIST](http://yann.lecun.com/exdb/mnist/). – GStav Mar 10 '17 at 14:34
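For a first experiment that avoids downloading anything, scikit-learn also bundles a much smaller set of 8x8 handwritten digit images, available through `load_digits`. This is not MNIST, but it is enough to try out the classification step. The sketch below is one possible setup: the choice of an RBF-kernel `SVC`, the value `C=10.0`, and the helper name `train_digit_classifier` are assumptions made for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_digit_classifier(random_state=0):
    """Train an SVM on scikit-learn's bundled 8x8 digits and report accuracy.

    Hypothetical helper; the classifier and its settings are illustrative.
    """
    digits = load_digits()
    # Pixel values range from 0 to 16; scale them into [0, 1].
    X = digits.data / 16.0
    y = digits.target
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    clf = SVC(kernel="rbf", gamma="scale", C=10.0)
    clf.fit(X_train, y_train)
    # Accuracy on the held-out quarter of the data.
    return clf, clf.score(X_test, y_test)
```

Calling `clf, accuracy = train_digit_classifier()` returns the fitted model and its held-out accuracy. To classify digits cropped from your own scans with this model, each crop would have to be resized to 8x8, scaled into the same range, and flattened to 64 values. Expect accuracy on real scans to be lower than on the held-out split, since your handwriting and scan quality will differ from the training data.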
