Questions tagged [ocr]

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten, or printed text into machine-encoded text. Several related techniques, some of them distinct fields of application, are also commonly referred to as OCR: Handwritten Text Recognition (HTR), Optical Word Recognition (OWR), Intelligent Character Recognition (ICR), and Intelligent Word Recognition (IWR).

OCR is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website.

OCR @Wikipedia

Frequently-asked questions:

Simple Digit Recognition OCR in OpenCV-Python

6124 questions

426

votes

3 answers

Simple Digit Recognition OCR in OpenCV-Python

I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV. I have 100 samples (i.e. images) of each digit. I would like to train with…

asked Feb 23 '12 at 12:37

Abid Rahman K
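
For readers new to the k-nearest-neighbours idea mentioned in the question above, here is a minimal pure-Python sketch. It is not OpenCV's KNearest API and not the asker's code. The function name `knn_predict` and the tiny 4-pixel "images" are hypothetical, chosen only to keep the example self-contained.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs. Here each feature
    vector stands in for a flattened grayscale image of one digit.
    """
    # Sort the training samples by Euclidean distance to the query.
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    # Count the labels among the k closest samples and return the winner.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: 2x2 "images" flattened to 4 pixel values.
samples = [
    ((0, 255, 0, 255), 1),
    ((10, 250, 5, 240), 1),
    ((255, 255, 255, 255), 8),
    ((250, 240, 255, 250), 8),
]

prediction = knn_predict(samples, (5, 245, 0, 250), k=3)
```

With real digit images the feature vectors would be much longer (for example, every pixel of a resized sample), but the voting logic stays the same.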


198

votes

14 answers

image processing to improve tesseract OCR accuracy

I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for…

image-processing ocr tesseract

asked Feb 28 '12 at 10:12

user364902


175

votes

14 answers

Has reCaptcha been cracked / hacked / OCR'd / defeated / broken?

Have any programming methods been used to defeat reCAPTCHA? I'm interested in seeing evidence and potentially demonstrations that reCAPTCHA in particular has been made obsolete by completely automated, humanless methods. To clarify, not looking…

security captcha ocr recaptcha

asked Jan 15 '09 at 23:32

Dave Rutledge


166

votes

5 answers

Java OCR implementation

This is primarily just curiosity, but are there any OCR implementations in pure Java? I'm curious how this would perform purely in Java, and OCR in general interests me, so I'd love to see how it's implemented in a language I thoroughly understand.…

java ocr

asked Nov 28 '09 at 21:55

rat


149

votes

6 answers

Is there any free OCR library for Android?

I'm looking for a Java OCR that runs on Android; however, Asprise doesn't seem to be a platform-independent OCR. Is there any open-source/free Java OCR I can use for Android application development?

android ocr

asked Jul 09 '09 at 20:13

user121196


126

votes

20 answers

Tesseract running error

I have a problem running the tesseract-ocr engine on Linux. I've downloaded the RUS language data and put it in the tessdata directory (/usr/local/share/tessdata). When I try to run tesseract with the command tesseract blob.jpg out -l rus, it displays an…

ocr tesseract

asked Feb 10 '13 at 17:53

Russel Crowe


100

votes

4 answers

How do I choose between Tesseract and OpenCV?

I recently came across Tesseract and OpenCV. It looks like Tesseract is a full-fledged OCR engine and OpenCV can be used as a framework to create an OCR application/service. I tried using Tesseract on some of my images and its accuracy seems…

python opencv computer-vision ocr tesseract

asked Jul 15 '12 at 06:07

Legend


votes

7 answers

Limit characters tesseract is looking for

Is it possible to limit the set of characters that tesseract is looking for (e.g. search only for letters a-z)? That would improve my results greatly.

ocr tesseract

asked Mar 02 '10 at 13:47

Danilo Bargen


votes

1 answer

How to get Indexing Service and MODI to produce Full-text over OCR?

I have configured Indexing Service to index my files, which also include scanned images saved as hi-res TIFF files. I also installed MS Office 2003+ and configured MS Office Document Imaging (MODI) correctly, so I can perform OCR on my images and…

ocr modi indexing-service

asked Aug 05 '08 at 23:16

Ishmaeel


votes

11 answers

How to recognize vehicle license / number plate (ANPR) from an image?

I have a web site that allows users to upload images of cars and I would like to put a privacy filter in place to detect registration plates on the vehicle and blur them. The blurring is not a problem but is there a library or component (open source…

image ocr computer-vision automatic-license-plate-recognition

asked Jun 11 '09 at 14:18

Ryan O'Neill


votes

10 answers

How to make tesseract to recognize only numbers, when they are mixed with letters?

I want to use tesseract to recognize only numbers. The problem is that I have a mixture of numbers and letters, and when I use SetVariable("tessedit_char_whitelist", "0123456789"), tesseract returns a wrong digit for every symbol. Can I set a threshold…

ocr tesseract

asked Feb 09 '11 at 12:29

zkunov


votes

1 answer

best OCR (Optical character recognition) example in android

I want a running example of OCR in Android. I have done some research and found an example that implements OCR in Android: https://github.com/rmtheis/tess-two. In it there are three project files... eyes-two, tess-two, tess-two-test. I have…

android ocr tesseract

asked Oct 23 '13 at 05:12

Komal

votes

4 answers

Pytesseract OCR multiple config options

I am having some problems with pytesseract. I need to configure Tesseract so that it accepts single digits and also accepts only numbers, as the number zero is often confused with an 'O'. Like this: target =…

python ocr tesseract

asked Jun 18 '17 at 20:07

Niall Oswald
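
One way to combine several Tesseract options, as the question above asks, is to build a single config string. The sketch below is only an illustration. The helper name `build_tesseract_config` is hypothetical. It relies on two real Tesseract command-line options, `--psm` (page segmentation mode, where 10 means "treat the image as a single character") and `-c tessedit_char_whitelist=...`. Whether the whitelist is honoured may depend on the Tesseract version and engine mode, so treat this as a starting point rather than a guaranteed fix.

```python
def build_tesseract_config(psm=None, whitelist=None):
    """Join several Tesseract command-line options into one config string."""
    parts = []
    if psm is not None:
        # Page segmentation mode, e.g. 10 = single character.
        parts.append(f"--psm {psm}")
    if whitelist:
        # Restrict recognition to the given characters.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


# Single-character mode, digits only.
digit_config = build_tesseract_config(psm=10, whitelist="0123456789")
```

Assuming pytesseract is installed, the resulting string could then be passed as `pytesseract.image_to_string(image, config=digit_config)`.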

votes

8 answers

Getting the bounding box of the recognized words using python-tesseract

I am using python-tesseract to extract words from an image. This is a Python wrapper for tesseract, which is an OCR engine. I am using the following code for getting the words: import tesseract api =…

python image-processing ocr tesseract python-tesseract

asked Dec 30 '13 at 00:15

Abtin Rasoulian
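
As a rough illustration of word-level bounding boxes, the sketch below parses Tesseract's TSV output, which is a different route from the python-tesseract API used in the question above. It assumes the TSV column layout produced by newer Tesseract releases (for example via `tesseract image.png out tsv`), where rows with `level` equal to 5 describe individual words. The function name `parse_word_boxes` and the sample data are hypothetical.

```python
import csv
import io


def parse_word_boxes(tsv_text):
    """Return (text, left, top, right, bottom) for each word row in the TSV."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    boxes = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are words; other levels are pages, blocks, lines, etc.
        if row["level"] != "5" or not text:
            continue
        left, top = int(row["left"]), int(row["top"])
        width, height = int(row["width"]), int(row["height"])
        boxes.append((text, left, top, left + width, top + height))
    return boxes


# Hypothetical sample in the assumed TSV layout, not real OCR output.
sample_tsv = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t109\t92\t75\t24\t95\tworld\n"
)

word_boxes = parse_word_boxes(sample_tsv)
```

Each tuple gives the word followed by its pixel coordinates, with the origin at the top-left corner of the image.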

votes

5 answers

How to implement and do OCR in a C# project?

I've been searching for a while, and all I've seen are some OCR library requests. I would like to know how to implement the purest, easiest to install and use OCR library, with detailed installation info, in a C# project. If possible, I just wanna…

c# ocr

asked Jun 08 '12 at 10:46

Berker Yüceer

