Questions tagged [ocr]

Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. The following topics, although some are distinct fields of application, are also commonly referred to as OCR: Handwritten Text Recognition (HTR), Optical Word Recognition (OWR), Intelligent Character Recognition (ICR), and Intelligent Word Recognition (IWR).

OCR is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish text on a website.

OCR @Wikipedia

Frequently-asked questions:

6124 questions
426 votes, 3 answers

Simple Digit Recognition OCR in OpenCV-Python

I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV. I have 100 samples (i.e. images) of each digit. I would like to train with…
Abid Rahman K
198 votes, 14 answers

image processing to improve tesseract OCR accuracy

I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for…
user364902
175 votes, 14 answers

Has reCaptcha been cracked / hacked / OCR'd / defeated / broken?

Have any programming methods been used to defeat reCAPTCHA? I'm interested in seeing evidence and potentially demonstrations that reCAPTCHA in particular has been made obsolete by completely automated, humanless methods. To clarify, not looking…
Dave Rutledge
166 votes, 5 answers

Java OCR implementation

This is primarily just curiosity, but are there any OCR implementations in pure Java? I'm curious how this would perform purely in Java, and OCR in general interests me, so I'd love to see how it's implemented in a language I thoroughly understand.…
rat
149 votes, 6 answers

Is there any free OCR library for Android?

I'm looking for a Java OCR library that runs on Android; however, Asprise doesn't seem to be a platform-independent OCR. Is there any open-source/free Java OCR I can use for Android application development?
user121196
126 votes, 20 answers

Tesseract running error

I have a problem running the tesseract-ocr engine on Linux. I've downloaded the RUS language data and put it in the tessdata directory (/usr/local/share/tessdata). When I try to run tesseract with the command tesseract blob.jpg out -l rus, it displays an…
Russel Crowe
100 votes, 4 answers

How do I choose between Tesseract and OpenCV?

I recently came across Tesseract and OpenCV. It looks like Tesseract is a full-fledged OCR engine and OpenCV can be used as a framework to create an OCR application/service. I tried using Tesseract on some of my images and its accuracy seems…
Legend
87 votes, 7 answers

Limit characters tesseract is looking for

Is it possible to limit the set of characters that tesseract is looking for (e.g. search only for letters a-z)? That would improve my results greatly.
Danilo Bargen
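
For the question above about limiting the character set, one commonly cited approach is Tesseract's `tessedit_char_whitelist` configuration variable, passed on the command line with `-c`. The following is only a sketch that builds such a command line without running it. The helper name `build_whitelist_command` and the file names `page.png` and `out` are hypothetical, and whether the whitelist is honoured may depend on the Tesseract version and engine mode.

```python
import string


def build_whitelist_command(image_path, output_base, allowed_chars, lang="eng"):
    """Return an argv list for the tesseract CLI restricted to allowed_chars.

    Hypothetical helper: it only assembles arguments and does not run tesseract.
    """
    if not allowed_chars:
        raise ValueError("allowed_chars must not be empty")
    return [
        "tesseract",
        image_path,       # input image
        output_base,      # output file name without extension
        "-l", lang,       # language data to use
        "-c", f"tessedit_char_whitelist={allowed_chars}",
    ]


# Restrict recognition to the lowercase letters a-z, as in the question.
cmd = build_whitelist_command("page.png", "out", string.ascii_lowercase)
print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where the `tesseract` binary is installed.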
75 votes, 1 answer

How to get Indexing Service and MODI to produce Full-text over OCR?

I have configured Indexing Service to index my files, which also include scanned images saved as hi-res TIFF files. I also installed MS Office 2003+ and configured MS Office Document Imaging (MODI) correctly, so I can perform OCR on my images and…
Ishmaeel
72 votes, 11 answers

How to recognize vehicle license / number plate (ANPR) from an image?

I have a web site that allows users to upload images of cars and I would like to put a privacy filter in place to detect registration plates on the vehicle and blur them. The blurring is not a problem but is there a library or component (open source…
71 votes, 10 answers

How to make tesseract to recognize only numbers, when they are mixed with letters?

I want to use tesseract to recognize only numbers. The problem is that I have a mixture of numbers and letters, and when I use SetVariable("tessedit_char_whitelist", "0123456789"), tesseract returns a wrong digit for every symbol. Can I set a threshold…
zkunov
71 votes, 1 answer

best OCR (Optical character recognition) example in android

I want a running example of OCR on Android. I have done some research and found an example that implements OCR on Android: https://github.com/rmtheis/tess-two. In it there are three project files: eyes-two, tess-two, and tess-two-test. I have…
Komal
69 votes, 4 answers

Pytesseract OCR multiple config options

I am having some problems with pytesseract. I need to configure Tesseract so that it accepts single digits while also accepting only numbers, as the number zero is often confused with an 'O'. Like this: target =…
Niall Oswald
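
For the pytesseract question above, several Tesseract options can usually be combined in the single `config` string that `pytesseract.image_to_string` accepts. The sketch below assumes Tesseract 4 or later, where the page segmentation flag is spelled `--psm` and mode 10 treats the image as a single character. The helper names `digits_single_char_config` and `read_single_digit` are hypothetical.

```python
def digits_single_char_config(psm=10, whitelist="0123456789"):
    """Combine a page segmentation mode and a character whitelist
    into one config string (hypothetical helper)."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_single_digit(image):
    """Run OCR on one image expected to contain a single digit.

    pytesseract is imported lazily so the config helper above can be
    used and tested without pytesseract or tesseract being installed.
    """
    import pytesseract

    text = pytesseract.image_to_string(image, config=digits_single_char_config())
    return text.strip()


config = digits_single_char_config()
print(config)
```

Restricting the whitelist to digits is what keeps a zero from being reported as the letter 'O'. Support for the whitelist may vary between Tesseract versions and engine modes.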
61 votes, 8 answers

Getting the bounding box of the recognized words using python-tesseract

I am using python-tesseract to extract words from an image. This is a Python wrapper for tesseract, which is an OCR engine. I am using the following code for getting the words: import tesseract api =…
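
One alternative to the python-tesseract API mentioned in this question is to have Tesseract write its TSV output and read the word boxes from there. This is a sketch of that swapped-in approach, not the asker's method. It assumes the standard TSV columns (`level`, `left`, `top`, `width`, `height`, `conf`, `text`), with `level` 5 marking word rows. The helper name `parse_word_boxes` is hypothetical, and `SAMPLE_TSV` is made-up data for illustration only.

```python
import csv
import io


def parse_word_boxes(tsv_text):
    """Return (text, left, top, width, height) for each word row in TSV output."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    boxes = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Skip page/block/paragraph/line rows and empty words.
        if row["level"] != "5" or not text:
            continue
        boxes.append(
            (text, int(row["left"]), int(row["top"]),
             int(row["width"]), int(row["height"]))
        )
    return boxes


# Made-up sample in the TSV layout, for illustration only.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t200\t50\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t10\t12\t60\t20\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t80\t12\t70\t20\t93\tworld\n"
)

boxes = parse_word_boxes(SAMPLE_TSV)
for word, left, top, width, height in boxes:
    print(word, left, top, width, height)
```

On a machine with a recent Tesseract installed, running `tesseract image.png out tsv` should produce a file in this layout.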
60 votes, 5 answers

How to implement and do OCR in a C# project?

I've been searching for a while and all I've seen are some OCR library requests. I would like to know how to implement the purest, easiest to install and use OCR library, with detailed installation info for a C# project. If possible, I just want to…
Berker Yüceer