iOS: Real Time OCR on top of live camera feed (similar to iTunes Redeem Gift Card)

Question

Is there a way to accomplish something similar to what the iTunes and App Store Apps do when you redeem a Gift Card using the device camera, recognizing a short string of characters in real time on top of the live camera feed?

iTunes App Redeem Gift Card UI

I know that in iOS 7 there is now the AVMetadataMachineReadableCodeObject class which, AFAIK, only represents barcodes. I'm more interested in detecting and reading the contents of a short string. Is this possible using publicly available API methods, or some other third party SDK that you might know of?

There is also a video of the process in action:

https://www.youtube.com/watch?v=c7swRRLlYEo

Hi @boliva, three years after your question, have you found any library we can depend on for live OCR on iOS? – palAlaa Jul 26 '17 at 05:18

score 16 · Answer 1 · answered Nov 26 '14 at 03:23

I'm working on a project that does something similar to the camera-based gift card redemption in Apple's App Store app that you mentioned.

A great starting place for processing live video is a project I found on GitHub. It uses the AVFoundation framework, and you implement the AVCaptureVideoDataOutputSampleBufferDelegate methods to receive the frames.

Once you have the image stream (video), you can use OpenCV to process it. You need to determine the area of the image you want to OCR before you run it through Tesseract. You will have to experiment with the filtering, but the broad steps with OpenCV are:

Convert the images to grayscale with cv::cvtColor(inputMat, outputMat, CV_RGBA2GRAY) (the constant is named cv::COLOR_RGBA2GRAY in newer OpenCV versions).
Threshold the images to eliminate unnecessary elements. You specify a threshold value; pixels on one side of it are set to black and everything else to white (or the reverse).
Determine the lines that form the boundary of the box (or whatever you are processing). You can either compute a "bounding box", if you have eliminated everything but the desired area, or use the HoughLines algorithm (or its probabilistic version, HoughLinesP). From the detected lines you can compute the line intersections to find the corners, then use the corners to warp the desired area into a proper rectangle (if this step is necessary in your application) prior to OCR.
Process that portion of the image with the Tesseract OCR library to get the resulting text. It is also possible to create training files for letters in OpenCV so you can read the text without Tesseract; this could be faster, but it could also be a lot more work. In the App Store case, they do something similar to display the text that was read overlaid on top of the original image. This adds to the cool factor, so it just depends on what you need.
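
To make the first three steps concrete, here is a minimal sketch in plain C++ with no OpenCV dependency. It is not the code from my project: the GrayImage and Box types and the function names are hypothetical, the luma weights are the common Rec. 601 ones, and it assumes the code you want to read is brighter than its surroundings after thresholding. In practice you would let OpenCV do this work; the sketch only shows what each step computes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper types for this sketch; not part of OpenCV.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // row-major, one byte per pixel
};

struct Box {
    int left, top, right, bottom;  // inclusive pixel coordinates
    bool empty;                    // true when no foreground pixel was found
};

// Step 1: RGBA -> grayscale with Rec. 601 luma weights, in integer arithmetic.
GrayImage toGray(const std::vector<std::uint8_t>& rgba, int width, int height) {
    assert(width > 0 && height > 0);
    assert(rgba.size() == static_cast<std::size_t>(width) * height * 4);
    GrayImage out{width, height,
                  std::vector<std::uint8_t>(static_cast<std::size_t>(width) * height)};
    for (std::size_t i = 0; i < out.pixels.size(); ++i) {
        const unsigned r = rgba[4 * i], g = rgba[4 * i + 1], b = rgba[4 * i + 2];
        out.pixels[i] = static_cast<std::uint8_t>((299 * r + 587 * g + 114 * b + 500) / 1000);
    }
    return out;
}

// Step 2: fixed threshold. Pixels brighter than `level` become 255, the rest 0.
GrayImage threshold(const GrayImage& in, std::uint8_t level) {
    GrayImage out = in;
    for (auto& p : out.pixels) p = (p > level) ? 255 : 0;
    return out;
}

// Step 3 (bounding-box variant): smallest rectangle containing every
// non-zero pixel of the thresholded image.
Box boundingBox(const GrayImage& bin) {
    Box box{bin.width, bin.height, -1, -1, true};
    for (int y = 0; y < bin.height; ++y) {
        for (int x = 0; x < bin.width; ++x) {
            if (bin.pixels[static_cast<std::size_t>(y) * bin.width + x] == 0) continue;
            if (x < box.left) box.left = x;
            if (x > box.right) box.right = x;
            if (y < box.top) box.top = y;
            if (y > box.bottom) box.bottom = y;
            box.empty = false;
        }
    }
    return box;
}
```

You would call boundingBox(threshold(toGray(frame, w, h), level)) and hand only the resulting region to the OCR step. The bounding box only works if thresholding has removed everything except the area you care about; otherwise the HoughLines route is the one to try.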

Some other hints:

I used the book "Instant OpenCV" to get started quickly with this. It was pretty helpful.
Download OpenCV for iOS from OpenCV.org/downloads.html
I have found adaptive thresholding to be very useful; you can read all about it by searching for "OpenCV adaptiveThreshold". Also, if your image has few mid-tones between its light and dark elements, you can use Otsu's binarization, which automatically determines the threshold value from the histogram of the grayscale image.
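
For anyone curious what Otsu's method actually computes, here is a rough standalone sketch in plain C++. The function name otsuThreshold is my own, and this is the textbook formulation (pick the threshold that maximises the between-class variance), not OpenCV's implementation; with OpenCV you would normally just pass the Otsu flag to cv::threshold.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the threshold t in [0, 255] that maximises the between-class
// variance when pixels <= t form one class and pixels > t form the other.
// Returns 0 for a uniform image, where no split exists.
int otsuThreshold(const std::vector<std::uint8_t>& gray) {
    assert(!gray.empty());

    // Histogram of the grayscale values.
    std::array<std::size_t, 256> hist{};
    for (std::uint8_t p : gray) ++hist[p];

    const double total = static_cast<double>(gray.size());
    double sumAll = 0.0;
    for (int v = 0; v < 256; ++v) sumAll += v * static_cast<double>(hist[v]);

    double weightLow = 0.0;   // number of pixels <= t
    double sumLow = 0.0;      // sum of their values
    double bestVariance = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        weightLow += static_cast<double>(hist[t]);
        sumLow += t * static_cast<double>(hist[t]);
        if (weightLow == 0.0) continue;        // low class still empty
        const double weightHigh = total - weightLow;
        if (weightHigh == 0.0) break;          // high class empty from here on
        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        const double diff = meanLow - meanHigh;
        const double variance = weightLow * weightHigh * diff * diff;
        if (variance > bestVariance) {
            bestVariance = variance;
            best = t;
        }
    }
    return best;
}
```

This works best on images whose histogram has two clear peaks (dark text, light card). On unevenly lit frames a single global threshold tends to fail, which is where adaptive thresholding comes in.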

Francis Li · Answer 2 · 2018-07-07T20:07:26.373

This Q&A thread consistently seems to be one of the top search hits for OCR on iOS, but it is fairly out of date, so I thought I'd post some additional resources I've found, as of the time of writing this post, that might be useful:

Vision Framework
https://developer.apple.com/documentation/vision
As of iOS 11, you can now use the included CoreML-based Vision framework for things like rectangle or text detection. I've found that I no longer need to use OpenCV with these capabilities included in the OS. However, note that text detection is not the same as text recognition or OCR so you will still need another library like Tesseract (or possibly your own CoreML model) to translate the detected parts of the image into actual text.

SwiftOCR
https://github.com/garnele007/SwiftOCR
If you're just interested in recognizing alphanumeric codes, this OCR library claims significant speed, memory consumption, and accuracy improvements over Tesseract (I have not tried it myself).

ML Kit
https://firebase.google.com/products/ml-kit/
Google has released ML Kit as part of its Firebase suite of developer tools; it is in beta at the time of writing this post. Similar to Apple's CoreML, it is a machine learning framework that can use your own trained models, but, like Vision Framework, it also has pre-trained models for common image processing tasks. Unlike Vision Framework, it also includes a model for on-device text recognition of Latin characters. Currently, use of this library is free for on-device functionality, with charges for using cloud/SaaS API offerings from Google. I have opted to use this in my project, as the speed and accuracy of recognition seem quite good, and I will also be creating an Android app with the same functionality, so having a single cross-platform solution is ideal for me.

ABBYY Real-Time Recognition SDK
https://rtrsdk.com/
This commercial SDK for iOS and Android is free to download for evaluation and limited commercial use (up to 5000 units as of time of writing this post). Further commercial use requires an Extended License. I did not evaluate this offering due to its opaque pricing.

Here's a related Q&A thread: https://stackoverflow.com/questions/44533148/converting-a-vision-vntextobservation-to-a-string — Francis Li, Jun 23 '18 at 20:03

score 4 · Answer 3 · answered Sep 30 '13 at 18:52

'Real time' is just a set of images. You don't even need to think about processing all of them, just enough to broadly represent the motion of the device (or the change in the camera position). There is nothing built into the iOS SDK to do what you want, but you can use a 3rd party OCR library (like Tesseract) to process the images you grab from the camera.
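
As a rough illustration of "not processing all of them", a small throttle like the following could sit in front of the OCR call. This is only a sketch in plain C++: the FrameThrottle class is hypothetical, and the 250 ms interval in the usage note is an arbitrary assumption, not a recommended value.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: decides which camera frames are worth sending to OCR.
class FrameThrottle {
public:
    using Clock = std::chrono::steady_clock;

    explicit FrameThrottle(std::chrono::milliseconds minInterval)
        : minInterval_(minInterval) {
        assert(minInterval.count() >= 0);
    }

    // Returns true if the frame captured at `now` should be processed,
    // i.e. at least `minInterval` has passed since the last processed frame.
    bool shouldProcess(Clock::time_point now) {
        if (hasLast_ && now - last_ < minInterval_) return false;
        last_ = now;
        hasLast_ = true;
        return true;
    }

private:
    std::chrono::milliseconds minInterval_;
    Clock::time_point last_{};
    bool hasLast_ = false;
};
```

You would construct it once, for example FrameThrottle throttle(std::chrono::milliseconds(250)), and call throttle.shouldProcess(FrameThrottle::Clock::now()) in the per-frame callback, dropping the frame when it returns false.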

Wain

So you'd take a screenshot every second and process it? – LinusGeffarth Sep 10 '17 at 17:12
Not a screenshot, as we're talking about the view from the camera. Anyway, there is suitable API provided for interacting with the camera like this (see other answers). The processing frequency depends on what you're trying to achieve, user testing will tell you the best rate. @LinusGeffarth – Wain Sep 11 '17 at 11:08
Maybe this would help - https://medium.com/flawless-app-stories/vision-in-ios-text-detection-and-tesseract-recognition-26bbcd735d8f – Ashutosh Shukla Oct 03 '19 at 13:44

score 4 · nbvikingsidiot001 · Answer 4 · 2014-07-30T23:50:01.267

I would look into Tesseract. It's an open source OCR library that takes image data and processes it. You can add different regular expressions and only look for specific characters as well. It isn't perfect, but from my experience it works pretty well. Also it can be installed as a CocoaPod if you're into that sort of thing.
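
To illustrate the idea of restricting the result to specific characters and patterns, here is a sketch of a post-filter applied to whatever string the OCR step returns. It does not use Tesseract's own API; the function names, the A-Z/0-9 alphabet, and the 16-character code length are all assumptions made for the example.

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// Keeps only letters and digits from the raw OCR text and upper-cases the
// letters, dropping whitespace, dashes, and other noise.
std::string keepCodeCharacters(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (std::isalnum(c)) out.push_back(static_cast<char>(std::toupper(c)));
    }
    return out;
}

// Accepts the cleaned text only if it has the assumed code shape:
// exactly 16 characters from A-Z and 0-9.
bool looksLikeCode(const std::string& cleaned) {
    static const std::regex pattern("[A-Z0-9]{16}");
    return std::regex_match(cleaned, pattern);
}
```

In a live feed you would run this on every OCR result and only accept a frame once looksLikeCode(keepCodeCharacters(text)) is true, which filters out most partial or garbled reads.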

If you wanted to capture that in real time you might be able to use GPUImage to catch images in the live feed and do processing on the incoming images to speed up Tesseract by using different filters or reducing the size or quality of the incoming images.
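
As an example of "reducing the size" of the incoming images, here is a sketch of the simplest possible downscale, halving a grayscale image by averaging 2x2 blocks. It is plain C++ rather than GPUImage, the function name is hypothetical, and whether half resolution is still readable for OCR depends on your camera and text size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Halves a row-major grayscale image in each dimension by averaging 2x2
// blocks (rounded to nearest). A trailing odd row or column is dropped.
std::vector<std::uint8_t> downscaleByTwo(const std::vector<std::uint8_t>& gray,
                                         int width, int height) {
    assert(width >= 2 && height >= 2);
    assert(gray.size() == static_cast<std::size_t>(width) * height);

    const int outW = width / 2;
    const int outH = height / 2;
    std::vector<std::uint8_t> out(static_cast<std::size_t>(outW) * outH);
    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            // Index of the top-left pixel of the 2x2 source block.
            const std::size_t i = static_cast<std::size_t>(2 * y) * width + 2 * x;
            const unsigned sum = gray[i] + gray[i + 1] +
                                 gray[i + width] + gray[i + width + 1];
            out[static_cast<std::size_t>(y) * outW + x] =
                static_cast<std::uint8_t>((sum + 2) / 4);
        }
    }
    return out;
}
```

The output has a quarter of the pixels, so everything downstream has less data to process; the trade-off is that small characters may become too blurry to recognise.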

edited Jul 30 '14 at 23:50

answered Jul 30 '14 at 21:45

Sounds interesting, I’ll take a look – boliva Jul 31 '14 at 22:08

score 4 · Answer 5 · answered Feb 25 '15 at 03:22

There's a similar project on GitHub: https://github.com/Devxhkl/RealtimeOCR

hungrxyz
