I have a problem where I have to read the time of recording from the video recorded by a surveillance camera.

The time shows up in the top-left area of the video. Below is a link to a screen grab of the area that shows the time. Also, the digit color (white/black) keeps changing over the course of the video.

alt text http://i55.tinypic.com/2j5gca8.png

Please point me toward an approach to this problem. I am a Java programmer, so I would prefer a Java-based approach.

EDIT: Thanks unhillbilly for the comment. I had looked at the Ron Cemer OCR library, and its performance is well below our requirements.

Since the OCR performance is lower than desired, I was planning to build a character set from screen grabs of all the digits, and then use an image/pixel comparison library to compare the time shown in each frame against that character set and produce a probabilistic match.

So I am looking for a good image comparison library (I would be OK with a non-Java library that I can run from the command line). Any advice on the above approach would also be really helpful.

stressed_geek
  • I can't help you with the screen grab, but take a look at this question regarding Java OCR: http://stackoverflow.com/questions/1813881/java-ocr-implementation – David J. Liszewski Dec 21 '10 at 20:19

4 Answers

It doesn't seem like you need full-blown OCR here.
I presume that the numbers are always in the same position in the image. You only expect digits 0-9 at each of the known positions (in either black or white).
Simple template matching at each position against each of the digits (you'll have 20 templates: the 10 digits in each of the two colors) is very fast (real-time) and should give you very accurate results.
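
As a rough illustration of the template matching described above, here is a minimal sketch using only `java.awt.image.BufferedImage`. The class and method names (`DigitMatcher`, `Template`, `difference`, `bestMatch`) are hypothetical, and the scoring rule (sum of absolute gray-level differences, lowest score wins) is one common choice rather than anything specified in this answer. It assumes the digit positions are known in advance and that each template has the same size as a digit cell.

```java
import java.awt.image.BufferedImage;
import java.util.List;

class DigitMatcher {

    /** One reference image of a digit; 20 of these cover 0-9 in white and in black. */
    static final class Template {
        final char digit;
        final BufferedImage image;

        Template(char digit, BufferedImage image) {
            this.digit = digit;
            this.image = image;
        }
    }

    /** Converts a packed RGB pixel to a 0-255 gray level (plain channel average). */
    static int gray(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r + g + b) / 3;
    }

    /** Sum of absolute gray differences between the template and the frame region at (x0, y0). */
    static long difference(BufferedImage frame, BufferedImage template, int x0, int y0) {
        long sum = 0;
        for (int y = 0; y < template.getHeight(); y++) {
            for (int x = 0; x < template.getWidth(); x++) {
                int a = gray(frame.getRGB(x0 + x, y0 + y));
                int b = gray(template.getRGB(x, y));
                sum += Math.abs(a - b);
            }
        }
        return sum;
    }

    /** Returns the digit of the template with the lowest difference, or '?' if the list is empty. */
    static char bestMatch(BufferedImage frame, int x0, int y0, List<Template> templates) {
        char best = '?';
        long bestScore = Long.MAX_VALUE;
        for (Template t : templates) {
            long score = difference(frame, t.image, x0, y0);
            if (score < bestScore) {
                bestScore = score;
                best = t.digit;
            }
        }
        return best;
    }
}
```

Calling `bestMatch` once per known digit position and concatenating the results would yield the timestamp string. Comparing the best score against the second best is one way to get a rough confidence value.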

Adi Shavit
  • That is exactly the approach I have been trying out, but I haven't been able to find a suitable Java library for this simple template matching. Can you please suggest a Java library appropriate for this task? – stressed_geek Jan 09 '11 at 19:50
  • You might want to check this link: http://stackoverflow.com/questions/2407113/open-source-image-processing-lib-in-java – Adi Shavit Jan 10 '11 at 08:03
  • If you have the image you should be able to pretty easily write yourself a template matching solution by subtracting each template image from the source image. You could use your template as a mask to only consider differences that are valid in the template which would effectively ignore the background. – jodag Sep 20 '13 at 20:14

What format is the source in (VHS, DVD, stills)? It's possible that the timestamp is encoded in the data.

Update with more detail

While I completely understand the desire to have an automated end-to-end process (especially if you're selling this app as opposed to creating an in-house tool), it'd be more efficient to have someone manually enter the start time for each video (even if there are hundreds of them) than to spend weeks of coding getting this to work automatically.

What I'd do (failing a simple, very-fast-to-implement, super-accurate OCR solution which I don't believe exists):

Create a couple of database tables, like

video           video_group
-------         -----------
id              id
filename        title
start_time      date_created
group_id        date_modified
date_created    date_deleted
date_modified
date_deleted

video_group might contain

id| title
-----------
1 | Unassigned
2 | 711 Mockingbird @ 75
3 | Kroger storage room

video would be prepopulated with the video filenames by an import script. Initially, assign everything a group_id of 1 (Unassigned).

Create a simple Winforms or WPF app (pardon my ASCII art):

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
|  Group: [=========]\/ [New group...]                            |
|                                                                 |
|  File:  [=========]\/                                           |
|                                                                 |
|  Preview                                                        |
|  |--------------------------------------| [Next Video]          |
|  | (first frame of selected video here) | [Prev]                |
|  |                                      |                       |
|  |                                      |                       |
|  |                                      |                       |
|  |--------------------------------------|                       |
|  Start Time                                                     |
|  [(enter start time value here as displayed on preview frame)]  |
|                                                                 |
|  [Update]                                                       |
-------------------------------------------------------------------

Anybody could be the user here (secretary, janitor, even a recent CS graduate). All they have to do is read the time from the preview frame, type it into the Start Time field, and click "Update" or "Next Video" to update the database and move on to the next one. Keep the Group selection from one video to the next unless the user changes it.

Assuming it takes the user 30 seconds to read, type, and click next, they could complete 100-150 videos in an hour (call it 75 for a more realistic estimate). And interns are a lot cheaper than developer time.

If you really have "hundreds" of videos, it'll still be faster to do it this way than to screw around with OCR. Even if the OCR works for the most part, you'll most likely need to have someone manually inspect everything to see if the results are correct, which raises the question: why bother with the OCR?

3Dave
  • The videos are in digital format in an MP4 container. There is a timestamp encoded, but it always differs by a variable amount from the time shown on the video. So we are not able to use that and instead have to read it from the video itself. – stressed_geek Dec 23 '10 at 05:28
  • In that case, you only need to read the time from the first frame, then add (number of frames × 1/fps) to get the time at any point in the video. It might be more efficient to just read the first frame's timestamp manually and calculate the rest. – 3Dave Dec 23 '10 at 05:39
  • That is what we are doing at the moment, but since we have hundreds of videos, we want to automate the process. – stressed_geek Dec 23 '10 at 05:44
  • Moreover, we are merging videos from 4 different cameras all of which start at slightly different times to form a single video, so it is important for us to be able to read the time of each video before we run the video merging script. – stressed_geek Dec 23 '10 at 05:49
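
The calculation suggested in the comments above (first-frame time plus frame count divided by fps) could be sketched as follows. The names `FrameClock` and `timeAtFrame` are hypothetical, and the sketch assumes a constant frame rate and that the date is known alongside the on-screen time, neither of which is confirmed in the thread.

```java
import java.time.LocalDateTime;

class FrameClock {

    /**
     * Time shown on the frame with the given 0-based index, assuming a
     * constant frame rate and a known timestamp for frame 0.
     */
    static LocalDateTime timeAtFrame(LocalDateTime firstFrameTime, long frameIndex, double fps) {
        // frameIndex * (1 / fps) seconds, expressed in nanoseconds to keep sub-second precision.
        long nanos = Math.round(frameIndex * 1_000_000_000.0 / fps);
        return firstFrameTime.plusNanos(nanos);
    }
}
```

For the four-camera merge mentioned above, the offset between two recordings would then simply be `java.time.Duration.between` applied to their first-frame times.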
Java OCR will work perfectly for your situation (Ron Cemer here). All you need to do is remove the background image, or make it always be less than 50% white, so that the white characters will be white and the background will be black when the image is converted to monochrome.

Train JavaOCR on the font, extract that rectangular region from the image, remove the background and you're off and running.

I suggest an algorithm which looks at the r, g, b values of each pixel and sets the pixel to black wherever r, g and b are not exactly equal. That will leave only pixels which are perfect shades of gray. Since the image is color and the digits are monochrome, that will leave the digits and some dust.

JavaOCR wants to see black characters on a white background, so once you've done the above, you'll also need to invert the monochrome image (white = black and vice-versa). Then run that through the JavaOCR library, passing it reference samples of all of the characters you expect it to recognize, and your problem should be (at least mostly) solved.
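
The two preprocessing steps described above (blank out non-gray pixels, then invert) can be sketched in plain Java as below. This does not call JavaOCR itself; `OverlayIsolator`, `isolate`, and the `threshold` parameter are hypothetical names, with 128 standing in for the "50% white" cutoff mentioned earlier. The sketch covers white digits only, and on compressed video the exact r == g == b test may need a small tolerance, which is an assumption beyond what this answer states.

```java
import java.awt.image.BufferedImage;

class OverlayIsolator {

    static final int BLACK = 0x000000;
    static final int WHITE = 0xFFFFFF;

    /**
     * Returns a black-on-white image in which only bright, perfectly gray
     * pixels of the source (the presumed white digits) are drawn in black.
     */
    static BufferedImage isolate(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Step 1: a pixel survives only if it is a pure shade of gray...
                boolean neutral = (r == g) && (g == b);
                // ...and bright enough to count as white after the monochrome conversion.
                boolean bright = r >= threshold;
                // Step 2: invert, so surviving digit pixels become black on a white background.
                out.setRGB(x, y, (neutral && bright) ? BLACK : WHITE);
            }
        }
        return out;
    }
}
```

The region passed in would be the cropped timestamp rectangle, and the result is what would then be handed to the OCR step.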

Ron Cemer
Try Tesseract from Google; there are a couple of JNI wrappers available. Be sure to read the FAQ on how to restrict recognition to digits only.

Steve-o