I am looking for a way to reliably extract the left side of the image here (the non-ASCII characters, i.e. the offsets and hex bytes) using OCR. I have a number of images in a similar format, also showing hex data, that I would like to extract.

Would anyone be able to recommend a way to extract the text from these images?

[image]

Ideally, the output for the first line would be:

0005C850 00 00 00 00 etc.

and the output for the fourth line would be something like:

0005C8AA ED 93 A7 8E etc.

dipl0
  • Could you give more details about your problem? What is the context? – cnexans Jun 28 '20 at 01:01
  • I like this question, but am voting to close because it seems to be asking for a tool recommendation, which is off topic. – bishop Jun 28 '20 at 01:01
  • I wouldn't say that's explicitly off topic. I am looking for a programmatic way or tool that is able to extract the text from these images. – dipl0 Jun 28 '20 at 01:03
  • Use https://github.com/tesseract-ocr/tesseract to OCR the full image, then apply a condition to return only the tokens you need. – Isaac Be Jul 05 '20 at 18:37

1 Answer

A legendary program has been written that does this: https://github.com/eighttails/ProgramListOCR. It is for Windows systems; I ran it in a VM.

It was hard to find.

First, resize your image to 400% of its original size (this overwrites source.png in place):

convert source.png -resize 400% source.png

Make it grayscale:

convert source.png -colorspace Gray destination.png

Convert it to a TIFF file:

convert destination.png destination.tiff

You then process the resulting TIFF file with ProgramListOCR.
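If the recognised text still contains the ASCII column or stray lines, a small filter can cut it down to the format asked for in the question. This is only a rough sketch, not part of ProgramListOCR: `extract_hex_rows` is a hypothetical helper name, and it assumes the OCR result is plain text with one row per line, an 8-digit offset, and 16 bytes per row. It also relies on `grep -o`, which GNU and BSD grep support.

```shell
#!/bin/sh
# Hypothetical helper (not part of ProgramListOCR).
# Assumption: each row is an 8-digit hex offset followed by 16 two-digit bytes.
extract_hex_rows() {
    # 1. Upper-case any hex digits the OCR returned in lower case.
    # 2. Keep only the offset plus 16 bytes from the start of each line,
    #    dropping the ASCII column and any line that does not fit the pattern.
    # 3. Squeeze runs of spaces (e.g. the gap in the middle of a row) to one.
    tr 'a-f' 'A-F' \
        | grep -oE '^[0-9A-F]{8}( +[0-9A-F]{2}){16}' \
        | tr -s ' '
}
```

With the OCR text saved in a file (say `ocr_output.txt`, a placeholder name), it would be run as `extract_hex_rows < ocr_output.txt`. Rows where the OCR misread a character are dropped rather than corrected, so it is worth comparing the row count against the image.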

dipl0