You can find the Tesseract JS Wrapper that I am referring to here.
What we want to accomplish:
- Upload a photo of a printed document
- Turn that photo into text
Things done to setup so far:
npm install tesseract.js
Here is our code:
HTML
<input id="myFileInput" type="file" accept="image/*;" capture="camera">
<img id="pic" src="rec.jpg">
JS
<script src="http://tenso.rs/tesseract.js"></script>
<script type="text/javascript">
var img = document.getElementById("pic");
Tesseract
.recognize( img, {
progress: show_progress} )
.then( display )
</script>
What's happening in the Console:
"Uncaught ReferenceError: show_progress is not defined"
"hallo",
"pre-main prep time: 67 ms",
As you can see, we've abandoned the photo upload feature for the moment, until we can figure out how to get tesseract.js to work for a single, pre-provided jpg. Eventually, we hope to add this functionality.
Any help would be greatly appreciated, we're doing this for fun and are mainly seeking a simple (but effective) means of doing OCR with JavaScript. If you have another suggestion, please let us know!