I work with the Anaconda distribution of Python. In my tests in an isolated conda
environment, OCR with pytesseract succeeds when run through a Python script on a test image.
Prerequisites to test:
- install Anaconda and create an env called py3.7.4:
conda create --name py3.7.4
- activate the env with
conda activate py3.7.4
- install pytesseract with
conda install -c conda-forge pytesseract
- create a folder called Test and place a jpg file called ocr.jpg in it, with the following sample image:

- in the same Test folder, also place a Python script called ocr_test.py with the following code:
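import argparse

import pytesseract
from PIL import Image

# Read the image path from the command line
parser = argparse.ArgumentParser(description='perform OCR on image')
# required=True makes argparse report a usage error if --path is omitted
parser.add_argument("--path", "-p", required=True, help="path for image")
args = parser.parse_args()

# Run OCR on the image and print the recognised text
print(pytesseract.image_to_string(Image.open(args.path)))
print("done")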
The above snippet accepts the image path as a command line argument. The --path
flag must be specified in order to pass the image path as an arg.
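To see how the flag is parsed without running OCR, the parser can be exercised on its own. This is only a minimal sketch: build_parser is a hypothetical helper name, and it sets required=True, a choice made for this sketch, so that omitting --path fails with a usage message.

```python
import argparse


def build_parser():
    # Hypothetical helper: mirrors the parser in ocr_test.py.
    # required=True is an assumption of this sketch, so a missing --path fails early.
    parser = argparse.ArgumentParser(description="perform OCR on image")
    parser.add_argument("--path", "-p", required=True, help="path for image")
    return parser


# An explicit list stands in for the real command line.
args = build_parser().parse_args(["--path", "ocr.jpg"])
print(args.path)

# The short form -p is stored under the same attribute name.
short_args = build_parser().parse_args(["-p", "ocr.jpg"])
print(short_args.path)
```

Both forms store the value in args.path, which is what the script hands to Image.open().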
Now, in the C# code snippet below, we will:
- launch the cmd shell
- navigate to the Test folder (the workingDirectory variable) by setting the WorkingDirectory property of the ProcessStartInfo used by Process.Start()
- activate Anaconda with the activate.bat file (replace the file path as per its location on your computer)
- activate the above conda environment
- call the Python script, passing imageFileName as an argument.
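The order of the shell commands in the steps above can be sketched in Python to make it explicit. This is an illustration only, not part of the original setup: build_commands and run_in_cmd are hypothetical helpers, the activate.bat path is a placeholder, and run_in_cmd assumes a Windows machine where cmd.exe is available.

```python
import subprocess


def build_commands(activate_bat, env_name, image_file_name):
    # Hypothetical helper: the three lines written to the cmd shell, in order.
    return [
        activate_bat,                                     # activate Anaconda (base)
        f"conda activate {env_name}",                     # switch to the OCR environment
        f"python ocr_test.py --path {image_file_name}",   # run the script on the image
    ]


def run_in_cmd(commands, working_directory):
    # Assumption: Windows only. Feeds the commands to cmd.exe through its stdin
    # and waits for the shell to finish.
    proc = subprocess.Popen(
        ["cmd.exe"],
        stdin=subprocess.PIPE,
        cwd=working_directory,
        text=True,
    )
    proc.communicate("\n".join(commands) + "\n")
    return proc.returncode


commands = build_commands(
    r"C:\Users\xxxxxxx\Anaconda3\Scripts\activate.bat",  # placeholder path
    "py3.7.4",
    "ocr.JPG",
)
for line in commands:
    print(line)
```

On Windows, run_in_cmd(commands, r"C:\Test") would send these lines to the shell; it is left uncalled here so the sketch also runs on other systems.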
C# snippet:
using System.Diagnostics;
using System.Threading;

namespace PyTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string workingDirectory = @"C:\Test";
            string imageFileName = "ocr.JPG";

            var process = new Process
            {
                StartInfo = new ProcessStartInfo
                {
                    FileName = "cmd.exe",
                    RedirectStandardInput = true,
                    UseShellExecute = false,
                    RedirectStandardOutput = false,
                    WorkingDirectory = workingDirectory
                }
            };
            process.Start();

            using (var sw = process.StandardInput)
            {
                if (sw.BaseStream.CanWrite)
                {
                    // Vital to activate Anaconda
                    sw.WriteLine(@"C:\Users\xxxxxxx\Anaconda3\Scripts\activate.bat");
                    Thread.Sleep(500);
                    // Activate your environment
                    sw.WriteLine("conda activate py3.7.4");
                    Thread.Sleep(500);
                    sw.WriteLine($"python ocr_test.py --path {imageFileName}");
                    Thread.Sleep(50000);
                }
            }
        }
    }
}
If you have followed the above steps, you should receive the following output on executing the C# snippet in Visual Studio:
Output:
Microsoft Windows [Version 10.0.18362.535]
(c) 2019 Microsoft Corporation. All rights reserved.
C:\xxxxxxx\Projects\Scripts>C:\Users\xxxxx\Anaconda3\Scripts\activate.bat
(base) C:\xxxxxx\Projects\Scripts>conda activate py3.7.4
(py3.7.4) C:\xxxxxxx\Projects\Scripts>python ocr_test.py --path ocr.JPG
Introduction
This is a test to see accuracy of Tesseract OCR
Test 1
Test 2
done
Note: I was unable to test with a standalone Python distribution, but I believe it should work with that as well. The key is to pass the image file path as an argument to the Python script, so that the path supplied from C# is the one Python opens. Also, Image.open() does the following (from the docs):
Opens and identifies the given image file. This is a lazy operation;
this function identifies the file, but the file remains open and the
actual image data is not read from the file until you try to process
the data
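Because the image path arrives as a command-line argument, a relative name such as ocr.JPG is resolved against the working directory of the process. A minimal sketch of checking this before calling Image.open() could look like the following; resolve_image_path is a hypothetical helper, not part of pytesseract or Pillow, and the temporary folder only stands in for the Test folder.

```python
import os
import tempfile
from pathlib import Path


def resolve_image_path(path_arg):
    # Hypothetical helper: make the --path argument absolute and fail early if it is missing.
    path = Path(path_arg).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"image not found: {path}")
    return path


# A throwaway directory stands in for the Test folder.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "ocr.jpg").write_bytes(b"")  # empty placeholder, not a real image
    previous = os.getcwd()
    os.chdir(folder)  # plays the role of WorkingDirectory in the C# snippet
    try:
        resolved = resolve_image_path("ocr.jpg")
        print(resolved.name)
    finally:
        os.chdir(previous)
```

A check like this gives a clear error when the working directory and the file name do not line up, instead of a failure later inside the OCR call.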