I want to run Tesseract OCR over an image file to extract its text.
The problem seems to be that Tesseract not only requires TIFF input, but also requires the TIFF file to be in a particular format (judging by the error below, a supported bit depth).
With an ordinary TIFF file, I get:
root@toshiba:~/Desktop# tesseract crap.tif crap.txt
Tesseract Open Source OCR Engine
check_legal_image_size:Error:Only 1,2,4,5,6,8 bpp are supported:32
Segmentation fault
So far I have only found a manual workaround.
In GIMP, I go to Image > Mode > Indexed, select "Generate optimum palette", and set "Maximum number of colors" to 256.
Then there is one more step before "Save As":
Layer > Transparency > Remove Alpha Channel.
This removes transparency, which seems to be necessary because the extra alpha channel raises the bit depth beyond what Tesseract accepts (TIFF itself can store an alpha channel).
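For reference, here is a rough C# sketch of what those GIMP steps might look like in code. It is not a verified solution: it assumes the input is a System.Drawing `Bitmap`, and it swaps GIMP's 256-color optimum palette for an 8 bpp grayscale palette produced by AForge.NET's `Grayscale` filter, which is usually enough for OCR. The class and method names (`TiffForOcr`, `SaveAs8BppTiff`) are made up for illustration.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using AForge.Imaging.Filters;

static class TiffForOcr
{
    // Hypothetical helper: flattens transparency, reduces the image to 8 bpp
    // and writes it as a TIFF file.
    public static void SaveAs8BppTiff(Bitmap source, string path)
    {
        // Equivalent of "Remove Alpha Channel": paint the image onto an
        // opaque 24 bpp canvas with a white background.
        using (var flat = new Bitmap(source.Width, source.Height,
                                     PixelFormat.Format24bppRgb))
        {
            using (Graphics g = Graphics.FromImage(flat))
            {
                g.Clear(Color.White);
                g.DrawImage(source, 0, 0, source.Width, source.Height);
            }

            // Stand-in for "Indexed, 256 colors": the AForge.NET grayscale
            // filter returns an 8 bpp indexed bitmap with a gray palette.
            using (Bitmap gray = Grayscale.CommonAlgorithms.BT709.Apply(flat))
            {
                // Save with the built-in GDI+ TIFF encoder.
                gray.Save(path, ImageFormat.Tiff);
            }
        }
    }
}
```

A call such as `TiffForOcr.SaveAs8BppTiff(processedBitmap, "input.tif")` would go at the end of the AForge.NET filter chain, where `processedBitmap` is whatever bitmap the filters produced. Whether Tesseract accepts the resulting file still has to be checked against the installed version.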
Now the problem is that my input image comes from a C# program, where it is preprocessed with AForge.NET image analysis filters, so the manual GIMP step does not fit.
I have also found a .NET port of LibTiff, and an example of how to write an image with a color palette here:
http://bitmiracle.com/libtiff/help/create-tiff-with-palette-(color-map).aspx
But I don't know how to get the pixel data from the source TIFF (the one with the wrong palette) into the target TIFF (with the correct palette format).