I can read text from a JPG image with Tesseract, and I can extract text from a regular (text-based) PDF with iTextSharp. But I can't find a way to either get the text out of a scanned PDF (image-only pages in a file with a .pdf extension) or convert a PDF to an image so that I can then run Tesseract on it. Are there any options that I'm missing? Thanks!
1 Answer
Assuming that you have already scanned the PDF document, and that the document contains only text, you can generate an image from that text with the following method:
    // Requires: using System; using System.Drawing;
    private Image DrawText(String text, Font font, Color textColor, Color backColor)
    {
        // First, create a dummy bitmap just to get a Graphics object for measuring.
        SizeF textSize;
        using (Image dummy = new Bitmap(1, 1))
        using (Graphics measure = Graphics.FromImage(dummy))
        {
            // Measure the string to see how big the image needs to be.
            textSize = measure.MeasureString(text, font);
        }

        // Create a new image of the right size. Round up so the text is not clipped,
        // and keep each dimension at least 1 pixel so an empty string does not throw.
        Image img = new Bitmap(
            Math.Max(1, (int)Math.Ceiling(textSize.Width)),
            Math.Max(1, (int)Math.Ceiling(textSize.Height)));

        using (Graphics drawing = Graphics.FromImage(img))
        using (Brush textBrush = new SolidBrush(textColor))
        {
            // Paint the background, then draw the text on top of it.
            drawing.Clear(backColor);
            drawing.DrawString(text, font, textBrush, 0, 0);
        }
        return img;
    }
Reference: How to generate an image from text on fly at runtime
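
A minimal usage sketch, not part of the referenced answer: it calls a static copy of the method above and saves the result as a PNG file, which can then be passed to Tesseract like any other image. The class name `DrawTextDemo`, the sample text, the font, and the output file name `page-text.png` are all placeholders chosen for illustration. It assumes `System.Drawing` is available (the .NET Framework, or the System.Drawing.Common package on Windows).

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

static class DrawTextDemo
{
    // Same technique as the method above, made static so Main can call it.
    static Image DrawText(string text, Font font, Color textColor, Color backColor)
    {
        SizeF size;
        using (var dummy = new Bitmap(1, 1))
        using (var g = Graphics.FromImage(dummy))
        {
            // Measure the text on a throwaway 1x1 surface.
            size = g.MeasureString(text, font);
        }

        // Round up and clamp to at least 1 pixel per dimension.
        var img = new Bitmap(
            Math.Max(1, (int)Math.Ceiling(size.Width)),
            Math.Max(1, (int)Math.Ceiling(size.Height)));

        using (var g = Graphics.FromImage(img))
        using (var brush = new SolidBrush(textColor))
        {
            g.Clear(backColor);
            g.DrawString(text, font, brush, 0, 0);
        }
        return img;
    }

    static void Main()
    {
        // Placeholder inputs: replace the text, font, and file name with your own.
        using (var font = new Font("Arial", 24))
        using (Image img = DrawText("Text extracted from the PDF", font, Color.Black, Color.White))
        {
            // PNG is lossless, so glyph edges stay sharp for OCR.
            img.Save("page-text.png", ImageFormat.Png);
            Console.WriteLine("Saved {0}x{1} image", img.Width, img.Height);
        }
    }
}
```

Black text on a white background is used here only because high contrast tends to suit OCR; any `Color` pair works with the method.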

amaidhassan niazi