Tesseract (.NET) + searchable PDF: how to apply?

Question

I need to develop a system that turns an image into a searchable PDF. Since this is school work, I need something open source. After much research I found tessnet2 (Tesseract), and I can extract the text from an image in TIFF format. But how do I convert this information into a PDF? Note: I need to keep the file structure.

I need some direction to proceed with my research. Can someone please help me?

Thank you.

I guess to be able to do this you would need an OCR library that would do the job for you. It is a little too complicated to discuss on a Q&A site. – Shakti Prakash Singh, Nov 29 '13 at 13:22
I suggest using [this contour analysis library](http://www.codeproject.com/Articles/196168/Contour-Analysis-for-Image-Recognition-in-C), as I do myself for this type of work. The code can be trained to recognize new contours from scans as well as from fonts. I use it for license plate detection. – online Thomas, Nov 29 '13 at 13:58
@user2754599 As I understand it, that would help me detect the text, great! But how do I convert the result to a searchable PDF? – msantiago, Nov 29 '13 at 14:43

score 2 · Accepted Answer · edited Sep 20 '19 at 20:58


There are a couple of .NET hOCR-to-PDF libraries that you may want to check out on the Tesseract 3rdParty page.

answered Nov 29 '13 at 17:50 by nguyenq
edited Sep 20 '19 at 20:58 by Adam Plocher

This is already very useful. Do you have any examples of how to apply it on Windows? – msantiago Nov 29 '13 at 18:26
The [hOcr2Pdf.NET](http://hocrtopdf.codeplex.com/documentation) site has some code examples. You can use the [Tesseract 3.x .NET wrapper](https://github.com/charlesw/tesseract) to output hOCR strings to be used as input to the library. – nguyenq Nov 30 '13 at 00:22
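
For anyone wondering what those hOCR strings contain and why they are enough to build a searchable PDF, here is a rough, language-agnostic sketch in Python (standard library only). It is not the hOcr2Pdf.NET API. The sample markup is hand-written in the style of Tesseract's hOCR output, the names `HocrWordParser`, `parse_hocr_words`, and `to_pdf_points` are made up for illustration, and the 300 DPI scan resolution is an assumption. The idea is that each recognized word carries a pixel bounding box, and a converter places invisible text at the matching position over the page image.

```python
import re
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs from hOCR elements with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []      # finished (text, (x0, y0, x1, y1)) tuples
        self._bbox = None    # pixel bbox of the word currently being read
        self._text = []      # text fragments of the current word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            # hOCR stores the box in the title attribute: "bbox x0 y0 x1 y1; ..."
            match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr_words(hocr):
    """Return a list of (text, pixel_bbox) for every word in an hOCR string."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


def to_pdf_points(bbox, page_height_px, dpi):
    """Convert a top-left-origin pixel bbox to bottom-left-origin PDF points."""
    x0, y0, x1, y1 = bbox
    return (x0 * 72.0 / dpi, (page_height_px - y1) * 72.0 / dpi,
            x1 * 72.0 / dpi, (page_height_px - y0) * 72.0 / dpi)


# Hand-written sample in the style of Tesseract hOCR output (assumed 300 DPI scan).
SAMPLE_HOCR = """
<div class='ocr_page' title='image "scan.tif"; bbox 0 0 2550 3300'>
 <span class='ocr_line' title='bbox 300 300 900 360'>
  <span class='ocrx_word' title='bbox 300 300 540 360; x_wconf 96'>Hello</span>
  <span class='ocrx_word' title='bbox 600 300 900 360; x_wconf 93'>world</span>
 </span>
</div>
"""

if __name__ == "__main__":
    for text, bbox in parse_hocr_words(SAMPLE_HOCR):
        print(text, to_pdf_points(bbox, page_height_px=3300, dpi=300))
```

A real converter would then draw the scanned image as the page background and write each word as invisible text inside its box, which is what makes the PDF searchable while keeping the original appearance.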
