10

I have a requirement to split a large PDF document into smaller files based on its content. We use BCL easyPDF to manipulate PDF files. easyPDF can split a PDF document at a given page number, but it cannot split one based on content. It also does not appear to have a search function for locating content within a document (if I am wrong, please let me know).

How can I find the location of text in a PDF file using .NET?

Thanks

desi
  • Yes, but this is (or should be) a community where we can help people who may still be learning the ins and outs of a language or protocol. We can try to point them in the right direction. – Brian May 03 '12 at 18:24
  • Isn't PDF a sort of binary file? You cannot just parse it as text; a library is required. – Alex Jan 18 '17 at 16:51
  • I start out my year with my usual complaint. Why is this off topic? (I know the rules say it is.) It is very useful, and many of the preserved 'best' questions (which I see you can no longer find) are of this nature. They represent the accumulated advice (aka wisdom) of many experienced devs. – pm100 Jan 04 '19 at 00:36

3 Answers

3

You might try Docotic.Pdf library for your task.

The library can extract text from PDFs (with or without formatting).

Alternatively, you can retrieve a collection of words with their bounding rectangles from a PDF. This should help you find the location of the text in the file.

Disclaimer: I work for the vendor of the library.
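
As a rough illustration of how a list of words with bounding rectangles can be turned into a text location, here is a minimal, library-independent sketch. It is written in Python only to keep it short; the `Word` record and `find_phrase` function are hypothetical names, not part of Docotic.Pdf or any other library. It assumes the words arrive in reading order and that a phrase never spans two pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One extracted word plus its bounding box (hypothetical shape, not a library type)."""
    text: str
    page: int      # 1-based page number
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def find_phrase(words, phrase):
    """Return (page, x, y, width, height) for each occurrence of `phrase`.

    Matching is case-insensitive and word-based; `words` is assumed to be
    in reading order.
    """
    target = phrase.lower().split()
    hits = []
    if not target:
        return hits
    n = len(target)
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [w.text.lower() for w in window] != target:
            continue
        # Skip matches that would straddle a page boundary.
        if any(w.page != window[0].page for w in window):
            continue
        # Union of the individual word rectangles.
        left = min(w.x for w in window)
        top = min(w.y for w in window)
        right = max(w.x + w.width for w in window)
        bottom = max(w.y + w.height for w in window)
        hits.append((window[0].page, left, top, right - left, bottom - top))
    return hits
```

The page number in each hit is what a page-based splitter needs; the rectangle is only required if the position on the page matters.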

Bobrovsky
2

You need a PDF library in .NET such as iText.Net.
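
Once a library has given you the text of each page, deciding where to split is independent of the PDF library. The sketch below assumes you already have one string per page and want a new file to start on every page that contains some marker text. `split_ranges` and its parameters are hypothetical names, and Python is used only for brevity. The resulting page ranges could then be passed to a page-based splitter such as easyPDF.

```python
def split_ranges(page_texts, marker):
    """Compute inclusive, 1-based (first_page, last_page) ranges.

    `page_texts` holds the extracted text of each page, page 1 first.
    A new range starts on every page whose text contains `marker`
    (case-insensitive). Pages before the first marker form their own range.
    """
    if not page_texts:
        return []
    needle = marker.lower()
    starts = [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in text.lower()
    ]
    # Make sure the first range always begins at page 1.
    if not starts or starts[0] != 1:
        starts.insert(0, 1)
    ranges = []
    for index, first in enumerate(starts):
        if index + 1 < len(starts):
            last = starts[index + 1] - 1
        else:
            last = len(page_texts)
        ranges.append((first, last))
    return ranges
```

For example, a five-page document whose marker appears on pages 1 and 3 yields the ranges (1, 2) and (3, 5).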

ToolmakerSteve
Pablo Santa Cruz
1

Take a look at this question. It links to some libraries that may satisfy your requirements:

How to programatically search a PDF document in c#

Community
Brian