7

Is there any API in C# or .NET to edit PDF documents?

For example, I need to find a particular piece of text and replace it with my own text.

Thanks
nRk

Preet Sangha
nRk

2 Answers

7

This is not possible (in a clean and reliable way). From the iTextSharp tutorial:

You can't 'parse' an existing PDF file using iText, you can only 'read' it page per page. What does this mean? The pdf format is just a canvas where text and graphics are placed without any structure information. As such there aren't any 'iText-objects' in a PDF file. In each page there will probably be a number of 'Strings', but you can't reconstruct a phrase or a paragraph using these strings. [...] You can't edit an existing PDF document, by saying: for instance replace the word Louagie by Lowagie. To achieve this, you would have to know the exact location of the word Louagie, paint a white rectangle over it and paint the word Lowagie on this white rectangle. Please avoid this kind of 'patch' work. Do your PDF editing with an Adobe product.
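
To see why a plain search-and-replace is unreliable, it can help to look at what a page's content stream may contain. The following is a rough sketch, not iText code: the content stream fragment is hand-written for illustration, the class and method names are hypothetical, and real streams are usually compressed and may use font-specific encodings, which this ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ContentStreamPeek
{
    // Hand-written, uncompressed content stream fragment (an assumption for
    // illustration). The TJ operator shows an array of strings with kerning
    // adjustments between them, so one visible word can be split into pieces.
    public const string Sample =
        "BT /F1 12 Tf 72 700 Td [(Lou) -15 (agie)] TJ ET\n" +
        "BT /F1 12 Tf 72 680 Td (wrote iText) Tj ET";

    // Collects every literal string "(...)" in the stream, in order.
    // Simplification: escaped or nested parentheses and hex strings are not handled.
    public static List<string> LiteralStrings(string stream)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(stream, @"\(([^()\\]*)\)"))
            result.Add(m.Groups[1].Value);
        return result;
    }

    static void Main()
    {
        // The word is visible on the page, but never appears as one
        // contiguous string in the stream, so this prints False.
        Console.WriteLine(Sample.Contains("Louagie"));

        // Prints the fragments one per line: Lou, agie, wrote iText
        foreach (var s in LiteralStrings(Sample))
            Console.WriteLine(s);
    }
}
```

In this made-up fragment the word "Louagie" is drawn as two pieces with a kerning adjustment in between, which is one reason the strings on a page cannot simply be matched against words or phrases.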

  • Thanks RC. I am new to editing PDFs, but can I retrieve particular text, replace it with my own text, and save the result as a new PDF document? – nRk Nov 23 '09 at 06:51
  • @nrk, no as stated before. You should read the link I provided if you need more info. –  Nov 23 '09 at 18:42
  • Really annoying answer. There is no proof provided that this cannot be done. If iText cannot do it - doesn't mean it cannot be done. – Edward Olamisan May 07 '13 at 20:00
  • @EdwardOlamisan Not a proof, but have you ever seen those copiers that can create and send you a PDF? They just scan the input (no OCR), stack all the images in a PDF, and that's it. Now imagine you have one image from that PDF: do you think you can, whatever the input is, reliably replace one word in that image (what is a word in a bunch of pixels, by the way)? **If you can, reCAPTCHA might be interested ;)** –  May 08 '13 at 11:09
  • @RC Open a PDF in Adobe Acrobat Pro and you will be able to edit the document: add/modify text, add/modify images etc. – Edward Olamisan May 08 '13 at 13:54
  • @EdwardOlamisan Sure, you can photoshop images to "replace text", but I'm pretty sure Acrobat **cannot** do anything when you want to replace `portion` in [the image here](http://www.google.com/recaptcha/learnmore) –  May 08 '13 at 14:25
  • @RC The PDF stream object contains typed objects. You would edit the text objects, and you would edit an image object the way you edit any image. I am not suggesting they should OCR an image and let you edit it as text (which they could, too), but they could at least use the basic objects defined in the PDF specification. – Edward Olamisan May 08 '13 at 15:54
2

There are a number of third-party libraries (such as Aspose, a paid tool), but there isn't really a native API.

That said, PDF is an open standard, so you can look up how the file is structured and parse it yourself.
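
As a starting point for parsing the format yourself, here is a minimal sketch that only checks the file header and locates the cross-reference offset. It assumes a well-formed file read fully into memory; the class name, method names and the tiny in-memory sample are hypothetical, and a real parser would go on to read the cross-reference data, the trailer and the (usually compressed) object streams.

```csharp
using System;
using System.IO;
using System.Text;

static class PdfStructurePeek
{
    // A PDF file starts with a "%PDF-x.y" header line.
    // Returns the version text, or null if the header is missing.
    public static string ReadVersion(byte[] data)
    {
        string head = Encoding.ASCII.GetString(data, 0, Math.Min(data.Length, 16));
        if (!head.StartsWith("%PDF-", StringComparison.Ordinal)) return null;
        int end = head.IndexOfAny(new[] { '\r', '\n' });
        return end < 0 ? head.Substring(5) : head.Substring(5, end - 5);
    }

    // Near the end of the file, the "startxref" keyword is followed by the
    // byte offset of the cross-reference section. Returns -1 if not found.
    public static long FindStartXref(byte[] data)
    {
        int tailLen = Math.Min(data.Length, 1024);
        string tail = Encoding.ASCII.GetString(data, data.Length - tailLen, tailLen);
        int pos = tail.LastIndexOf("startxref", StringComparison.Ordinal);
        if (pos < 0) return -1;
        string rest = tail.Substring(pos + "startxref".Length).TrimStart();
        int len = 0;
        while (len < rest.Length && char.IsDigit(rest[len])) len++;
        return len == 0 ? -1 : long.Parse(rest.Substring(0, len));
    }

    static void Main(string[] args)
    {
        // With no argument, use a tiny fake skeleton (not a valid PDF)
        // so the sketch can run without a real file.
        byte[] data = args.Length > 0
            ? File.ReadAllBytes(args[0])
            : Encoding.ASCII.GetBytes("%PDF-1.4\n...objects...\nstartxref\n1234\n%%EOF\n");
        Console.WriteLine("version: " + ReadVersion(data));
        Console.WriteLine("xref offset: " + FindStartXref(data));
    }
}
```

Getting from that offset to the actual text on a page still means following object references and decoding the content streams, which is the part a third-party library saves you from.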

Tk1993
Stephen Wrighton