How to replace text in a PDF with C#?

Question

I have seen a lot of solutions here, but none of them are clear or good answers.

Here is my simple question, and I am hoping for a straight answer.

I have a PDF file (a template) which contains text something like this:

{FIRSTNAME} {LASTNAME} {ADDRESS} {PHONENUMBER}

Is it possible to have C# code that replaces these placeholders with text of my choice?

No fields, no other complex stuff.

Is there an open-source library that can help me achieve that?

Do you have some code you can share with us? It might make it easier. — , Aug 22 '11 at 10:02
This other question may be similar: Edit pdf in c# (http://stackoverflow.com/questions/1781208/is-there-any-api-in-c-or-net-to-edit-pdf-documents) — Michaël, Aug 22 '11 at 10:05
You can use http://sourceforge.net/projects/itextsharp/ , there are easy to follow (but a bit outdated) tutorials out there: http://asp-net-whidbey.blogspot.com/2006/04/generating-pdf-files-with-itextsharp.html — Andreas, Aug 22 '11 at 10:09
Hi, this puts me back at level 0: iText needs fields! I do not have fields and I cannot create one! — Data-Base, Aug 22 '11 at 10:16
@Sean: If you are already editing the title to fix the grammar, please also fix the grammar/spelling in the content (like Shadow Wizard did now). — Paŭlo Ebermann, Aug 22 '11 at 11:06

albercik1985 · Answer 1 · 2012-03-06T10:49:17.240

This thread is dead, however I'm posting my solution for other lost souls that might face this problem in the future. Unfortunately my company doesn't allow posting code online so I'll describe the solution :).

So basically what you have to do is use PdfSharp and modify this sample to replace text in the content stream. You must take into account that a piece of text may be split across several parenthesized strings, so a placeholder is not necessarily stored as one contiguous string (convert the stream to a string to see what the format is).

Then, with code similar to this sample, traverse the source PDF page by page and modify the current page by searching for PdfContent items inside PdfReference items and replacing the text in each content's stream.
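To illustrate the "split into many parentheses" point, here is a minimal sketch that works on a content stream already converted to a string. It is not PdfSharp code: the class and method names (`ContentStreamSketch`, `ReplaceInTjArrays`) are made up for this example, and it assumes the text is shown with a `TJ` array of plain literal strings, with no escaped or nested parentheses, no hex strings and no custom font encoding. Real PDFs often violate those assumptions.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ContentStreamSketch
{
    // Matches a TJ array such as: [(Hel) -20 (lo)] TJ
    private static readonly Regex TjArray = new Regex(@"\[(?<items>[^\]]*)\]\s*TJ");

    // Matches one literal string inside the array (no escapes, no nesting).
    private static readonly Regex Literal = new Regex(@"\((?<text>[^()\\]*)\)");

    public static string ReplaceInTjArrays(string content, string placeholder, string replacement)
    {
        // Escape characters that have a special meaning inside a PDF literal string.
        string escaped = replacement.Replace(@"\", @"\\").Replace("(", @"\(").Replace(")", @"\)");

        return TjArray.Replace(content, match =>
        {
            // Join the fragments, ignoring the kerning numbers between them.
            var joined = new StringBuilder();
            foreach (Match part in Literal.Matches(match.Groups["items"].Value))
                joined.Append(part.Groups["text"].Value);

            string text = joined.ToString();
            if (!text.Contains(placeholder))
                return match.Value; // unrelated text stays untouched

            // Collapse into a single string; the original kerning is lost.
            return "(" + text.Replace(placeholder, escaped) + ") Tj";
        });
    }
}
```

For example, calling `ReplaceInTjArrays("BT /F1 12 Tf [({FIRST) -20 (NAME})] TJ ET", "{FIRSTNAME}", "Ada")` yields `BT /F1 12 Tf (Ada) Tj ET`, whereas a plain `string.Replace` on the same input finds nothing because the placeholder is stored in two fragments.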

Frank Rem · Answer 2 · 2018-10-18T15:24:55.000

The 'problem' with PDF documents is that they are inherently not suitable for editing, especially ones without fields. The best thing is to step back, look at your process and see if there is a way to replace the text before the PDF is generated. Obviously, you may not always have this freedom.
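If the source text is still available before it is turned into a PDF, the substitution itself is simple. The following is only a sketch under that assumption: the `TemplateSketch` class and its `Fill` method are hypothetical names, and the placeholder syntax (`{UPPERCASE}`) is taken from the question. Generating the PDF from the filled text is a separate step that is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TemplateSketch
{
    // Matches placeholders such as {FIRSTNAME}.
    private static readonly Regex Placeholder = new Regex(@"\{(?<name>[A-Z]+)\}");

    public static string Fill(string template, IDictionary<string, string> values)
    {
        // Placeholders without a matching key are left in place unchanged.
        return Placeholder.Replace(template, m =>
            values.TryGetValue(m.Groups["name"].Value, out string value) ? value : m.Value);
    }
}
```

Leaving unknown placeholders untouched, rather than replacing them with an empty string, makes missing data easy to spot in the generated document.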

If you are able to replace text, you should be aware that the text following the replaced text will not reflow automatically. Even if you are fine with that, there are very few solutions that allow you to replace text.

I know that you are looking for an open-source solution, so I feel reluctant to offer you a commercial one. We offer one called PDFKit.NET. It allows you to extract all content on a page as so-called shapes (text, images, curves, etc.). See method Page.CreateShapes in the type reference. You can then programmatically navigate and edit this structure of shapes and write it back to a PDF.

Here it is: http://www.tallcomponents.com/pdfkit

Disclosure: I am the founder of TallComponents, vendor of this component

If there is no open-source solution, then I will have to search for a commercial one :-) — Data-Base, Aug 22 '11 at 13:29
Indeed, the vast majority of PDFs use subsetted fonts. That means only the necessary font glyphs are embedded. For example, if your existing PDF doesn't contain the letter 'A', you can't add it. Unless, of course, the original font file still exists on your computer. Otherwise you might have to find a similar enough font. — Tamas Demjen, Aug 27 '11 at 02:35

score 4 · Answer 3 · answered Nov 05 '15 at 16:55

For simple text replacement, use the iTextSharp library. The code that replaces one string with another is below. Note that this replaces only simple text and may not work in all cases.

    // Requires: using System.IO; using iTextSharp.text.pdf;

    // Replaces origText with replaceText in the raw content stream of every page.
    // Only works when the text is stored in the stream as one plain string.
    void VerySimpleReplaceText(string origFile, string resultFile, string origText, string replaceText)
    {
        using (PdfReader reader = new PdfReader(origFile))
        using (FileStream output = new FileStream(resultFile, FileMode.Create, FileAccess.Write))
        {
            // Page numbers are 1-based.
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                byte[] contentBytes = reader.GetPageContent(i);
                string contentString = PdfEncodings.ConvertToString(contentBytes, PdfObject.TEXT_PDFDOCENCODING);
                contentString = contentString.Replace(origText, replaceText);
                reader.SetPageContent(i, PdfEncodings.ConvertToBytes(contentString, PdfObject.TEXT_PDFDOCENCODING));
            }
            // Closing the stamper writes the modified pages to the output stream.
            new PdfStamper(reader, output).Close();
        }
    }

Unfortunately, this solution does not work even for a very simple PDF. I checked `contentString` and it does not contain any text from the PDF at all. Maybe you have an updated version? The PDFs I am working with are very simple, and the searched text is unique. — Peter VARGA, Jan 25 '18 at 08:12
PdfObject.TEXT_PDFDOCENCODING does not seem to exist in recent itextsharp versions. — Kyle, Dec 28 '18 at 22:09

score 3 · Accepted Answer · edited May 23 '17 at 12:30


As stated in a similar thread, there is no really easy way to do this. The easier way seems to be to get a DocX file, use the DocX library, which allows easy word swapping, and then convert your DocX to PDF (using the PDF Creator printer or similar).

Or use PDFsharp/MigraDoc to create new documents.

answered Aug 22 '11 at 10:14 by MadBoy · edited May 23 '17 at 12:30 by Community

That is interesting, because my template is actually made with Word and saved to PDF! So I can keep it as docx and use it as a template! :-) – Data-Base Aug 22 '11 at 10:27
This one works nicely, but if I have formatted text then it changes the format to the default one! It seems there is a bug in it and I hope they will fix it, but really, thanks for posting it here :-) – Data-Base Aug 22 '11 at 13:31
Read through the forum; maybe there's already a fix for this. I remember seeing it in there. – MadBoy Aug 22 '11 at 13:46
Here's the fix: http://docx.codeplex.com/discussions/244425. Also compile the binary yourself. There have been quite a few fixes since the release of 1.0.0.11, including the one you mention. – MadBoy Aug 22 '11 at 13:49
Well, I did that and compiled it myself (the latest build), and it did not work. – Data-Base Aug 22 '11 at 13:50
Then post your problem on the forum. Maybe Cathal will be able to help out. – MadBoy Aug 22 '11 at 13:59

score 0 · Answer 5 · answered Dec 17 '20 at 11:35

Updating text in a PDF is hard and dirty. So maybe adding content on top of the existing text will work for you as well as it worked for me. If so, here's my primitive but working solution, covering a lot of cases ("covering", indeed):

https://github.com/astef/PatchPdfText
