
I have to replace the number "14-1" with "10-2". I am using the following iText code, but I get a type cast error. Can anyone help me modify the program to remove the casting issue?

I have many PDFs in which I have to replace the numbers at the same location. I also need to understand the logic of how to do this:

using System;
using System.IO;
using System.Text;
using iTextSharp.text.io;
using iTextSharp.text.pdf;

using System.Windows.Forms;

namespace iText5
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        public const string src = @"D:\test1\A.pdf";
        public const string dest = @"D:\test1\ENV1.pdf";

        private void button1_Click(object sender, EventArgs e)
        {
            FileInfo file = new FileInfo(dest);
            file.Directory.Create();
            manipulatePdf(src, dest);
        }

        public void manipulatePdf(String src, String dest)
        {

            PdfReader reader = new PdfReader(src);
            PdfDictionary dict = reader.GetPageN(1);

            PdfObject obj = dict.GetDirectObject(PdfName.CONTENTS);

            PRStream stream = (PRStream)obj;

            byte[] data = PdfReader.GetStreamBytes(stream);
            string xyz = Encoding.UTF8.GetString(data);

            byte[] newBytes = Encoding.UTF8.GetBytes(xyz.Replace("14-1", "10-2"));
            stream.SetData(newBytes);

            PdfStamper stamper = new PdfStamper(reader, new FileStream(dest, FileMode.Create));
            stamper.Close();
            reader.Close();

        }

    }
}
Keshav
  • First of all, unless your PDFs are internally very simple, your code won't work, for numerous reasons. The error at hand occurs because you assume the page content to be a stream. It can alternatively be an array of streams... and in the case of the PDF at hand, it is an array! – mkl Aug 04 '18 at 13:59

1 Answer


This is a problem:

PdfDictionary dict = reader.GetPageN(1);
PdfObject obj = dict.GetDirectObject(PdfName.CONTENTS);
PRStream stream = (PRStream)obj;

First you get a page dictionary. That page dictionary has a /Contents entry. If you read the PDF standard (ISO 32000), you'll see that the value of the /Contents entry can be either a stream or an array. You assume that it's always a stream. In some cases your code will work, but when the value of the /Contents entry is an array of references to a series of streams, you get an InvalidCastException (for the obvious reason that an array of streams is not the same as a stream).
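A minimal sketch of the check that the cast skips, in plain C# with no iTextSharp dependency. `PdfObj`, `StreamObj`, `ArrayObj` and `PageContents.GetContentBytes` are hypothetical stand-ins for iTextSharp's `PdfObject`, `PRStream` and `PdfArray`, reduced to what matters here: branch on the runtime type of the /Contents value instead of casting it blindly, and when it is an array, treat the page content as the streams concatenated in order (which is what ISO 32000 specifies).

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-ins for iTextSharp's PdfObject, PRStream and PdfArray.
public abstract record PdfObj;
public sealed record StreamObj(byte[] Data) : PdfObj;
public sealed record ArrayObj(List<PdfObj> Items) : PdfObj;

public static class PageContents
{
    // Returns the page's content bytes whether /Contents is a single stream
    // or an array of streams, instead of assuming a stream with a hard cast.
    public static byte[] GetContentBytes(PdfObj contents)
    {
        switch (contents)
        {
            case StreamObj s:
                return s.Data;
            case ArrayObj a:
                using (var buffer = new MemoryStream())
                {
                    for (int i = 0; i < a.Items.Count; i++)
                    {
                        // Every element of a /Contents array must be a stream.
                        if (a.Items[i] is not StreamObj part)
                            throw new InvalidDataException("A /Contents array element is not a stream.");
                        buffer.Write(part.Data, 0, part.Data.Length);
                        // Put whitespace between parts so the last token of one
                        // stream cannot merge with the first token of the next.
                        if (i < a.Items.Count - 1)
                            buffer.WriteByte((byte)'\n');
                    }
                    return buffer.ToArray();
                }
            default:
                throw new InvalidDataException("/Contents must be a stream or an array of streams.");
        }
    }
}
```

In iTextSharp 5 itself, `PdfObject.IsStream()` and `PdfObject.IsArray()` allow the same branch, and `PdfReader.GetPageContent` already returns the combined content bytes for either case.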

I think that you want to do something like this:

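for (int i = 1; i <= reader.NumberOfPages; i++)
{
    // GetPageContent returns the full content of page i, whether /Contents is one stream or an array
    byte[] data = reader.GetPageContent(i);
    string xyz = PdfEncodings.ConvertToString(data, PdfObject.TEXT_PDFDOCENCODING);
    string abc = xyz.Replace("14-1", "10-2");
    reader.SetPageContent(i, PdfEncodings.ConvertToBytes(abc, PdfObject.TEXT_PDFDOCENCODING));
}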

However, that's a very bad idea, for reasons that have been explained in answers to similar questions.

You are making the assumption that you will find a literal string with value "14-1" in the content. That might be true for simple PDF documents, but in many cases the appearance of "14-1" on a page (that you can read with your eyes) doesn't mean the string "14-1" is present as such in the content (that you extract with GetPageContent). That string could be part of an XObject, or the syntax to render "14-1" could be constructed in such a way that xyz.Replace("14-1", "10-2") won't change xyz in any way.
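To make this concrete, here is a small C# sketch with three hand-written content-stream fragments. They are hypothetical, not taken from the asker's PDF, and they assume a font `/F1` whose character codes match ASCII; all three would paint "14-1" at the same position. Only the first, a single literal string shown with `Tj`, is changed by a plain `Replace`.

```csharp
using System.Collections.Generic;

public static class NaiveReplaceDemo
{
    // Hypothetical content-stream fragments that each paint "14-1" at (72, 700),
    // assuming font /F1 uses an encoding in which the codes match ASCII.
    public static readonly Dictionary<string, string> Fragments = new Dictionary<string, string>
    {
        // A single literal string shown with Tj: the one case Replace can handle.
        ["literal"] = "BT /F1 12 Tf 72 700 Td (14-1) Tj ET",
        // A TJ array with a kerning adjustment between "14" and "-1".
        ["kerned"] = "BT /F1 12 Tf 72 700 Td [(14)-15(-1)] TJ ET",
        // The same four character codes written as a hexadecimal string.
        ["hex"] = "BT /F1 12 Tf 72 700 Td <31342D31> Tj ET",
    };

    // True if the naive string replacement from the question changes the fragment.
    public static bool NaiveReplaceWorks(string fragment) =>
        fragment.Replace("14-1", "10-2") != fragment;
}
```

`NaiveReplaceWorks` returns `true` for the literal fragment and `false` for the kerned and hex fragments. With subset fonts that use custom encodings, the byte values may not resemble "14-1" at all, which makes string replacement even less reliable.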

Bottom line: PDF is not a format for editing. A page in a PDF file consists of content that is added at absolute positions. The content on a page doesn't reflow if you change it (e.g. the existing content won't move to the next line or to the next page if you add extra content). Instead of editing a PDF document, you should edit the source that was used to create the document, and then create a new PDF from that source.

Important: you are using an old version of iText. We abandoned the name iTextSharp more than two years ago in favor of iText for .NET. The current version of iText is iText 7.1.2; see Nuget: https://www.nuget.org/packages/itext7/

Many people think that iText 5.5.13 is the latest version. That assumption is wrong. iText 5 has been discontinued and is no longer supported. The recent 5.5.x versions are maintenance releases for paying customers who can't migrate to iText 7 right away.

Bruno Lowagie
  • Thanks Bruno, your modified code helped me replace the digits at the same location, but instead of 10-2 the PDF displays a crossed box for the digits 0 and 2. When I copy the text, it extracts the correct 10-2. I also checked the font of the replaced digits in Acrobat with the TouchUp Object tool: it is the same Futura-Std-Bold in the modified PDF, but the rendering has changed. Any suggestions, please? – Keshav Aug 05 '18 at 02:50
  • Also, if text replacement is not a good idea because of its complications, can I patch a new number at the same location and hide the old one behind it? Is that something I can do? As I don't have the source of the PDFs, my only option is to find a way to replace, patch, or modify them in some way... – Keshav Aug 05 '18 at 03:22
  • The font that was used to display `"14-1"` is a subset of a much larger font. That subset knows how to render `1`, `-` and `4`; it does not know how to render `0` and `2`. If it did, the file would be bloated with information that isn't necessary to render the original content. If you want to change the content by adding characters that aren't known to the subset, you need the original font program to add the missing characters to the subset of the font in the PDF. That is way too complex for what you want to achieve; you may not even have access to the Futura-Std-Bold font program anymore. – Bruno Lowagie Aug 05 '18 at 11:10
  • iText 7 has an add-on called pdfSweep that allows you to find 14-1 and remove it. You can use the location information to add another number in its place. However: you keep talking about **iTextSharp** which is a name that we abandoned over two years ago in favor of **iText for .NET**. Your code shows that you are using iText 5 or earlier, whereas the current version is iText 7.1.2. iText 5 is no longer supported. New iText 5.5.x versions are *maintenance releases* for paying customers. They were created in the context of their support contract. – Bruno Lowagie Aug 05 '18 at 11:17
  • Thanks Bruno. Let me work with the new iText 7 version and pdfSweep to manage this issue. I hope I will be able to do it with your help. – Keshav Aug 05 '18 at 16:14
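Bruno's explanation of the crossed boxes can be modeled with a simplified C# check: treat the embedded subset as the set of characters it has glyphs for, and list the characters of the replacement text that it lacks. `SubsetCheck.MissingGlyphs` and the subset contents are hypothetical; a real check would have to go through the font's encoding and glyph data, which this sketch does not do.

```csharp
using System.Collections.Generic;

public static class SubsetCheck
{
    // Returns each character of `replacement` that the subset has no glyph for,
    // once, in order of first appearance.
    // Simplification: a real subset maps character codes to glyphs through the
    // font's encoding; here it is modeled as a plain set of characters.
    public static char[] MissingGlyphs(ISet<char> subsetChars, string replacement)
    {
        var missing = new List<char>();
        foreach (char c in replacement)
        {
            if (!subsetChars.Contains(c) && !missing.Contains(c))
                missing.Add(c);
        }
        return missing.ToArray();
    }
}
```

With a subset containing only `1`, `4` and `-`, `MissingGlyphs(subset, "10-2")` reports `0` and `2`, the two digits that render as boxes, while `MissingGlyphs(subset, "14-1")` reports nothing.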