I have some code that combines a few pages of AcroForms (with the AcroFields intact) and then, at the end, adds some JS to the entire document.

It is the PdfReader in the function adding the JS that is taking extremely long to instantiate (about 12 seconds for a 1MB file).

Here is the code (pretty simple):

public static byte[] AddJavascript(byte[] document, string js)
{
    PdfReader reader = new PdfReader(new RandomAccessFileOrArray(document), null);
    MemoryStream msOutput = new MemoryStream();
    PdfStamper stamper = new PdfStamper(reader, msOutput);
    PdfWriter writer = stamper.Writer;

    writer.AddJavaScript(js);

    stamper.Close();
    reader.Close();

    byte[] withJS = msOutput.GetBuffer();
    return withJS;
}

I have benchmarked the above and the line that is slow is the first one. I have tried reading it from a file instead of memory and tried using a MemoryStream instead of the RandomAccessFileOrArray. Nothing makes it any faster.

If I add JS to a single page document, it is very fast. So my thought is that the code that combines the pages is somehow making the PDF slow to read for the PdfReader.

Here is the combine code:

public static byte[] CombineFiles(List<byte[]> sourceFiles)
{
    MemoryStream output = new MemoryStream();

    PdfCopyFields copier = new PdfCopyFields(output);

    try
    {
        output.Position = 0;

        foreach (var fileBytes in sourceFiles)
        {
            PdfReader fileReader = new PdfReader(fileBytes);

            copier.AddDocument(fileReader);
        }
    }
    catch (Exception exception)
    {
        //throw
    }
    finally
    {
        copier.Close();
    }

    byte[] concat = output.GetBuffer();

    return concat;
}

I am using PdfCopyFields because I need to preserve the form fields and so cannot use the PdfCopy or PdfSmartCopy. This combine code is very fast (few ms) and produces working documents. The AddJS code above is called after it and the PdfReader open is the slow piece.

Any ideas?

slavoo
Wbmstrmjb
  • Can you share that pdf? I'm used to iText PdfReader opening files a few MB in size way faster. So either your pdf is somehow special or something in your environment. – mkl Jun 06 '14 at 11:45
  • Not related to your problem but you will occasionally create corrupt PDFs if you use `GetBuffer()`. Instead you should use `ToArray()`. See http://stackoverflow.com/a/5119739/231316 – Chris Haas Jun 06 '14 at 13:06
  • I'm unable to reproduce speed issues using C#. Can you either post your PDF and JavaScript or a representative sample version of both that has the same problems? – Chris Haas Jun 06 '14 at 13:32
  • @Chris Haas: THANK YOU! The GetBuffer/ToArray was the issue! Went from 12 sec to 4ms! Please write as an answer so I can give you credit. – Wbmstrmjb Jun 06 '14 at 16:36

2 Answers

(comment converted to answer)

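Using GetBuffer() on a MemoryStream will occasionally create corrupt PDFs, because it returns the stream's entire internal buffer, including any unused bytes past the end of the data that was actually written. ToArray() should always be used instead, since it returns only the written bytes. More information on this can be found at http://stackoverflow.com/a/5119739/231316.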
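
To make the difference concrete, here is a minimal sketch that uses only System.IO and no iTextSharp. The class and method names (MemoryStreamDemo, ViaGetBuffer, ViaToArray) are made up for illustration and are not part of any library.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of iTextSharp.
public static class MemoryStreamDemo
{
    // Writes the payload to a fresh MemoryStream and returns the raw
    // internal buffer, whose length is the stream's capacity.
    public static byte[] ViaGetBuffer(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);
            return ms.GetBuffer();
        }
    }

    // Writes the payload to a fresh MemoryStream and returns a copy of
    // only the bytes that were written (length equals ms.Length).
    public static byte[] ViaToArray(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);
            return ms.ToArray();
        }
    }
}
```

For a short payload such as the four bytes of "%PDF", ViaToArray returns exactly four bytes, while ViaGetBuffer returns an array that is at least that long, with the unused tail filled with zeros. In the question's code, that padding would end up after the end of the combined PDF, which is presumably what made the later PdfReader so slow to open it.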

Chris Haas

As documented, PdfCopyFields is indeed slow. However, PdfCopyFields is either deprecated or about to be deprecated in favor of PdfCopy. There are two examples in the sandbox that show how it's done: MergeForms (copying forms without renaming the fields) and MergeForms2 (copying forms after renaming the fields). This is what MergeForms looks like:

Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileOutputStream(filename));
copy.setMergeFields();
document.open();
for (PdfReader reader : readers) {
    copy.addDocument(reader);
}
document.close();
for (PdfReader reader : readers) {
    reader.close();
}

Note that you need a recent iText version to run this code.

Bruno Lowagie
  • Thank you Bruno. But the PdfCopyFields isn't that slow for me. 200ms to combine 7 pages (I am ok with that). The problem is that instantiating the new PdfReader in the AddJavascript function is taking about 12-14 seconds to load that 7 page combined file. This is very odd. – Wbmstrmjb Jun 06 '14 at 09:56
  • That's indeed odd. Unfortunately, I can't reproduce this. I only work with the Java version. I'm not that familiar with the C# port. iText is continuously ported to iTextSharp by developers paid by the iText Group company. Hence, you won't get much free support for iTextSharp. The paid developers are only active on the paid support system (which makes sense). – Bruno Lowagie Jun 06 '14 at 11:39