
I'm using iTextSharp to read the contents of PDF documents:

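```csharp
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

PdfReader reader = new PdfReader(pdfPath);
try
{
    using (StringWriter output = new StringWriter())
    {
        for (int i = 1; i <= reader.NumberOfPages; i++)
            output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()));

        pdfText = output.ToString();
    }
}
finally
{
    reader.Close(); // always release the file, even if extraction throws
}
```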

99% of the time it works just fine. However, there is this one PDF file that will sometimes throw this exception:

```
PDF header signature not found. StackTrace:
   at iTextSharp.text.pdf.PRTokeniser.CheckPdfHeader()
   at iTextSharp.text.pdf.PdfReader.ReadPdf()
   at iTextSharp.text.pdf.PdfReader..ctor(String filename, Byte[] ownerPassword)
   at Reader.PDF.DownloadPdf(String url) in
```

What's annoying is that I can't always reproduce the error. Sometimes it works, sometimes it doesn't. Has anyone encountered this problem?

Asked by broke (edited by KyleMit)

4 Answers


After some research, I've found that this problem is caused either by a file being corrupted during PDF generation, or by an object in the document that doesn't conform to the PDF standard as implemented in iTextSharp. It also seems to happen only when you read a PDF file from disk.

I have not found a complete solution to the problem, only a workaround. Before reading the file normally, I open it with iTextSharp's `PdfReader` and check whether an exception is thrown.

Something similar to this:

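```csharp
using iTextSharp.text.pdf;

private bool IsValidPdf(string filepath)
{
    PdfReader reader = null;

    try
    {
        // The constructor throws (e.g. "PDF header signature not found")
        // if iTextSharp cannot parse the file.
        reader = new PdfReader(filepath);
        return true;
    }
    catch
    {
        return false;
    }
    finally
    {
        // Close the reader so the file is not left locked by this process.
        if (reader != null)
            reader.Close();
    }
}
```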
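The exception is raised in `PRTokeniser.CheckPdfHeader()`, which looks for the `%PDF-` signature near the start of the data, so anything that is not a PDF at all (an HTML error page returned by a server, a JSON file, an image, an empty or truncated download) fails there. If you only want to screen out that kind of input without constructing a `PdfReader`, a minimal sketch is to read the first bytes yourself and look for the signature. `PdfHeaderCheck` and `HasPdfHeader` are hypothetical names, and the 1024-byte window is an assumption meant to roughly match how far iTextSharp 5 looks, not a documented guarantee:

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfHeaderCheck
{
    // Assumption: roughly how many leading bytes iTextSharp 5 scans for "%PDF-".
    private const int HeaderWindow = 1024;

    // Returns true if "%PDF-" appears within the first HeaderWindow bytes of the file.
    public static bool HasPdfHeader(string path)
    {
        using (var stream = File.OpenRead(path))
            return HasPdfHeader(stream);
    }

    // Stream overload: reads from the stream's CURRENT position, so rewind it first if needed.
    public static bool HasPdfHeader(Stream stream)
    {
        var buffer = new byte[HeaderWindow];
        int total = 0;
        int read;

        // Stream.Read may return fewer bytes than requested, so loop until the window is full or EOF.
        while (total < buffer.Length && (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
            total += read;

        // ASCII decoding yields one char per byte (non-ASCII bytes become '?'),
        // which is enough to locate the ASCII signature.
        string head = Encoding.ASCII.GetString(buffer, 0, total);
        return head.IndexOf("%PDF-", StringComparison.Ordinal) >= 0;
    }
}
```

A file that passes this check can still be corrupted further in, so keep the `try`/`catch` around `new PdfReader(...)` as well. When it returns false, logging the first few bytes of the file can make it obvious what was received instead of a PDF.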
  • It took me a while, but I finally figured out that the file was indeed corrupted. The blame goes to the website that created the PDF, not a bug in iTextSharp. Thanks for taking the time to answer my question. – broke Jun 05 '12 at 18:27
  • I can confirm this can also occur when you load the reader from a stream, not just from disk :) – JoshBerke May 21 '13 at 20:33
  • Turns out I was looking at the wrong file in my case. The filename referred to one of the assets (images) I was previously using, i.e. a JPG is not a PDF, doh :) so it was in fact a corrupted PDF (or not one at all). Thanks, this got me on the right path. – Anthony Horne Jul 23 '15 at 09:25
  • This answer helped me a lot today. I recommend putting a `finally` block with `reader.Close()` to prevent files being locked by the process. – Guilherme Batista Nov 08 '18 at 20:35

I found it was because I was calling `new PdfReader(pdf)` while the PDF stream's position was at the end of the stream. Setting the position back to zero resolved the issue.

Before:

```csharp
// Throws: InvalidPdfException: PDF header signature not found.
var pdfReader = new PdfReader(pdf);
```

After:

```csharp
// Works correctly: rewind the stream before handing it to PdfReader.
pdf.Position = 0;
var pdfReader = new PdfReader(pdf);
```
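A `PdfReader` constructed from a `Stream` appears to read from the stream's current position (which is consistent with the fix above), so a `MemoryStream` you have just written to, for instance with `Write` or `CopyTo`, looks empty to it. The sketch below uses two hypothetical helpers, `Rewind` and `RemainingText`, to make that visible without iTextSharp, and refuses non-seekable streams, which cannot simply be reset:

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfStreamHelpers
{
    // Moves a seekable stream back to its first byte so that a reader built
    // from it (e.g. new PdfReader(stream)) starts at the "%PDF-" header.
    public static Stream Rewind(Stream stream)
    {
        if (stream == null)
            throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek)
            throw new NotSupportedException("Cannot rewind a non-seekable stream.");

        stream.Position = 0;
        return stream;
    }

    // Returns whatever is left to read from the current position, leaving the stream open.
    // Right after writing to a MemoryStream this is empty: the position is at the end.
    public static string RemainingText(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.ASCII, false, 1024, leaveOpen: true))
            return reader.ReadToEnd();
    }
}
```

With it, the answer's fix can be written as `var pdfReader = new PdfReader(PdfStreamHelpers.Rewind(pdf));`, which also fails loudly instead of silently if `pdf` turns out to be a stream that cannot seek.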
Answered by Bern

In my case, it was because I was passing a .json file, and iTextSharp can obviously only read PDF files.


You may be opening the file with another method or program at the same time, as was the case for me. Make sure nothing else is using the file; you can use Resource Monitor to see which processes have it open.
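If you want to check this from code rather than in Resource Monitor, one approach is to try opening the file with `FileShare.None`: on Windows that fails with an `IOException` while any other handle to the file is open, including one held by your own process (for example, a download stream that was never disposed). A minimal sketch; `FileLockCheck.IsInUse` is a hypothetical name, and on Linux/macOS .NET emulates sharing modes with advisory locks, so the result is less reliable there:

```csharp
using System.IO;

public static class FileLockCheck
{
    // Returns true if the file cannot be opened exclusively. On Windows this means
    // some process (possibly this one) still holds a handle to the file.
    public static bool IsInUse(string path)
    {
        // FileNotFoundException derives from IOException, so rule it out first
        // instead of reporting a missing file as "in use".
        if (!File.Exists(path))
            throw new FileNotFoundException("File not found.", path);

        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                return false;
            }
        }
        catch (IOException)
        {
            // Typically a sharing violation: another handle to the file is open.
            return true;
        }
    }
}
```

Checking and then opening is inherently racy, so treat this as a diagnostic. The more robust fix is to make sure whatever writes the file has closed it before you call `new PdfReader(...)`.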

  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/30488476) – KMR Dec 03 '21 at 03:56