60

I need to determine the number of pages in a specified PDF file using C# code (.NET 2.0). The PDF file will be read from the file system, not from a URL. Does anyone have any idea how this could be done? Note: Adobe Acrobat Reader is installed on the PC where this check will be carried out.

darkdog
Tangiest

8 Answers

83

You'll need a PDF API for C#. iTextSharp is one possible API, though better ones might exist.

iTextSharp Example

You must add iTextSharp.dll as a reference to your project. Download iTextSharp from SourceForge.net. This is a complete working program, written as a console application.

using System;
using iTextSharp.text.pdf;

namespace GetPages_PDF
{
    class Program
    {
        static void Main(string[] args)
        {
            // Replace with the location of your PDF file.
            string ppath = "C:\\aworking\\Hawkins.pdf";
            PdfReader pdfReader = new PdfReader(ppath);
            try
            {
                // NumberOfPages is the total page count of the document.
                int numberOfPages = pdfReader.NumberOfPages;
                Console.WriteLine(numberOfPages);
            }
            finally
            {
                // Release the file held by the reader.
                pdfReader.Close();
            }
            Console.ReadLine();
        }
    }
}
Robert Groves
darkdog
  • So are you saying "here's what I recommend, but actually there are better ways to do this"? – Mitch Wheat Nov 26 '08 at 11:09
  • 12
    Thanks, darkdog. After looking at PDFLib and iTextSharp, I ended up using iTextSharp: PdfReader pdfReader = new PdfReader(pdfFilePath); int numberOfPages = pdfReader.NumberOfPages; Hope this helps someone facing the same problem. – Tangiest Mar 17 '09 at 14:03
  • Thanks MagicAndi for posting the code. Very useful – lidermin Jul 23 '10 at 21:54
  • @MagicAndi Thank you for posting the code! – Dragos Durlut Feb 06 '12 at 12:34
  • @liang it's one-based. There is no page zero. – Dave Oct 05 '15 at 20:52
  • 4
    It is now iText7 and the code to extract the page count is PdfDocument pdfDoc = new PdfDocument(new PdfReader(fileName)) and then pdfDoc.GetNumberOfPages(); You can get the project from NuGet packages. – Samuel Jan 24 '20 at 19:49
  • Don't forget to dispose the PdfReader: `using(var pdfReader = new PdfReader(ppath)) { ... }` – Bohdan Aug 07 '20 at 09:45
  • You should note that iText license is AGPL which means if you use it, then you need to either buy a commercial license, or publish your source code for free. See https://itextpdf.com/en/blog/technical-notes/how-do-i-make-sure-my-software-complies-agpl-how-can-i-use-itext-free and https://opensource.google/docs/using/agpl-policy/ – BateTech Aug 26 '21 at 14:30
45

This should do the trick:

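// Requires: using System.IO; using System.Text.RegularExpressions;
public int getNumberOfPdfPages(string fileName)
{
    // Reads the whole file into memory as text, so very large PDFs can be slow.
    using (StreamReader sr = new StreamReader(File.OpenRead(fileName)))
    {
        // Match "/Type /Page" entries but not "/Type /Pages" (page-tree nodes).
        Regex regex = new Regex(@"/Type\s*/Page[^s]");
        MatchCollection matches = regex.Matches(sr.ReadToEnd());

        return matches.Count;
    }
}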

From Rachael's answer and this one too.

Community
Barrett
  • Barrett, thanks for providing example code. +1 – Tangiest Nov 20 '09 at 00:35
  • 1
    I don't think this will always give the correct count. It also will not work on encrypted PDFs. – Tim B Sep 25 '12 at 18:13
  • @TimB I saved an encrypted PDF and this works on it. – Cristian Lupascu Jan 21 '13 at 08:38
  • @w0lf It seems you are right. The page objects are one of the few things in an encrypted PDF that are readable without the password. – Tim B Jan 21 '13 at 14:17
  • 1
    Didn't work for me - copied and pasted exactly as shown. It returned a value of 216, when the PDF actually had 111 pages. – Paul Apr 10 '14 at 20:30
  • 2
    Works great but slower than the iTextSharp solution. – aloisdg Apr 13 '15 at 09:09
  • 4
    PDF uses versioned objects, and can also include deleted objects if the PDF hasn't been cleaned up, so it is possible to have Page objects that aren't actually linked into the PDF or that have been replaced with a newer version. This is why using a maintained PDF library is a better idea than doing it yourself. – Thomas S. Trias Apr 13 '16 at 21:37
  • I know this is an old question, but I was searching for a way to get the total page count, and I noticed that I must read the whole document. I have really large files and I'd like to get the page count without reading the whole PDF. Can this be done without external dependencies? – Misiu Aug 02 '16 at 11:15
  • I know this is a super old question and answer, but I found this answer on Google a while ago and just came back to it. This answer works but will cause an OutOfMemoryException in big PDFs (i.e. a PDF with 150 images at 300dpi). The iTextSharp answer provided by darkdog works perfectly – Ieuan Oct 27 '16 at 10:12
  • Does this support tif files? – goofyui Mar 04 '18 at 02:52
  • It does not always give the correct answer, but it does most of the time. Thanks. – MindRoasterMir Feb 27 '22 at 10:02
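
To see what the regex in this answer actually counts, and why some commenters got inflated numbers, here is a minimal sketch run on a hand-written fragment rather than a real PDF. The class name PageRegexDemo and the sample strings are made up for illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class PageRegexDemo
{
    // Same pattern as in the answer: "/Type /Page" followed by anything but 's',
    // so "/Type /Pages" (a page-tree node) is skipped.
    static readonly Regex PageObject = new Regex(@"/Type\s*/Page[^s]");

    public static int Count(string pdfText)
    {
        return PageObject.Matches(pdfText).Count;
    }

    static void Main()
    {
        // Hypothetical two-page document: one page-tree node, two page objects.
        string sample =
            "1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >> endobj\n" +
            "2 0 obj << /Type /Page /Parent 1 0 R >> endobj\n" +
            "3 0 obj << /Type/Page /Parent 1 0 R >> endobj\n";
        Console.WriteLine(Count(sample)); // prints 2

        // A leftover page object that /Kids no longer references is still counted.
        string withOrphan = sample +
            "9 0 obj << /Type /Page /Parent 1 0 R >> endobj\n";
        Console.WriteLine(Count(withOrphan)); // prints 3
    }
}
```

The second case mirrors the situation Thomas S. Trias describes: the extra object is not part of the page tree, yet a plain text scan cannot tell it apart from a live page.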
8

I found a way at http://www.dotnetspider.com/resources/21866-Count-pages-PDF-file.aspx that does not require purchasing a PDF library.

  • Rachael, finally reviewed this question, and checked out your link. Thanks, one to try next time this problem comes up! +1 – Tangiest Nov 20 '09 at 00:34
4

I have used pdflib for this.

    p = new pdflib();

    /* Open the input PDF */
    indoc = p.open_pdi_document("myTestFile.pdf", "");
    pageCount = (int) p.pcos_get_number(indoc, "length:pages");
Matthew Lock
Peter Gfader
4

One Line:

int pdfPageCount = System.IO.File.ReadAllText("example.pdf").Split(new string[] { "/Type /Page" }, StringSplitOptions.None).Count()-2;

Recommended: iTextSharp

Medo Medo
  • Works well for my files. It is fast enough for my need but I wonder how is performance compared to the regex solution posted by @Barrett – Maxter May 09 '19 at 15:01
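
The "-2" in the one-liner is easy to misread, so here is a rough sketch of the arithmetic on hand-written strings, not real PDFs. The names SplitCountDemo and CountBySplit are hypothetical, and Length is used instead of Count() so that the sketch does not need LINQ.

```csharp
using System;

static class SplitCountDemo
{
    // Splitting on a marker that occurs k times yields k + 1 pieces.
    // "/Type /Pages" also contains the marker, so with exactly one page-tree
    // node there are (pages + 1) occurrences and (pages + 2) pieces.
    public static int CountBySplit(string pdfText)
    {
        string[] pieces = pdfText.Split(
            new string[] { "/Type /Page" }, StringSplitOptions.None);
        return pieces.Length - 2;
    }

    static void Main()
    {
        string simple =
            "<< /Type /Pages /Count 2 >>\n" +
            "<< /Type /Page >>\n" +
            "<< /Type /Page >>\n";
        Console.WriteLine(CountBySplit(simple)); // prints 2

        // A second page-tree node is counted as if it were a page.
        string nested = simple + "<< /Type /Pages /Count 0 >>\n";
        Console.WriteLine(CountBySplit(nested)); // prints 3

        // "/Type/Page" without the space is missed entirely.
        string noSpace = "<< /Type /Pages >>\n<< /Type/Page >>\n";
        Console.WriteLine(CountBySplit(noSpace)); // prints 0
    }
}
```

So the one-liner is only right when the file has a single page-tree node and always writes the entry with exactly one space, which is an assumption about how the PDF was produced.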
2

Docotic.Pdf library may be used to accomplish the task.

Here is sample code:

PdfDocument document = new PdfDocument();
document.Open("file.pdf");
int pageCount = document.PageCount;

The library will parse as little as possible so performance should be ok.

Disclaimer: I work for Bit Miracle.

Bobrovsky
  • I don't want to be sarcastic, but you should check your performance claim. I tried it on a 250-page, 216 MB PDF, and it was nearly 20x slower than PDF-Sharp just to get the page count, using your example. – Guillaume Jun 01 '13 at 12:49
0

I've used the code above that solves the problem using regex and it works, but it's quite slow. It reads the entire file to determine the number of pages.

I used it in a web app and pages would sometimes list 20 or 30 PDFs at a time and in that circumstance the load time for the page went from a couple seconds to almost a minute due to the page counting method.

I don't know whether the third-party libraries are much better, though I would hope that they are. I've used pdflib in other scenarios with success.

  • Ryan, I have used the iTextSharp library to solve this problem, and found it to give decent performance. You could also look at PDFSharp. As for the issues with the regex solution, it is another example of regular expressions causing more problems than they solve - http://www.codinghorror.com/blog/archives/001016.html – Tangiest Feb 03 '10 at 10:23
  • Agreed. I didn't see your note until after, but I replaced the RegEx function with one using iTextSharp as you recommend and there was a huge improvement in performance. Based on my tests the iTextSharp method is at least 5x faster than the RegEx method and usually a lot more than that, at least when I'm calculating for a number of PDF files at the same time (i.e. loading a page with multiple PDFs listed). –  Feb 16 '10 at 04:39
  • If performance is a problem, you might want to try a command line utility such as PDFLeo (http://www.rockpdf.com). A command like `pdfleo -i myfile.pdf | grep "Number of Pages"` takes less than 1 second on a 300-page file. – cuteCAT Oct 30 '12 at 16:48
0

I have had good success using CeTe Dynamic PDF products. They're not free, but they are well documented. They did the job for me.

http://www.dynamicpdf.com/

Paul Lefebvre