Count the number of pages in a PDF in only PHP

Question

I need a way to count the number of pages of a PDF in PHP. I've done a bit of Googling and the only things I've found either utilize shell/bash scripts, perl, or other languages, but I need something in native PHP. Are there any libraries or examples of how to do this?

https://github.com/howtomakeaturn/pdfinfo – Ostap Brehin, Oct 09 '20 at 20:51

score 26 · Answer 1 · answered Jul 03 '12 at 10:00

If using Linux, this is much faster than using identify to get the page count (especially with a high number of pages):

exec('/usr/bin/pdfinfo ' . escapeshellarg($tmpfname) . ' | awk \'/Pages/ {print $2}\'', $output);

You do need pdfinfo installed.
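
A minimal sketch of the same idea, with the file name escaped and the output parsed in PHP instead of awk. It assumes `pdfinfo` is on the PATH, `exec()` is enabled, and the shell is a *nix one; the names `parsePdfinfoPages` and `pdfinfoPageCount` are made up for illustration.

```php
<?php
// Extract the page count from pdfinfo's text output.
// pdfinfo prints one "Key: value" pair per line, e.g. "Pages:          12".
function parsePdfinfoPages(string $output): ?int
{
    if (preg_match('/^Pages:\s+(\d+)/m', $output, $m)) {
        return (int) $m[1];
    }
    return null; // no "Pages:" line found
}

// Hypothetical wrapper: runs pdfinfo on $file and returns the page count,
// or null if the command fails or prints no "Pages:" line.
function pdfinfoPageCount(string $file): ?int
{
    exec('pdfinfo ' . escapeshellarg($file) . ' 2>/dev/null', $lines, $status);
    if ($status !== 0) {
        return null;
    }
    return parsePdfinfoPages(implode("\n", $lines));
}
```

Keeping the parsing in its own function means it can be checked against sample output without pdfinfo being present.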

stephangroen

wow this is 1000000 times faster than any other approach listed here. Cheers. – Iraklis Mar 08 '13 at 19:33
+1 For using the right way to do it! – Xethron Oct 24 '13 at 05:57
You may need to run `which pdfinfo` to get the absolute path. Also make sure pdfinfo is installed on the server. – frazras Jan 18 '17 at 15:43
`qpdf` is also an option. One advantage is that you don't have to parse the output: `qpdf --show-npages file.pdf` returns just the _number of pages_ followed by a line break, so a `trim()` and an `(int)` cast are all you need: `trim(shell_exec('qpdf --show-npages ' . escapeshellarg($file)))` – CodeBrauer Aug 26 '20 at 12:16

score 15 · Answer 2 · answered Mar 10 '12 at 00:16

I know this is pretty old... but if it's relevant to me now, it can be relevant to others too.

I just worked out this method of getting the page count, as the methods listed here are inefficient and extremely slow for large PDFs.

$im = new Imagick();
$im->pingImage('name_of_pdf_file.pdf');
echo $im->getNumberImages();

Seems to be working great for me!
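
A hedged sketch of the same call wrapped with basic error handling. The helper name `imagickPageCount` is hypothetical. It assumes the imagick extension is loaded and that the underlying ImageMagick install is allowed to read PDFs, which is not the case on every system.

```php
<?php
// Hypothetical helper: page count via Imagick, or null when it cannot be determined.
function imagickPageCount(string $file): ?int
{
    if (!extension_loaded('imagick') || !is_file($file)) {
        return null;
    }
    try {
        $im = new Imagick();
        $im->pingImage($file);            // ping the file rather than fully loading it
        $count = $im->getNumberImages();  // one image per PDF page
        $im->clear();
        return $count;
    } catch (ImagickException $e) {
        return null;                      // e.g. unreadable file or PDF reading not permitted
    }
}
```

Returning null instead of throwing lets the caller fall back to another method when Imagick is unavailable.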

score 14 · Accepted Answer · answered Jul 17 '09 at 15:25

You can use the ImageMagick extension for PHP. ImageMagick understands PDFs, and you can use its identify command to extract the number of pages. The PHP function is Imagick::identifyImage().

Travis Beale

This is a quite old answer. You might want to have a look at [TCPDI](https://github.com/pauln/tcpdi). This does absolutely the same without adding an extra PHP lib `$pageCount = (new TCPDI())->setSourceData((string)file_get_contents($fileName));` – Björn Pfoster Aug 06 '19 at 11:33
TCPDI is also a library. – Bhavin Thummar Oct 17 '20 at 18:28
I use the regex `preg_match('/\/Count\s?(\d+)\s?\/Type\s*?\/Pages/', $chunk, $matches)` to count pages for PDF v1.7. Full solution here: https://rcadhikari.blogspot.com/2021/03/count-number-of-pdf-file-pages-in-php.html – rc.adhikari Mar 09 '21 at 17:36

score 11 · Answer 4 · edited Feb 23 '10 at 19:39

I actually went with a combined approach. Since exec is disabled on my server, I wanted to stick with a PHP-based solution and ended up with this:

Code:

function getNumPagesPdf($filepath){
    // Fast path: scan the raw file for "/Count N" entries and keep the largest.
    $max = 0;
    $fp = @fopen(preg_replace("/\[(.*?)\]/i", "", $filepath), "r");
    if ($fp) { // never call fgets()/feof() on a handle that failed to open
        while (($line = fgets($fp, 255)) !== false) {
            if (preg_match('/\/Count ([0-9]+)/', $line, $matches)) {
                $max = max($max, (int)$matches[1]);
            }
        }
        fclose($fp);
    }
    // Slow fallback: let Imagick count the pages when no /Count entry was found.
    if ($max == 0) {
        $im = new Imagick($filepath);
        $max = $im->getNumberImages();
    }

    return $max;
}

If no Count entries are found, it falls back to the Imagick PHP extension. I use the two-step approach because the latter is quite slow.

This is an inherently dangerous approach that is sure to fail on a significant number of PDF files out there. There's a reason the other methods are slower - they do more work and are thus more reliable. — David van Driessche, Aug 16 '13 at 07:03

score 9 · Answer 5 · edited Feb 09 '12 at 14:22

You could try FPDI (see here). As you can see, when you set the source file you get back the number of pages.

edited Feb 09 '12 at 14:22

gen_Eric

answered Jul 17 '09 at 15:09

lothar42

I tested it on both of my test servers (1 Win & 1 Debian ) worked great on both, so I'll accept it. – UnkwnTech Jul 17 '09 at 15:36
I tried some PDFs with this, but ImageMagick seems more reliable. With many PDFs I get: FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI. – Chris Jun 18 '10 at 06:47
I have the same error message as @Chris with FPDI. Some of the PDFs have been generated with Adobe Pro 8/9. – neoneye Sep 01 '10 at 14:45

score 3 · Answer 6 · answered Feb 08 '10 at 19:36

Try this:

<?php
// Untrusted input: restrict it to known files before using this for real.
$file = $_REQUEST['file'] ?? '';
if (!$fp = @fopen($file, "r")) {
        echo 'failed opening file ' . htmlspecialchars($file);
}
else {
        $max = 0;
        while (($line = fgets($fp, 255)) !== false) {
                // Keep the largest /Count value seen so far.
                if (preg_match('/\/Count ([0-9]+)/', $line, $matches)) {
                        $max = max($max, (int)$matches[1]);
                }
        }
        fclose($fp);
        echo 'There ' . ($max == 1 ? 'is ' : 'are ') . $max . ' page' . ($max == 1 ? '' : 's') . ' in ' . htmlspecialchars($file) . '.';
}
?>

The Count tag shows the number of pages in the different nodes. The parent node has the sum of the others in its Count tag, so this script just looks for the max (that is the number of pages).
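
To make the rule concrete, here is a small sketch that applies the same "largest `/Count` wins" idea to an in-memory string. The sample page tree is hand-written for illustration, not taken from a real file, and `maxCountEntry` is a hypothetical helper name. It only works when the page tree is stored uncompressed, and `/Count` entries in other dictionaries (document outlines, for example) can skew the result.

```php
<?php
// Return the largest /Count value found in raw PDF data, or 0 if none is present.
function maxCountEntry(string $data): int
{
    if (!preg_match_all('/\/Count\s+(\d+)/', $data, $m)) {
        return 0;
    }
    return max(array_map('intval', $m[1]));
}

// Hand-written sample: a root /Pages node (5 pages) with two child nodes (2 + 3).
$sample = <<<PDF
1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 5 >> endobj
2 0 obj << /Type /Pages /Parent 1 0 R /Kids [4 0 R 5 0 R] /Count 2 >> endobj
3 0 obj << /Type /Pages /Parent 1 0 R /Kids [6 0 R 7 0 R 8 0 R] /Count 3 >> endobj
PDF;

echo maxCountEntry($sample), "\n"; // prints 5
```

The root node's count (5) is the sum of its children's counts (2 and 3), which is why taking the maximum gives the document total in this layout.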

score 2 · Answer 7 · answered Feb 09 '12 at 14:02

This one does not use Imagick:

function getNumPagesInPDF($file) 
{
    //http://www.hotscripts.com/forums/php/23533-how-now-get-number-pages-one-document-pdf.html
    if(!file_exists($file))return null;
    if (!$fp = @fopen($file,"r"))return null;
    $max=0;
    while(!feof($fp)) {
            $line = fgets($fp,255);
            if (preg_match('/\/Count [0-9]+/', $line, $matches)){
                    preg_match('/[0-9]+/',$matches[0], $matches2);
                    if ($max<$matches2[0]) $max=$matches2[0];
            }
    }
    fclose($fp);
    return (int)$max;

}

score 2 · Answer 8 · answered Mar 16 '12 at 07:27

function getNumPagesPdf($filepath) {
    $fp = @fopen(preg_replace("/\[(.*?)\]/i", "", $filepath), "r");
    $max = 0;
    if (!$fp) {
        return "Could not open file: $filepath";
    } else {
        while (!@feof($fp)) {
            $line = @fgets($fp, 255);
            if (preg_match('/\/Count [0-9]+/', $line, $matches)) {
                preg_match('/[0-9]+/', $matches[0], $matches2);
                if ($max < $matches2[0]) {
                    $max = trim($matches2[0]);
                    break;
                }
            }
        }
        @fclose($fp);
    }

    return $max;
}

This does exactly what I want.

I just worked out this method of getting the PDF page count. Once the page count has been found, I break out of the while loop so that it does not run forever.

score 1 · Answer 9 · edited Oct 24 '13 at 11:20

In a *nix environment you can use:

exec('pdftops ' . escapeshellarg($filename) . ' - | grep showpage | wc -l', $output);

pdftops should be installed by default.

Or as Xethron suggested:

pdfinfo filename.pdf | grep Pages: | awk '{print $2}'

answered Oct 13 '12 at 15:00

kenorb

-1, your answer doesn't qualify for the question, "Count the number of pages in a PDF in only PHP". Note the "in only PHP" part. ;) Also your answer is highly system-dependent, relying both on *nix and on pdftops being installed. – UnkwnTech Oct 14 '12 at 03:16
Extremely slow! `pdfinfo filename.pdf | grep Pages: | awk '{print $2}'` is a much better solution! – Xethron Oct 24 '13 at 05:56

score 0 · Answer 10 · answered Jul 06 '12 at 20:10

// Read the whole PDF into memory, then count every "/Page" followed by a non-word character.
$pdftext = file_get_contents($caminho1);
$num_pag = preg_match_all("/\/Page\W/", $pdftext, $dummy);
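
One way to make this kind of match a little stricter is to require `/Type` in front of `/Page` and to skip `/Pages` tree nodes. The following is only a sketch under that assumption, with a hypothetical helper name and a hand-written sample. It still miscounts when page objects sit inside compressed object streams or when the same bytes occur in stream data, so treat it as a heuristic.

```php
<?php
// Count "/Type /Page" entries, skipping "/Type /Pages" tree nodes.
function countPageObjects(string $data): int
{
    return (int) preg_match_all('/\/Type\s*\/Page(?![A-Za-z])/', $data);
}

// Hand-written sample: one page-tree node, two page objects, one unrelated dictionary.
$sample = <<<PDF
1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >> endobj
2 0 obj << /Type /Page /Parent 1 0 R >> endobj
3 0 obj << /Type/Page /Parent 1 0 R >> endobj
4 0 obj << /PageMode /UseOutlines /PageLayout /SinglePage >> endobj
PDF;

echo countPageObjects($sample), "\n"; // prints 2
```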

Murilo

I'm sure it's easy to get a wrong count for PDFs containing "/Page " – Niko Sams Oct 10 '12 at 20:21

score 0 · Answer 11 · edited May 23 '17 at 10:29

Using only PHP can mean installing complicated libraries, restarting Apache, etc., and many pure-PHP approaches (like opening streams and using regex) are inaccurate.

The linked answer is the only fast and reliable way I can think of. It does rely on a single executable, but one that doesn't have to be installed (on either *nix or Windows), and a simple PHP script extracts the output. Best of all, I haven't seen a wrong page count yet!

It can be found here, including why the other approaches "don't work":

Get the number of pages in a PDF document
