36

I need a way to count the number of pages of a PDF in PHP. I've done a bit of Googling, and the only things I've found use shell/bash scripts, Perl, or other languages, but I need something in native PHP. Are there any libraries or examples of how to do this?

UnkwnTech

11 Answers

26

If using Linux, this is much faster than using identify to get the page count (especially with a high number of pages):

exec('/usr/bin/pdfinfo '.escapeshellarg($tmpfname).' | awk \'/^Pages:/ {print $2}\'', $output);

You do need pdfinfo installed.
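A minimal sketch of the same idea with the parsing done in PHP instead of awk. The function names `parsePdfinfoPages` and `pdfinfoPageCount` are hypothetical, and the sketch assumes `pdfinfo` is on the PATH, `exec()` is enabled, and the output contains a `Pages:` line:

```php
<?php
// Hypothetical helpers; the names are illustrative, not from the answer above.

/**
 * Extract the page count from pdfinfo's text output.
 * Returns null when no "Pages:" line is present.
 */
function parsePdfinfoPages(string $output): ?int
{
    if (preg_match('/^Pages:\s+(\d+)/m', $output, $m)) {
        return (int)$m[1];
    }
    return null;
}

/**
 * Run pdfinfo on a file and return its page count, or null on failure.
 * Assumes a *nix shell, pdfinfo on the PATH, and exec() enabled.
 */
function pdfinfoPageCount(string $file): ?int
{
    $lines = [];
    $status = 0;
    // escapeshellarg() keeps spaces and shell metacharacters in the path safe.
    exec('pdfinfo ' . escapeshellarg($file) . ' 2>/dev/null', $lines, $status);
    if ($status !== 0) {
        return null;
    }
    return parsePdfinfoPages(implode("\n", $lines));
}
```

Calling `pdfinfoPageCount($tmpfname)` would then return an integer, or null if pdfinfo fails or prints no page count.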

stephangroen
  • 2
    wow this is 1000000 times faster than any other approach listed here. Cheers. – Iraklis Mar 08 '13 at 19:33
  • +1 For using the right way to do it! – Xethron Oct 24 '13 at 05:57
  • 2
    you may need to use `which pdfinfo` to get the absolute path. Also install pdfinfo on the server – frazras Jan 18 '17 at 15:43
  • 1
    `qpdf` is also an option. One advantage is you don't have to parse the output. `qpdf --show-npages file.pdf` returns just the _number of pages_ with a linebreak. So a `trim()` and an `(int)` cast is all you need: `(int) trim(shell_exec('qpdf --show-npages ' . escapeshellarg($file)))` – CodeBrauer Aug 26 '20 at 12:16
15

I know this is pretty old... but if it's relevant to me now, it can be relevant to others too.

I just worked out this method of getting the page count, as the methods listed here are inefficient and extremely slow for large PDFs.

$im = new Imagick();
$im->pingImage('name_of_pdf_file.pdf');
echo $im->getNumberImages();

Seems to be working great for me!

user678415
14

You can use the ImageMagick extension for PHP. ImageMagick understands PDFs, and you can use the identify command to extract the number of pages. The PHP function is Imagick::identifyImage().

Travis Beale
  • 2
    This is quite an old answer. You might want to have a look at [TCPDI](https://github.com/pauln/tcpdi). This does absolutely the same without adding an extra PHP lib: `$pageCount = (new TCPDI())->setSourceData((string)file_get_contents($fileName));` – Björn Pfoster Aug 06 '19 at 11:33
  • TCPDI is also a library. – Bhavin Thummar Oct 17 '20 at 18:28
  • I use regex `preg_match('/\/Count\s?(\d+)\s?\/Type\s*?\/Pages/', $chunk, $matches)` to count pages for PDF v1.7. Full solution here: https://rcadhikari.blogspot.com/2021/03/count-number-of-pdf-file-pages-in-php.html – rc.adhikari Mar 09 '21 at 17:36
11

I actually went with a combined approach. Since I have exec disabled on my server I wanted to stick with a PHP based solution, so ended up with this:

Code:

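function getNumPagesPdf($filepath){
    // Strip any bracketed segments from the path before opening it.
    $fp = @fopen(preg_replace("/\[(.*?)\]/i", "", $filepath), "r");
    $max = 0;
    // Only scan if the file opened: a failed fopen() returns false,
    // which feof() cannot handle.
    if ($fp) {
        while (!feof($fp)) {
            $line = fgets($fp, 255);
            if ($line !== false && preg_match('/\/Count ([0-9]+)/', $line, $matches)) {
                // Compare as integers; the largest /Count is taken as the total.
                $max = max($max, (int)$matches[1]);
            }
        }
        fclose($fp);
    }
    if ($max == 0) {
        // No /Count entry found: fall back to the (slower) Imagick extension.
        $im = new Imagick($filepath);
        $max = $im->getNumberImages();
    }

    return $max;
}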

If it can't figure things out because there are no Count tags, it falls back to the Imagick PHP extension. I use this two-step approach because the latter is quite slow.

sth
adrianbj
  • This is an inherently dangerous approach that is sure to fail on a significant number of PDF files out there. There's a reason the other methods are slower - they do more work and are thus more reliable. – David van Driessche Aug 16 '13 at 07:03
9

You could try FPDI (see here). When you set the source file, it returns the number of pages.

gen_Eric
lothar42
  • I tested it on both of my test servers (1 Win & 1 Debian ) worked great on both, so I'll accept it. – UnkwnTech Jul 17 '09 at 15:36
  • 1
    I tried some PDFs with this, but ImageMagick seems more reliable. With many PDFs I get: FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI. – Chris Jun 18 '10 at 06:47
  • I have the same error message as @Chris with FPDI. Some of the PDFs have been generated with Adobe Pro 8/9. – neoneye Sep 01 '10 at 14:45
3

Try this :

<?php
// Read the requested path once; escape it whenever it is echoed back.
$file = (string)($_REQUEST['file'] ?? '');
if ($file === '' || !$fp = @fopen($file, "r")) {
        echo 'failed opening file '.htmlspecialchars($file);
}
else {
        $max = 0;
        while (!feof($fp)) {
                $line = fgets($fp, 255);
                if ($line !== false && preg_match('/\/Count ([0-9]+)/', $line, $matches)) {
                        // Compare as integers; keep the largest /Count seen.
                        $max = max($max, (int)$matches[1]);
                }
        }
        fclose($fp);
        echo 'There '.($max == 1 ? 'is ' : 'are ').$max.' page'.($max == 1 ? '' : 's').' in '.htmlspecialchars($file).'.';
}
?>

The Count tag shows the number of pages in the different nodes. The parent node has the sum of the others in its Count tag, so this script just looks for the max (that is the number of pages).
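To illustrate the "largest Count wins" idea, here is a small sketch that runs on a made-up page tree rather than a real file. The function name `maxCountInPdfSource` and the sample string are hypothetical. It is a heuristic only: it assumes the page tree is stored uncompressed, and other objects such as outlines also carry a /Count entry, so it can be wrong on real PDFs.

```php
<?php
// Hypothetical helper illustrating the "largest /Count wins" heuristic.
function maxCountInPdfSource(string $pdfSource): int
{
    // Collect every "/Count N" entry; allow optional whitespace after the key.
    preg_match_all('/\/Count\s*(\d+)/', $pdfSource, $matches);
    if (empty($matches[1])) {
        return 0; // no /Count entries found
    }
    return max(array_map('intval', $matches[1]));
}

// Made-up page tree: a root node (5 pages) with two child nodes (2 and 3 pages).
$sample = "<< /Type /Pages /Count 5 /Kids [2 0 R 3 0 R] >>\n"
        . "<< /Type /Pages /Count 2 /Parent 1 0 R >>\n"
        . "<< /Type /Pages /Count 3 /Parent 1 0 R >>\n";

echo maxCountInPdfSource($sample), "\n"; // prints 5
```

The root's count (5) is the sum of its children (2 + 3), so taking the maximum picks out the document total in this example.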

Baboum
2

This one does not use Imagick:

function getNumPagesInPDF($file) 
{
    //http://www.hotscripts.com/forums/php/23533-how-now-get-number-pages-one-document-pdf.html
    if(!file_exists($file))return null;
    if (!$fp = @fopen($file,"r"))return null;
    $max=0;
    while(!feof($fp)) {
            $line = fgets($fp,255);
            if (preg_match('/\/Count [0-9]+/', $line, $matches)){
                    preg_match('/[0-9]+/',$matches[0], $matches2);
                    if ($max<$matches2[0]) $max=$matches2[0];
            }
    }
    fclose($fp);
    return (int)$max;

}
2
function getNumPagesPdf($filepath) {
    $fp = @fopen(preg_replace("/\[(.*?)\]/i", "", $filepath), "r");
    $max = 0;
    if (!$fp) {
        return "Could not open file: $filepath";
    } else {
        while (!@feof($fp)) {
            $line = @fgets($fp, 255);
            if (preg_match('/\/Count [0-9]+/', $line, $matches)) {
                preg_match('/[0-9]+/', $matches[0], $matches2);
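                if ($max < $matches2[0]) {
                    $max = trim($matches2[0]);
                    // Stop at the first /Count found; in a nested page tree this
                    // may be a child node's count rather than the document total.
                    break;
                }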
            }
        }
        @fclose($fp);
    }

    return $max;
}

This does exactly what I want.

I just worked out this method of getting the PDF page count. After finding the page count, I break out of the while loop so that it does not loop forever.

stev
1

On a *nix environment you can use:

exec('pdftops ' . escapeshellarg($filename) . ' - | grep showpage | wc -l', $output);

Here, pdftops should be installed by default.

Or as Xethron suggested:

pdfinfo filename.pdf | grep Pages: | awk '{print $2}'
kenorb
  • 1
    -1, your answer doesn't qualify for the question, "Count the number of pages in a PDF in only PHP". Note the "in only PHP" part. ;) Also, your answer is highly system dependent, both on *nix and pdftops being installed. – UnkwnTech Oct 14 '12 at 03:16
  • 1
    Extremely slow! `pdfinfo filename.pdf | grep Pages: | awk '{print $2}'` is a much better solution! – Xethron Oct 24 '13 at 05:56
0
$pdftext = file_get_contents($caminho1);

 $num_pag = preg_match_all("/\/Page\W/", $pdftext,$dummy);
Murilo
0

Using only PHP can mean installing complicated libraries, restarting Apache, etc., and many pure-PHP approaches (like opening streams and using regex) are inaccurate.

The linked answer is the only fast and reliable way I can think of. It does use a single executable, but one that doesn't have to be installed (on either *nix or Windows), and a simple PHP script extracts the output. Best of all, I haven't seen a wrong page count yet!

It can be found here, including why the other approaches "don't work":

Get the number of pages in a PDF document

Community
Richard de Wit