
I want this functionality in my PHP application:

When a user uploads a document (PDF, DOCX, DOC, PPT, or PPTX), I want to show the total number of pages in that document once the upload finishes.

This has to work without using the exec() function.

halfer
gigasingh

2 Answers

5

Some formats can be handled directly in PHP. DOCX and PPTX are easy:

For Word files:

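function PageCount_DOCX($file) {
    $pageCount = 0;

    $zip = new ZipArchive();

    if ($zip->open($file) === true) {
        // docProps/app.xml holds the extended properties, including the
        // page count the application stored when it last saved the file.
        if (($index = $zip->locateName('docProps/app.xml')) !== false) {
            $data = $zip->getFromIndex($index);
            $xml = new SimpleXMLElement($data);
            // Cast so the caller gets an int, not a SimpleXMLElement.
            $pageCount = (int) $xml->Pages;
        }
        // Close exactly once, whether or not app.xml was found.
        $zip->close();
    }

    return $pageCount;
}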

And for PowerPoint files:

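function PageCount_PPTX($file) {
    $pageCount = 0;

    $zip = new ZipArchive();

    if ($zip->open($file) === true) {
        // For presentations, docProps/app.xml stores a slide count
        // instead of a page count.
        if (($index = $zip->locateName('docProps/app.xml')) !== false) {
            $data = $zip->getFromIndex($index);
            $xml = new SimpleXMLElement($data);
            // Cast so the caller gets an int, not a SimpleXMLElement.
            $pageCount = (int) $xml->Slides;
        }
        // Close exactly once, whether or not app.xml was found.
        $zip->close();
    }

    return $pageCount;
}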
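
Since the two functions differ only in the property they read, they could be folded into one helper. The following is a minimal sketch, not a tested library routine: the name readAppXmlCount and its null-on-failure convention are assumptions made for illustration. Keep in mind that the value is metadata written by the application that saved the file, so it can be missing or out of date.

```php
<?php
/**
 * Read one integer property (for example "Pages" or "Slides") from
 * docProps/app.xml inside an Office Open XML file.
 * Hypothetical helper: returns null when the file is not a readable ZIP,
 * app.xml is absent, or the property is not present.
 */
function readAppXmlCount(string $file, string $property): ?int
{
    $zip = new ZipArchive();
    if ($zip->open($file) !== true) {
        return null; // not a readable ZIP container
    }

    // getFromName() returns false when the entry does not exist.
    $data = $zip->getFromName('docProps/app.xml');
    $zip->close();
    if ($data === false) {
        return null;
    }

    $xml = simplexml_load_string($data);
    if ($xml === false || !isset($xml->{$property})) {
        return null;
    }

    return (int) $xml->{$property};
}
```

With this helper, readAppXmlCount($file, 'Pages') would cover Word files and readAppXmlCount($file, 'Slides') would cover PowerPoint files; a null result lets the caller tell "unknown" apart from a stored count of 0.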

Older Office documents are a different story. You'll find some discussion about doing that here: How to get the number of pages in a Word Document on linux?

As for PDF files, I prefer to use FPDI, even though it requires a license to parse newer PDF file formats. You can do it simply like this:

function PageCount_PDF($file) {
    $pageCount = 0;
    if (file_exists($file)) {
        require_once('fpdf/fpdf.php');
        require_once('fpdi/fpdi.php');
        $pdf = new FPDI();                              // initiate FPDI
        $pageCount = $pdf->setSourceFile($file);        // get the page count
    }
    return $pageCount;
}
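
To connect the per-format functions to an upload, one option is a small dispatcher keyed on the file extension. This is only a sketch under assumptions: the function name countPagesByExtension, the $counters map, and the form field name 'document' in the usage comment are all hypothetical. Extensions without a registered counter (such as the older DOC and PPT formats) yield null.

```php
<?php
/**
 * Pick a page-counting callable based on the uploaded file's extension.
 * Hypothetical helper: $counters maps a lowercase extension to a callable
 * that takes a file path and returns a page count.
 * Returns null when no counter is registered for the extension.
 */
function countPagesByExtension(string $path, string $originalName, array $counters): ?int
{
    // Use the client-supplied name for the extension; the temporary
    // upload path normally has no meaningful extension.
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));

    if (!isset($counters[$ext])) {
        return null;
    }

    $counter = $counters[$ext];
    return (int) $counter($path);
}

// Usage sketch (assumes the functions defined in this answer and a
// form field named "document"):
// $counters = [
//     'docx' => 'PageCount_DOCX',
//     'pptx' => 'PageCount_PPTX',
//     'pdf'  => 'PageCount_PDF',
// ];
// $pages = countPagesByExtension(
//     $_FILES['document']['tmp_name'],
//     $_FILES['document']['name'],
//     $counters
// );
```

Because the extension comes from the client, it is only a hint; the counters above already return 0 when the file cannot be opened as the expected format.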
Community
Whiteflash
1

Unfortunately, you cannot get the page count of Office files without paginating them first, and that cannot be done easily without the help of another application such as MS Office or OpenOffice. Even worse, a 10-page document created with MS Word can open as an 11-page document in OpenOffice because the two applications paginate differently. In practice, the most reliable way to get the total number of pages of a .doc file is to use MS Word itself. You can do this through Office Automation, but it is an expensive operation because the whole document has to be paginated, and it requires MS Word to be installed on the computer or server.

Getting the total number of pages in a PDF document is relatively easy, because the page count is readily accessible in the PDF format. Most PDF parser/reader libraries offer a simple API for this purpose.
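
For simple PDFs, a rough count can even be made without a library by counting the page objects in the raw bytes. The sketch below is a heuristic, not a parser, and the function name countPdfPagesHeuristic is made up for illustration. It will undercount or return 0 when the page objects are stored inside compressed object streams (possible from PDF 1.5 onward), so a real PDF library remains the more robust choice.

```php
<?php
/**
 * Heuristically count pages by matching "/Type /Page" dictionary entries
 * in the raw PDF bytes. Hypothetical helper, not a full PDF parser.
 */
function countPdfPagesHeuristic(string $pdfBytes): int
{
    // The negative lookahead rejects "/Type /Pages" (the page tree nodes),
    // so only individual page objects are counted.
    $matches = preg_match_all('/\/Type\s*\/Page(?![A-Za-z])/', $pdfBytes);

    // preg_match_all() returns false on a regex error; treat that as 0.
    return (int) $matches;
}
```

A caller would pass in the file contents, for example countPdfPagesHeuristic(file_get_contents($file)), and fall back to a library whenever the result is 0.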

  • Search for information on Dsofile.dll (http://support.microsoft.com/kb/224351) which, I'm pretty sure, lets you retrieve document properties of Office documents w/o having Office installed. You'd be looking for the number of Slides rather than pages in a PPT document. It will give you the number of pages in a Word document, but as Oleg has pointed out, that may not be accurate. Even between different computers w/ same version of Word, page count may vary due to font substitution. – Steve Rindsberg May 09 '14 at 14:57
  • On further reading, it appears that Dsofile.dll only works with the newer XML-based file formats (pptX, docX etc) if you install the Office Compatibility pack. In other words, on its own it only understands the older PPT/DOC formats. – Steve Rindsberg May 09 '14 at 15:05