I am trying to split large PDF files into individual pages using the FPDI library for PHP.
For some reason, splitting the file does very little to reduce the size of each output: every single-page file is almost as large as the original. For example, the script below, applied to a 30-page, 1MB file, produces 30 files of around 0.9MB each, i.e. a total of around 26MB!
This suggests to me that a large portion of the original file is retained in every output file, even though it is not required.
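One rough way to test that suspicion, without a full PDF parser, is to count the image XObject dictionaries in the original and in each single-page file. If every output contains about as many images as the original, that would suggest resources belonging to other pages are being copied along with each page. This is a heuristic sketch: `countImageObjects` is a hypothetical helper (not part of FPDI), and it assumes image dictionaries are written as plain `/Subtype /Image` without `#xx` name escapes. It relies on the fact that stream objects (which include images) are not allowed inside compressed object streams, so their dictionaries stay readable in the raw bytes. It is written without scalar type declarations so it also runs on PHP 5.4/5.6.

```php
<?php
/**
 * Hypothetical diagnostic helper (not part of FPDI).
 * Counts "/Subtype /Image" occurrences in the raw bytes of a PDF.
 * Heuristic only: it also counts unused image objects, ignores inline
 * images, and assumes names are not written with #xx escapes.
 *
 * @param string $pdfPath path to a PDF file
 * @return int number of image XObject dictionaries found
 */
function countImageObjects($pdfPath)
{
    $data = file_get_contents($pdfPath);
    if ($data === false) {
        throw new RuntimeException("Cannot read $pdfPath");
    }
    // \s* allows "/Subtype/Image" as well as "/Subtype /Image";
    // \b stops the match at the end of the name "/Image".
    $count = preg_match_all('#/Subtype\s*/Image\b#', $data);
    return $count === false ? 0 : $count;
}
```

For example, comparing `countImageObjects("/path/to/local/files/original_file.pdf")` with `countImageObjects("/path/to/local/files/test.1.pdf")` shows whether a single page carries the same images as the whole document.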
Questions:
- Is this avoidable?
- Is this a bug in FPDI?
- Is there an alternative PHP library that is more efficient at splitting?
More detail
I've reproduced this issue in a variety of configurations:
- FPDI version 1 (no longer supported) and FPDI version 2
- Using FPDF and TCPDF
- PHP 5.4 and PHP 5.6
- Various PDF files, including files generated using FPDF and TCPDF
Here is some PHP code to illustrate the issue:
<?php
testPdfSplit();

function testPdfSplit()
{
    echo phpversion();

    // Download the sample file to a local path
    $contentPath = "/path/to/local/files/original_file.pdf";
    copy("https://file-examples.com/wp-content/uploads/2017/10/file-example_PDF_1MB.pdf", $contentPath);
    $numpages = 30;

    // Get the original file size in MB
    $fileSize = round(filesize($contentPath) / (1024 * 1024), 3);
    echo "<p>Original file is $fileSize MB</p>";

    for ($i = 1; $i <= $numpages; $i++)
    {
        echo "<p>Creating file for page $i</p>";
        $filePath = "/path/to/local/files/test.$i.pdf";
        try
        {
            // Pass the local path of the source PDF
            selectOnePage($contentPath, $i, $filePath);
        }
        catch (Exception $e)
        {
            die("<pre>ERROR: $e</pre>");
        }
        $fileSize = round(filesize($filePath) / (1024 * 1024), 3);
        echo "<p>$filePath is $fileSize MB</p>";
    }
}
function selectOnePage($filePathIn, $pageNo, $filePathOut)
{
    require_once('fpdf/fpdf.php');
    require_once('fpdi/src/autoload.php');

    // initiate FPDI
    $pdf = new \setasign\Fpdi\Fpdi();

    // get the page count
    $pageCount = $pdf->setSourceFile($filePathIn);
    echo "<p>Selecting page $pageNo / $pageCount</p>";

    // add a page, then import the requested page and place it on it
    $pdf->AddPage();
    $templateId = $pdf->importPage($pageNo);
    $pdf->useImportedPage($templateId);

    // output the file
    $pdf->Output($filePathOut, 'F');
}
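The script above prints the size of each output file but not the total. A small helper like the following (hypothetical, not part of FPDI) sums the sizes of the split files using the same MB conversion as the script (1 MB = 1024 * 1024 bytes), which makes it easy to compare the total against the original file. It avoids scalar type declarations so it also runs on PHP 5.4/5.6.

```php
<?php
/**
 * Hypothetical helper: total size of the given files in MB,
 * rounded to 3 decimals like the sizes printed by the test script.
 *
 * @param array $paths list of file paths
 * @return float total size in MB
 */
function totalSizeMb(array $paths)
{
    $bytes = 0;
    foreach ($paths as $path) {
        // filesize() results are cached by PHP; clear the cache for this file
        clearstatcache(true, $path);
        $size = filesize($path);
        if ($size === false) {
            throw new RuntimeException("Cannot stat $path");
        }
        $bytes += $size;
    }
    return round($bytes / (1024 * 1024), 3);
}
```

Usage, assuming the output paths from the script: build `$paths` with `"/path/to/local/files/test.$i.pdf"` for `$i` from 1 to 30, then compare `totalSizeMb($paths)` with `totalSizeMb(array("/path/to/local/files/original_file.pdf"))`.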