
Suppose I have one large PDF file, 36 MB in size, and I'd like to split it into several smaller files of no more than 10 MB each.

So far I have written code (using the DevExpress PDF API) that checks a file's size and, if it is larger than 10 MB, splits it into two files: the first half of the original's pages goes into the first file and the second half into the second.

I'd like it to then recursively check whether each newly created file still exceeds the 10 MB limit, and keep splitting those files until they are all within the limit.

However, splitting a file by page count does not necessarily halve its size, so my issue is maintaining the order of the original document.

For example, ABC.pdf (36 MB) split into two files could produce ABC_1.pdf (8 MB) and ABC_2.pdf (28 MB).

In which case ABC_2.pdf would need to be split further while ABC_1.pdf would not.

Is it possible to keep splitting a file of arbitrary size until it meets the size requirements and maintain the original document order with this in mind?

  • Yes, it's possible. Just query the size of the respective PDF file (stuff in the System.IO namespace will help you here) to know whether it should be split or not, and then do a fancy `if` statement on the obtained file size. –  Sep 15 '22 at 14:25
  • With regard to maintaining the original order: just use the page number or page index (or ranges thereof, in case a PDF has multiple pages) in the file names. When further splitting a PDF file, use the existing page number/index range from the name of the file being split to calculate the page number/index ranges for the names of the newly split files. (It looks crude/cumbersome to put the page number/index ranges into the file names and then extract them again, but it has the advantage that if your program fails midway, you are still left with PDF files with meaningful file names.) –  Sep 15 '22 at 14:32
  • Have you seen [this](https://stackoverflow.com/questions/11693019/how-can-i-split-a-pdf-file-by-file-size-using-c?rq=1) closely related question and answer? – Axel Kemper Sep 15 '22 at 14:54
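
The recursive halving from the question, combined with the page-range naming suggested in the comments, can be sketched without committing to a particular PDF library. This is only a minimal sketch: `PdfSplitPlanner`, `Plan`, `PartName`, and the `measureBytes` callback are hypothetical names, and the callback is assumed to report the size of a file that contains only the given page range (for example by saving that range with the PDF API in use and reading the resulting file length).

```csharp
using System;
using System.Collections.Generic;

public static class PdfSplitPlanner
{
    // Returns contiguous, inclusive page ranges in document order.
    // Each range measures at most maxBytes, except single pages that are
    // already larger than the limit and cannot be split any further.
    public static List<(int First, int Last)> Plan(
        int first, int last, long maxBytes, Func<int, int, long> measureBytes)
    {
        var ranges = new List<(int First, int Last)>();
        Split(first, last, maxBytes, measureBytes, ranges);
        return ranges;
    }

    private static void Split(int first, int last, long maxBytes,
        Func<int, int, long> measureBytes, List<(int First, int Last)> ranges)
    {
        // Stop when the range fits, or when it is a single page.
        if (first == last || measureBytes(first, last) <= maxBytes)
        {
            ranges.Add((first, last));
            return;
        }

        int middle = first + (last - first) / 2;

        // Visiting the left half before the right half keeps the output in page order.
        Split(first, middle, maxBytes, measureBytes, ranges);
        Split(middle + 1, last, maxBytes, measureBytes, ranges);
    }

    // File name that carries the page range, e.g. ABC_p0001-0003.pdf.
    // Zero padding makes alphabetical order match page order.
    public static string PartName(string baseName, (int First, int Last) range) =>
        $"{baseName}_p{range.First:D4}-{range.Last:D4}.pdf";
}
```

With made-up page sizes of 2, 2, 4, 10, 9 and 9 MB and a 10 MB limit, `Plan(1, 6, ...)` first tries pages 1-3 (8 MB, kept) and pages 4-6 (28 MB, split again), and ends with the ranges 1-3, 4, 5 and 6 in that order. Halving does not necessarily produce the smallest possible number of files; it only guarantees that the parts stay in the original order.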

1 Answer


I do not have access to DevExpress, but I have a code sample using PDFsharp.

I hope it helps:

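```csharp
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public void Split(string filePath, int maxSizeInBytes)
{
    // Import mode is required to copy pages into another document.
    var sourceFile = PdfReader.Open(filePath, PdfDocumentOpenMode.Import);
    var targetFilesCount = 1;
    var targetFileName = Path.GetFileNameWithoutExtension(filePath);
    var targetFile = new PdfDocument();

    // Name of the part currently being written, e.g. ABC_1.pdf.
    string targetName() => $"{targetFileName}_{targetFilesCount}.pdf";

    for (int i = 0; i < sourceFile.Pages.Count; i++)
    {
        // Append the next page and save so the size on disk can be measured.
        targetFile.Pages.Add(sourceFile.Pages[i]);
        targetFile.Save(targetName());

        var targetFileSize = new FileInfo(targetName()).Length;

        // A part holding a single page cannot shrink further, so it is kept
        // even if oversized (removing its only page would leave nothing to save).
        if (targetFileSize > maxSizeInBytes && targetFile.Pages.Count > 1)
        {
            // Drop the page that pushed this part over the limit and finalize the part.
            targetFile.Pages.Remove(targetFile.Pages[targetFile.Pages.Count - 1]);
            targetFile.Save(targetName());

            targetFilesCount++;

            // Start the next part with the page that did not fit.
            targetFile = new PdfDocument();
            targetFile.Pages.Add(sourceFile.Pages[i]);
            targetFile.Save(targetName());
        }
    }
}
```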

Call it like this:

```csharp
Split("ABC.pdf", 1000000 * 10);
```
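
One thing to watch with the `_1`, `_2`, ... suffixes: once there are ten or more parts, an alphabetical listing puts `ABC_10.pdf` before `ABC_2.pdf`. Below is a minimal sketch of a helper that restores document order and reports any part that is still over the limit (which can happen when a single page is larger than the limit). `PartOrdering` and its methods are hypothetical names, and the sketch assumes the `name_N.pdf` pattern produced above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PartOrdering
{
    // Extracts the number after the last underscore, e.g. "ABC_12.pdf" -> 12.
    // Files without a numeric suffix get int.MaxValue so they sort last.
    public static int PartNumber(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        int underscore = name.LastIndexOf('_');
        return underscore >= 0 && int.TryParse(name.Substring(underscore + 1), out int n)
            ? n
            : int.MaxValue;
    }

    // Orders part files by their numeric suffix instead of alphabetically.
    public static List<string> InDocumentOrder(IEnumerable<string> paths) =>
        paths.OrderBy(p => PartNumber(p)).ToList();

    // Returns the parts whose size on disk still exceeds maxBytes.
    public static List<string> Oversized(IEnumerable<string> paths, long maxBytes) =>
        paths.Where(p => new FileInfo(p).Length > maxBytes).ToList();
}
```

For example, `PartOrdering.InDocumentOrder(Directory.GetFiles(".", "ABC_*.pdf"))` would list the parts in the order they appear in the original document.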
Ricardo Valente