12

Is there a way in iOS to merge PDF files, that is, append the pages of one at the end of another and save it to disk?

bummi
lkraider
  • This doesn't answer the question, but in case someone wants to [append to an existing PDF file](http://stackoverflow.com/a/15355168/1603234) – Hemang Mar 12 '13 at 07:01
  • I believe [FastPdfKit](https://github.com/mobfarm/FastPdfKit) is exactly what you are looking for! – Icemanind Jul 01 '11 at 22:02
  • FastPdfKit is a nice library to display pdf, but it doesn't support merging AFAICT. – lkraider Jul 03 '11 at 17:30

7 Answers

23

I refactored Jonatan's code a little so that it joins any number of PDF files of any size:

+ (NSString *)joinPDF:(NSArray *)listOfPaths {
    // File paths
    NSString *fileName = @"ALL.pdf";
    NSString *pdfPathOutput = [[NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) objectAtIndex:0] stringByAppendingPathComponent:fileName];

    CFURLRef pdfURLOutput = (CFURLRef)CFBridgingRetain([NSURL fileURLWithPath:pdfPathOutput]);

    NSInteger numberOfPages = 0;
    // Create the output context
    CGContextRef writeContext = CGPDFContextCreateWithURL(pdfURLOutput, NULL, NULL);

    for (NSString *source in listOfPaths) {
        CFURLRef pdfURL = (CFURLRef)CFBridgingRetain([[NSURL alloc] initFileURLWithPath:source]);

        //file ref
        CGPDFDocumentRef pdfRef = CGPDFDocumentCreateWithURL((CFURLRef) pdfURL);
        numberOfPages = CGPDFDocumentGetNumberOfPages(pdfRef);

        // Loop variables
        CGPDFPageRef page;
        CGRect mediaBox;

        // Read the current source PDF and generate the output pages
        // (NSLog replaces DLog, a project-specific macro that is not part of the SDK)
        NSLog(@"GENERATING PAGES FROM PDF (%@)...", source);
        for (int i=1; i<=numberOfPages; i++) {
            page = CGPDFDocumentGetPage(pdfRef, i);
            mediaBox = CGPDFPageGetBoxRect(page, kCGPDFMediaBox);
            CGContextBeginPage(writeContext, &mediaBox);
            CGContextDrawPDFPage(writeContext, page);
            CGContextEndPage(writeContext);
        }

        CGPDFDocumentRelease(pdfRef);
        CFRelease(pdfURL);
    }
    CFRelease(pdfURLOutput);

    // Finalize the output file
    CGPDFContextClose(writeContext);
    CGContextRelease(writeContext);

    return pdfPathOutput;
}

Hope that helps

Jeffery Thomas
matsoftware
  • 2
    just an update for ARC: It seems that this will throw an exception unless you use the autoreleasing method fileURLWithPath: --> (__bridge CFURLRef)[NSURL fileURLWithPath:pdfPathOutput] instead of the init method. – Bek Nov 09 '12 at 13:59
  • 1
    The original pdf is kinda different in terms of size and compatibility version, consider only one pdf file to merge, the original is in version 1.4 but the merged one generated in 1.3. How can I set the version? – Cam Nov 12 '12 at 22:07
21

I came up with this solution:

// Documents dir
NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDirectory = [paths objectAtIndex:0];

// File paths
NSString *pdfPath1 = [documentsDirectory stringByAppendingPathComponent:@"1.pdf"];
NSString *pdfPath2 = [documentsDirectory stringByAppendingPathComponent:@"2.pdf"];
NSString *pdfPathOutput = [documentsDirectory stringByAppendingPathComponent:@"out.pdf"];

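// File URLs (written for manual reference counting; under ARC wrap each NSURL in CFBridgingRetain() so the CFRelease calls below stay balanced)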
CFURLRef pdfURL1 = (CFURLRef)[[NSURL alloc] initFileURLWithPath:pdfPath1];
CFURLRef pdfURL2 = (CFURLRef)[[NSURL alloc] initFileURLWithPath:pdfPath2];
CFURLRef pdfURLOutput = (CFURLRef)[[NSURL alloc] initFileURLWithPath:pdfPathOutput];

// File references
CGPDFDocumentRef pdfRef1 = CGPDFDocumentCreateWithURL((CFURLRef) pdfURL1);
CGPDFDocumentRef pdfRef2 = CGPDFDocumentCreateWithURL((CFURLRef) pdfURL2);

// Number of pages
NSInteger numberOfPages1 = CGPDFDocumentGetNumberOfPages(pdfRef1);
NSInteger numberOfPages2 = CGPDFDocumentGetNumberOfPages(pdfRef2);

// Create the output context
CGContextRef writeContext = CGPDFContextCreateWithURL(pdfURLOutput, NULL, NULL);

// Loop variables
CGPDFPageRef page;
CGRect mediaBox;

// Read the first PDF and generate the output pages
NSLog(@"GENERATING PAGES FROM PDF 1 (%ld)...", (long)numberOfPages1);
for (int i=1; i<=numberOfPages1; i++) {
    page = CGPDFDocumentGetPage(pdfRef1, i);
    mediaBox = CGPDFPageGetBoxRect(page, kCGPDFMediaBox);
    CGContextBeginPage(writeContext, &mediaBox);
    CGContextDrawPDFPage(writeContext, page);
    CGContextEndPage(writeContext);
}

// Read the second PDF and generate the output pages
NSLog(@"GENERATING PAGES FROM PDF 2 (%ld)...", (long)numberOfPages2);
for (int i=1; i<=numberOfPages2; i++) {
    page = CGPDFDocumentGetPage(pdfRef2, i);
    mediaBox = CGPDFPageGetBoxRect(page, kCGPDFMediaBox);
    CGContextBeginPage(writeContext, &mediaBox);
    CGContextDrawPDFPage(writeContext, page);
    CGContextEndPage(writeContext);      
}
NSLog(@"DONE!");

// Finalize the output file
CGPDFContextClose(writeContext);

// Release from memory
CFRelease(pdfURL1);
CFRelease(pdfURL2);
CFRelease(pdfURLOutput);
CGPDFDocumentRelease(pdfRef1);
CGPDFDocumentRelease(pdfRef2);
CGContextRelease(writeContext);

The biggest issue here is memory allocation. As you can see, this approach keeps both source PDF files open while it generates the output, and the releases only occur at the end. I tried combining a 500-page PDF (~15MB) with a 100-page one (~3MB), and it produced a new file with 600 pages (of course!) that was only ~5MB (magic?). The execution took around 30 seconds (not so bad, considering an iPad 1) and allocated 17MB (ouch!). Luckily the app didn't crash, but I think iOS would love to kill an app consuming 17MB like this one. ;P

Jonatan
  • Shouldn't be that hard to modify the code so that it releases each document (or even every x pages or so), if you're writing each page to disk at a time…heck, you could even read in and release a page at a time! It's a tradeoff between speed and memory… – FeifanZ Jul 04 '11 at 20:37
  • That would work! Although I couldn't find a way to implement that, yet. After you close a context (save the file to disk) using `CGPDFContextClose`, it's not possible to reopen it and continue editing from where you stopped, like moving the pointer to the end of file and add new content. – Jonatan Jul 04 '11 at 21:40
  • Hmmm…I'm not too familiar with the functions and methods down there, but there might be some kind of file I/O stream class that would let you advance the pointer. Or just pull out something like a megabyte of bits at a time and write it to disk (not sure how well that would work though) – FeifanZ Jul 05 '11 at 02:47
  • This is perfect for me, I am only merging 4-5 pages at a shot. – PruitIgoe Mar 25 '13 at 18:40
  • Thank you for a great answer! I've been looking all over for a good example of merging differently sized pages from several pdf files and this one takes the cake! One caveat is the way the page dimensions are determined. You use `CGPDFPageGetBoxRect(page, kCGPDFMediaBox)` but the parameter `kCGPDFMediaBox` may not always be the best choice. In a file that was used to print a magazine, the fact that the front and back covers are printed on a single sheet of paper meant that with `kCGPDFMediaBox` I would get both on the same page, while `kCGPDFCropBox` crops the visible pages correctly. – SaltyNuts Mar 24 '14 at 14:35
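The crop-box caveat in the last comment can be sketched in Swift as follows. This is only an illustration: `outputBox(for:)` is a hypothetical helper name that appears in none of the answers, and the fallback to the media box is an assumption about what you want when a page reports no usable crop box.

```swift
import CoreGraphics

/// Hypothetical helper: prefer the page's crop box, and fall back to the
/// media box if the crop box comes back null or empty.
func outputBox(for page: CGPDFPage) -> CGRect {
    let cropBox = page.getBoxRect(.cropBox)
    if cropBox.isNull || cropBox.isEmpty {
        return page.getBoxRect(.mediaBox)
    }
    return cropBox
}
```

In the merge loops above you would pass the result to `beginPage(mediaBox:)` (or `CGContextBeginPage`) instead of the raw media box. Note that drawing the page may additionally need a translation if the crop box does not start at the origin.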
6

My function in Swift 3:

// sourcePdfFiles is array of source file full paths, destPdfFile is dest file full path
func mergePdfFiles(sourcePdfFiles:[String], destPdfFile:String) {

    guard UIGraphicsBeginPDFContextToFile(destPdfFile, CGRect.zero, nil) else {
        return
    }
    guard let destContext = UIGraphicsGetCurrentContext() else {
        // Close the PDF context opened above before bailing out
        UIGraphicsEndPDFContext()
        return
    }

    for index in 0 ..< sourcePdfFiles.count {
        let pdfFile = sourcePdfFiles[index]
        let pdfUrl = NSURL(fileURLWithPath: pdfFile)
        guard let pdfRef = CGPDFDocument(pdfUrl) else {
            continue
        }

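        // A 1...0 range traps at runtime, so skip documents that report no pages
        guard pdfRef.numberOfPages > 0 else { continue }
        for i in 1 ... pdfRef.numberOfPages {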
            if let page = pdfRef.page(at: i) {
                var mediaBox = page.getBoxRect(.mediaBox)
                destContext.beginPage(mediaBox: &mediaBox)
                destContext.drawPDFPage(page)
                destContext.endPage()
            }
        }
    }

    destContext.closePDF()
    UIGraphicsEndPDFContext()
}
Sam Xu
3

I thought I'd share a Swift version, since I was looking for one, couldn't find it, and had to translate the code myself. My answer takes an array of the separate PDF paths, `pdfPagesURLArray`, and loops through it to generate the complete PDF. I'm fairly new at this, so any suggestions are welcome.

    let file = "fileName.pdf"
    guard var documentPaths = NSSearchPathForDirectoriesInDomains(.DocumentDirectory, .UserDomainMask, true).first else {
        NSLog("Doh - can't find that path")
        return
    }
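    // Append as a path component so a "/" separates the directory and the file name
    documentPaths = (documentPaths as NSString).stringByAppendingPathComponent(file)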
    print(documentPaths)

    let fullPDFOutput: CFURLRef = NSURL(fileURLWithPath: documentPaths)

    let writeContext = CGPDFContextCreateWithURL(fullPDFOutput, nil, nil)

    for pdfURL in pdfPagesURLArray {
        let pdfPath: CFURLRef = NSURL(fileURLWithPath: pdfURL)
        let pdfReference = CGPDFDocumentCreateWithURL(pdfPath)
        let numberOfPages = CGPDFDocumentGetNumberOfPages(pdfReference)
        var page: CGPDFPageRef
        var mediaBox: CGRect

        for index in 1...numberOfPages {

You could force-unwrap here with `page = CGPDFDocumentGetPage(pdfReference, index)!`, but to stick with best practice:

            guard let getCGPDFPage = CGPDFDocumentGetPage(pdfReference, index) else {
                NSLog("Error occurred in creating page")
                return
            }
            page = getCGPDFPage
            mediaBox = CGPDFPageGetBoxRect(page, .MediaBox)
            CGContextBeginPage(writeContext, &mediaBox)
            CGContextDrawPDFPage(writeContext, page)
            CGContextEndPage(writeContext)
        }
    }
    NSLog("DONE!")

    CGPDFContextClose(writeContext)

    // Pass the path as an argument, not as the format string
    NSLog("%@", documentPaths)
FromTheStix
  • Thanks a bunch, I am creating pdf from html and I needed page breaks, Since I was not able to find any, So I created multiple pdf with multiple html and then used ur code to combine them all, thanks a lot :) – vinbhai4u Feb 18 '16 at 06:13
  • 1
    @vinbhai4u I couldn't find any solutions in swift so I was hoping what I came up with would be helpful to others. Glad it helped you :) – FromTheStix Feb 20 '16 at 03:07
3

Swift 5:

import PDFKit

Merge PDFs like this to keep links, etc.:

func mergePdf(data: Data, otherPdfDocumentData: Data) -> PDFDocument {
    // get the pdfData
    let pdfDocument = PDFDocument(data: data)!
    let otherPdfDocument = PDFDocument(data: otherPdfDocumentData)!
    
    // create new PDFDocument
    let newPdfDocument = PDFDocument()

    // insert all pages of first document
    for p in 0..<pdfDocument.pageCount {
        let page = pdfDocument.page(at: p)!
        let copiedPage = page.copy() as! PDFPage // from docs
        newPdfDocument.insert(copiedPage, at: newPdfDocument.pageCount)
    }

    // insert all pages of other document
    for q in 0..<otherPdfDocument.pageCount {
        let page = otherPdfDocument.page(at: q)!
        let copiedPage = page.copy() as! PDFPage
        newPdfDocument.insert(copiedPage, at: newPdfDocument.pageCount)
    }
    return newPdfDocument
}

The `insert` function of `PDFDocument` is described in the PDFKit header, which says:

open class PDFDocument : NSObject, NSCopying {
...
// Methods allowing pages to be inserted, removed, and re-ordered. Can throw range exceptions.
// Note: when inserting a PDFPage, you have to be careful if that page came from another PDFDocument. PDFPage's have a 
// notion of a single document that owns them and when you call the methods below the PDFPage passed in is assigned a 
// new owning document.  You'll want to call -[PDFPage copy] first then and pass this copy to the blow methods. This 
// allows the orignal PDFPage to maintain its original document.
open func insert(_ page: PDFPage, at index: Int)

open func removePage(at index: Int)

open func exchangePage(at indexA: Int, withPageAt indexB: Int)
...
}
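Following the header note above, here is a minimal sketch of the same idea for any number of files, written straight to disk. The function name `mergePDFs(at:to:)` and its parameters are hypothetical and not part of the answer's code. It assumes the sources are file URLs, and it skips unreadable files and pages instead of force-unwrapping.

```swift
import PDFKit

/// Hypothetical helper: copies every page of each source PDF into a new
/// PDFDocument and writes the result to `outputURL`.
/// Returns false if PDFKit reports that the write failed.
func mergePDFs(at sourceURLs: [URL], to outputURL: URL) -> Bool {
    let merged = PDFDocument()
    for url in sourceURLs {
        // Skip sources that cannot be opened as PDF
        guard let document = PDFDocument(url: url) else { continue }
        for index in 0..<document.pageCount {
            // Copy the page first so the source document keeps its own page
            guard let page = document.page(at: index),
                  let copiedPage = page.copy() as? PDFPage else { continue }
            merged.insert(copiedPage, at: merged.pageCount)
        }
    }
    return merged.write(to: outputURL)
}
```

Whether silently skipping an unreadable source is acceptable depends on your app; returning an error instead may be the better choice.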

Declare mergedPdf as a property of your class:

    var mergedPdf: PDFDocument?

In viewDidLoad(), call the merge function and show your merged PDF:

    mergedPdf = mergePdf(data: pdfData1, otherPdfDocumentData: pdfData2)
    // show merged pdf in pdfView (assumed here to be your PDFView instance, e.g. an outlet)
    pdfView.document = mergedPdf
    

Save your pdf.

First convert the merged PDF to data like this:

guard let documentDataForSaving = mergedPdf?.dataRepresentation() else { return }

Then pass documentDataForSaving to the save function shown further down:

let urlWhereTheFileIsSaved = writeDataToTemporaryDirectory(withFilename: "My File Name", inFolder: nil, data: documentDataForSaving)

You probably want to avoid / in the file name and keep it shorter than 256 characters.
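As a rough sketch of that advice, a hypothetical helper (the name `sanitizedFileName` and the replacement character are assumptions, not part of the answer) could clean a name before it is passed on:

```swift
import Foundation

/// Hypothetical helper: replaces "/" with "-" and truncates the name to at
/// most `maxLength` characters (255 by default, i.e. fewer than 256).
func sanitizedFileName(_ name: String, maxLength: Int = 255) -> String {
    let cleaned = name.replacingOccurrences(of: "/", with: "-")
    return String(cleaned.prefix(maxLength))
}
```

This counts characters, as the note above does. File systems usually limit the name in bytes, so names with many non-ASCII characters may need a tighter limit.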

Save-Function:

// Export PDF to directory, e.g. here for sharing
    func writeDataToTemporaryDirectory(withFilename: String, inFolder: String?, data: Data) -> URL? {
        do {
            // get a directory
            var temporaryDirectory = FileManager.default.temporaryDirectory // for e.g. sharing
            // FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).last! // to expose it in the user's Documents directory (update Info.plist for user access)
            // FileManager.default.urls(for: .libraryDirectory, in: .userDomainMask).last! // to hide it from any user interaction
            // do you want to create subfolder?
            if let inFolder = inFolder {
                temporaryDirectory = temporaryDirectory.appendingPathComponent(inFolder)
                // Use .path here: absoluteString includes the "file://" scheme, so the check would never match
                if !FileManager.default.fileExists(atPath: temporaryDirectory.path) {
                    do {
                        try FileManager.default.createDirectory(at: temporaryDirectory, withIntermediateDirectories: true, attributes: nil)
                    } catch {
                        print(error.localizedDescription);
                    }
                }
            }
            // name the file
            let temporaryFileURL = temporaryDirectory.appendingPathComponent(withFilename)
            print("writeDataToTemporaryDirectory at url:\t\(temporaryFileURL)")
            try data.write(to: temporaryFileURL)
            return temporaryFileURL
        } catch {
            print(error)
        }
        return nil
    }
FrugalResolution
  • I'm trying to get this to work, and it seems something changed. The `insert` method on a `PDFDocument` expects an object of type `PDFPage`, *not* another `PDFDocument`. Have you tried this in Xcode 13.3 by chance? – Clifton Labrum May 05 '22 at 00:32
  • Thanks for mentioning that! Not really, I haven’t opened my project for a while. I will check it and give an update asap. – FrugalResolution May 06 '22 at 01:08
  • @CliftonLabrum I've added a copy call for the PDFPage, as the documentation recommends. But indeed, `insert` is a function of `PDFDocument` and needs a `PDFPage`. I copied and pasted my code and it works like a charm. Do you still have any trouble? – FrugalResolution May 22 '22 at 14:10
  • 1
    No, I'm good. I figured it out. Thanks! – Clifton Labrum May 23 '22 at 17:51
1

I am promoting my own library here... but I have a free PDF reading/writing library, and I recently showed how to use it in an iOS context. It's well suited to merging and manipulating PDFs, and it does so with a relatively small memory footprint. Consider using it; see an example here: ios with PDFHummus. Again, this is me promoting my own library, so do take this advice in the right context.

Gal Kahana
1

I based my solution on the solution created by @matsoftware.

I created a snippet for my solution: https://gist.github.com/jefferythomas/7265536

+ (void)combinePDFURLs:(NSArray *)PDFURLs writeToURL:(NSURL *)URL
{
    CGContextRef context = CGPDFContextCreateWithURL((__bridge CFURLRef)URL, NULL, NULL);

    for (NSURL *PDFURL in PDFURLs) {
        CGPDFDocumentRef document = CGPDFDocumentCreateWithURL((__bridge CFURLRef)PDFURL);
        size_t numberOfPages = CGPDFDocumentGetNumberOfPages(document);

        for (size_t pageNumber = 1; pageNumber <= numberOfPages; ++pageNumber) {
            CGPDFPageRef page = CGPDFDocumentGetPage(document, pageNumber);
            CGRect mediaBox = CGPDFPageGetBoxRect(page, kCGPDFMediaBox);

            CGContextBeginPage(context, &mediaBox);
            CGContextDrawPDFPage(context, page);
            CGContextEndPage(context);
        }

        CGPDFDocumentRelease(document);
    }

    CGPDFContextClose(context);
    CGContextRelease(context);
}
Jeffery Thomas