
I am working on a UNIX system and I'd like to merge thousands of PDF files into one file in order to print it. I don't know in advance how many pages each file has.

I'd like to print it double-sided, so that no two files share the same sheet of paper.

Therefore, I'd like the merged file to be aligned such that every file begins on an odd page, with a blank page added whenever the next page to write would be an even page.

RanZilber

9 Answers


Here's the solution I use (it's based on @Dingo's basic principle, but uses an easier approach for the PDF manipulation):

  1. Create PDF file with a single blank page

    First, create a PDF file with a single blank page somewhere (in my case, it is located at /path/to/blank.pdf). This command should work (from this thread):

    touch blank.ps && ps2pdf blank.ps blank.pdf
    
  2. Run Bash script

    Then, from the directory that contains all my PDF files, I run a little script that appends the blank.pdf file to each PDF file with an odd number of pages:

    #!/bin/bash
    
    for f in *.pdf; do
      let npages=$(pdfinfo "$f"|grep 'Pages:'|awk '{print $2}')
      let modulo="($npages %2)"
      if [ $modulo -eq 1 ]; then
        pdftk "$f" "/path/to/blank.pdf" output "aligned_$f"
        # or
        # pdfunite "$f" "/path/to/blank.pdf" "aligned_$f"
      else
        cp "$f" "aligned_$f"
      fi
    done
    
  3. Combine the results

    Now, all aligned_-prefixed files have an even number of pages, and I can join them using

    pdftk aligned_*.pdf output result.pdf
    # or
    pdfunite aligned_*.pdf result.pdf
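For anyone who prefers to do the page-count check from step 2 in Python, here is a minimal sketch of what the `pdfinfo | grep 'Pages:' | awk` pipeline does: read the `Pages:` line and decide whether one blank page is needed. The function names are hypothetical, and it assumes `pdfinfo` prints a line of the form `Pages: N` (as the Poppler version does) and Python 3.7+ for `capture_output`.

```python
import re
import subprocess


def pdfinfo_text(path: str) -> str:
    """Run `pdfinfo` on `path` and return its text output (assumes pdfinfo is on PATH)."""
    result = subprocess.run(["pdfinfo", path], capture_output=True, text=True, check=True)
    return result.stdout


def page_count(pdfinfo_output: str) -> int:
    """Return N from the 'Pages: N' line; 'Page size:' and similar lines are ignored."""
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line in pdfinfo output")
    return int(match.group(1))


def needs_blank(pdfinfo_output: str) -> bool:
    """True when the page count is odd, i.e. one blank page must be appended."""
    return page_count(pdfinfo_output) % 2 == 1
```

Usage would be something like `needs_blank(pdfinfo_text("some_file.pdf"))`, where the file name is only an example.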
    

Tool info:

  • ps2pdf is in the ghostscript package in most Linux distros
  • pdfinfo, pdfunite are from the Poppler PDF rendering library (usually the package name is poppler-utils or poppler_utils)
  • pdftk is usually its own package, the pdftk package
toraritte
Chris Lercher
  • When I use [Nix](https://nixos.org/download.html), `nix-shell -p ghostscript poppler_utils` drops me in a sub-shell with all the needed commands. – toraritte Feb 08 '23 at 11:28

Your problem can be solved more easily if you look at it from another point of view.

When printing, you want page 1 of the second PDF file not to end up on the same sheet of paper as the last page of the first PDF file; more generally, the first page of each PDF file should never be printed on the back of the sheet that holds the last page of the preceding file.

To achieve that, you only need to add one blank page to the PDF files that have an odd number of pages.
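To see why padding only the odd-length files is enough, here is a small worked sketch: once every page count is rounded up to an even number, each document starts on an odd page, which in duplex printing is always the front of a fresh sheet. The page counts are made-up example values and the function names are hypothetical.

```python
def padded(counts):
    """Round every page count up to an even number (odd counts get one blank page)."""
    return [n + n % 2 for n in counts]


def start_pages(counts):
    """1-based page of the merged file on which each document starts."""
    starts, next_page = [], 1
    for size in padded(counts):
        starts.append(next_page)
        next_page += size  # an even step keeps next_page odd
    return starts


# Example: page counts 3, 4, 1, 5 become 4, 4, 2, 6 after padding
print(start_pages([3, 4, 1, 5]))  # [1, 5, 9, 11]
```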

I wrote a simple script named addblankifneeded that you can put in a file, make executable (chmod +x) and copy to /usr/bin or /usr/local/bin,

and then invoke in the folder containing your PDFs with this syntax:

for f in *.pdf; do addblankifneeded "$f"; done

This script adds a blank page at the end of PDF files with an odd number of pages, skips files that already have an even number of pages, and then joins all the PDFs into one.

requirements: pdftk, pdfinfo

NOTE: depending on your environment, you may need to replace the sh interpreter with bash in the first line of the script.

#!/bin/sh
#script to automatically add a blank page at the end of a PDF document if its page count is not a multiple of 2, and then to join all PDFs into one
#
#  made by Dingo
#
# dokupuppylinux.co.cc
#
#http://pastebin.com/u/dingodog (my pastebin toolbox for pdf scripts)
#
filename="$1"
altxlarg="$(pdfinfo -box "$filename" | grep MediaBox | cut -d : -f2 | awk '{print $3 FS $4}')"
echo "%PDF-1.4
%µí®û
3 0 obj
<<
/Length 0
>>
stream
endstream
endobj
4 0 obj
<<
/ProcSet [/PDF ]
/ExtGState <<
/GS1 1 0 R
>>
>>
endobj
5 0 obj
<<
/Type /Halftone
/HalftoneType 1
/HalftoneName (Default)
/Frequency 60
/Angle 45
/SpotFunction /Round
>>
endobj
1 0 obj
<<
/Type /ExtGState
/SA false
/OP false
/HT /Default
>>
endobj
2 0 obj
<<
/Type /Page
/Parent 7 0 R
/Resources 4 0 R
/Contents 3 0 R
>>
endobj
7 0 obj
<<
/Type /Pages
/Kids [2 0 R ]
/Count 1
/MediaBox [0 0 595 841]
>>
endobj
6 0 obj
<<
/Type /Catalog
/Pages 7 0 R
>>
endobj
8 0 obj
<<
/CreationDate (D:20110915222508)
/Producer (libgnomeprint Ver: 2.12.1)
>>
endobj
xref
0 9
0000000000 65535 f
0000000278 00000 n
0000000357 00000 n
0000000017 00000 n
0000000072 00000 n
0000000146 00000 n
0000000535 00000 n
0000000445 00000 n
0000000590 00000 n
trailer
<<
/Size 9
/Root 6 0 R
/Info 8 0 R
>>
startxref
688
%%EOF" | sed -e "s/595 841/$altxlarg/g">blank.pdf
# sed may have changed the length of the MediaBox line, so let pdftk rewrite the file with a valid xref table
pdftk blank.pdf output fixed.pdf
mv fixed.pdf blank.pdf
pages="$(pdftk "$filename" dump_data | grep NumberOfPages | cut -d : -f2)"
if [ $(( $pages % 2 )) -eq 0 ]
    then echo "$filename already has an even number of pages ($pages). Skipping this file." >>report.txt
    else
        pdftk A="$filename" B=blank.pdf cat A B output blankadded.pdf
        mv blankadded.pdf "$filename"
fi
# join all PDFs (except the helper and the output file), even when every input already had an even page count
pdffiles=`ls *.pdf | grep -v -e blank.pdf -e joinedtogether.pdf| xargs -n 1`;  pdftk $pdffiles cat output joinedtogether.pdf
exit 0
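The hard-coded xref offsets in the embedded PDF stop matching the file once `sed` changes the length of the MediaBox string, which is presumably why the script runs it through `pdftk ... output fixed.pdf`. As an alternative sketch (not Dingo's method), the Python function below builds a minimal one-page blank PDF and computes the byte offsets itself. `blank_pdf` is a hypothetical name; width and height are in PDF points (595 x 842 is roughly A4), and you could pass the MediaBox values that `pdfinfo -box` reports for the source file.

```python
def blank_pdf(width=595, height=842):
    """Build a minimal one-page blank PDF whose xref offsets are computed, not hard-coded."""
    page = "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %s %s] /Resources << >> >>" % (width, height)
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        page.encode("ascii"),  # no /Contents entry: the page is empty
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

For example, `open("blank.pdf", "wb").write(blank_pdf(612, 792))` would write a US Letter blank page.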
Dingo
  • very nice! Nice to see an example of how to work with PDF files. I'm surprised it is so easy. Good luck to all. – shellter Mar 23 '12 at 21:50

You can use PDFsam:

  • gratis
  • runs on Microsoft Windows, Mac OS X and Linux
  • portable version available (at least on Windows)
  • can add a blank page after each merged document if the document has an odd number of pages


Franck Dernoncourt

Disclaimer: I'm the author of the tools I'm mentioning here.

sejda-console

It's a free and open-source command line interface for performing PDF manipulations such as merge or split. The merge command has an option stating:

[--addBlanks] : add a blank page after each merged document if the number of pages is odd (optional)

Since you just need to print the PDF, I'm assuming you don't care about the order in which your documents are merged. This is the command you can use:

sejda-console merge -d /path/to/pdfs_to_merge -o /outputpath/merged_file.pdf --addBlanks

It can be downloaded from the official website sejda.org.

sejda.com

This is a web application backed by Sejda, with the same functionality described above but through a web interface. You have to upload your files, so depending on the size of your input set, it might not be the right solution for you.

If you select the merge command and upload your PDF documents, you have to tick the checkbox Add blank page if odd page number to get the desired behaviour.

Andrea Vacondio

Here is a PowerShell version of the most popular solution, using pdftk. I did this for Windows, but you can use PowerShell Core on other platforms.

# install pdftk server if on Windows
# https://www.pdflabs.com/tools/pdftk-server/
# pdfinfo (from the Poppler or Xpdf command line tools) must also be on your PATH

$blank_pdf_path = ".\blank.pdf"
$input_folder = ".\input\"
$aligned_folder = ".\aligned\"
$final_output_path = ".\result.pdf"

# make sure the output folder exists
New-Item -ItemType Directory -Force -Path $aligned_folder | Out-Null

foreach($file in (Get-ChildItem $input_folder -Filter *.pdf))
{
    # look for the "Pages:" line instead of relying on its position in the pdfinfo output
    $pages_line = pdfinfo $file.FullName | Select-String -Pattern '^Pages:'
    $npages = [int]($pages_line.Line -replace '\D', '')
    $modulo = $npages % 2

    if($modulo -eq 1)
    {
        $output_path = Join-Path $aligned_folder $file.Name
        pdftk $file.FullName $blank_pdf_path output $output_path
    }
    else
    {
        Copy-Item $file.FullName -Destination $aligned_folder
    }
}

# pass the aligned files to pdftk explicitly, sorted by name
$aligned_pdfs = Get-ChildItem $aligned_folder -Filter *.pdf | Sort-Object Name | ForEach-Object { $_.FullName }
pdftk $aligned_pdfs output $final_output_path
Henry Nitz

Preparation

  1. Install Python and make sure you have the pyPDF package.
  2. Create a PDF file with a single blank page in /path/to/blank.pdf (I've created blank pdf pages here).
  3. Save this as pdfmerge.py in any directory on your $PATH. (I'm not a Windows user; this is straightforward under Linux. Please let me know if you get errors or if it works.)
  4. Make pdfmerge.py executable

Every time you need it

Run pdfmerge.py in a directory that contains only the PDF files you want to merge.

pdfmerge.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

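import os.path
from argparse import ArgumentParser
from glob import glob
from pyPdf import PdfFileReader, PdfFileWriter

def merge(path, blank_filename, output_filename):
    # open() instead of the Python 2-only file() builtin
    blank = PdfFileReader(open(blank_filename, "rb"))
    output = PdfFileWriter()
    # never merge the output file or the blank page itself
    skip = set([os.path.abspath(output_filename), os.path.abspath(blank_filename)])

    # honour --path and merge the files in a stable, sorted order
    for pdffile in sorted(glob(os.path.join(path, '*.pdf'))):
        if os.path.abspath(pdffile) in skip: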
            continue
        print("Parse '%s'" % pdffile)
        document = PdfFileReader(open(pdffile, 'rb'))
        for i in range(document.getNumPages()):
            output.addPage(document.getPage(i))

        if document.getNumPages() % 2 == 1:
            output.addPage(blank.getPage(0))
            print("Add blank page to '%s' (had %i pages)" % (pdffile, document.getNumPages()))
    print("Start writing '%s'" % output_filename)
    output_stream = open(output_filename, "wb")
    output.write(output_stream)
    output_stream.close()

if __name__ == "__main__":
    parser = ArgumentParser()

    # Add more options if you like
    parser.add_argument("-o", "--output", dest="output_filename", default="merged.pdf",
                      help="write merged PDF to FILE", metavar="FILE")
    parser.add_argument("-b", "--blank", dest="blank_filename", default="blank.pdf",
                      help="path to blank PDF file", metavar="FILE")
    parser.add_argument("-p", "--path", dest="path", default=".",
                      help="path of source PDF files")

    args = parser.parse_args()
    merge(args.path, args.blank_filename, args.output_filename)

Testing

Please make a comment if this works on Windows and Mac.

Please always leave a comment if it doesn't work / it could be improved.

It works on Linux. Joining 3 PDFs into a single 200-page PDF took less than a second.

Martin Thoma

Martin had a good start. I updated it to PyPDF2 and made a few tweaks, such as sorting the input files by filename.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from argparse import ArgumentParser
from glob import glob
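# Uses the PyPDF2 1.x/2.x API (PdfFileReader, getNumPages, addPage). PyPDF2 3.x and its
# successor pypdf no longer support these names; there, use PdfReader/PdfWriter,
# len(reader.pages) and writer.add_page() instead.
from PyPDF2 import PdfFileReader, PdfFileWriter
import os.path

def merge(pdfpath, blank_filename, output_filename):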

    with open(blank_filename, "rb") as f:
        blank = PdfFileReader(f)
        output = PdfFileWriter()

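        filelist = sorted(glob(os.path.join(pdfpath, '*.pdf')))
        # glob() returns paths prefixed with pdfpath, so compare absolute paths;
        # also skip the blank page itself if it lives in the same folder
        skip = {os.path.abspath(output_filename), os.path.abspath(blank_filename)}

        for pdffile in filelist:
            if os.path.abspath(pdffile) in skip:
                continue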
            print("Parse '%s'" % pdffile)

            document = PdfFileReader(open(pdffile, 'rb'))

            for i in range(document.getNumPages()):
                output.addPage(document.getPage(i))

            if document.getNumPages() % 2 == 1:
                output.addPage(blank.getPage(0))
                print("Add blank page to '%s' (had %i pages)" % (pdffile, document.getNumPages()))

        print("Start writing '%s'" % output_filename)
        with open(output_filename, "wb") as output_stream:
            output.write(output_stream)


if __name__ == "__main__":
    parser = ArgumentParser()

    # Add more options if you like
    parser.add_argument("-o", "--output", dest="output_filename", default="merged.pdf",
                      help="write merged PDF to FILE", metavar="FILE")
    parser.add_argument("-b", "--blank", dest="blank_filename", default="blank.pdf",
                      help="path to blank PDF file", metavar="FILE")
    parser.add_argument("-p", "--path", dest="path", default=".",
                      help="path of source PDF files")

    args = parser.parse_args()
    merge(args.path, args.blank_filename, args.output_filename)
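Note that `sorted()` on the glob result orders names character by character, so `doc10.pdf` comes before `doc2.pdf`. If your files are numbered without zero padding, a natural-sort key such as the hypothetical `natural_key` below could be passed as `sorted(..., key=natural_key)`; this is a sketch, not part of PyPDF2.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers, so 'doc2.pdf' < 'doc10.pdf'."""
    # re.split with a capturing group alternates text (even index) and digits (odd index)
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


print(sorted(["doc10.pdf", "doc2.pdf", "doc1.pdf"], key=natural_key))
# ['doc1.pdf', 'doc2.pdf', 'doc10.pdf']
```

In the script, that would mean `filelist = sorted(glob(os.path.join(pdfpath, '*.pdf')), key=natural_key)`.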
amarchiori

The code by @Chris Lercher in https://stackoverflow.com/a/12761103/1369181 did not quite work for me. I do not know whether that is because I am working on Cygwin/mintty. Also, I have to use qpdf instead of pdftk. Here is the code that has worked for me:

#!/bin/bash

for f in *.pdf; do
  npages=$(pdfinfo "$f"|grep 'Pages:'|sed 's/[^0-9]*//g')
  modulo=$(($npages %2))
  if [ $modulo -eq 1 ]; then
    qpdf --empty --pages "$f" "path/to/blank.pdf" -- "aligned_$f"
  else
    cp "$f" "aligned_$f"
  fi
done

Now, all "aligned_" files have an even number of pages, and I can join them using qpdf (thanks to https://stackoverflow.com/a/51080927):

qpdf --verbose --empty --pages aligned_* -- all.pdf

And here is the useful code from https://unix.stackexchange.com/a/272878 that I used to create the blank page:

echo "" | ps2pdf -sPAPERSIZE=a4 - blank.pdf
mach

This one worked for me. I used pdfcpu on macOS, which can be installed this way:

brew install pdfcpu

I slightly adjusted the code from https://stackoverflow.com/a/12761103/1369181:

#!/bin/bash
mkdir aligned
for f in *.pdf; do
  let npages=$(pdfcpu info "$f"|grep 'Page count:'|awk '{print $3}')
  let modulo="($npages %2)"
  if [ $modulo -eq 1 ]; then
    pdfcpu page insert -pages l -mode after "$f" "aligned/$f"
  else
    cp "$f" "aligned/$f"
  fi
done
pdfcpu merge merged-aligned.pdf aligned/*.pdf
rm -rf aligned

Note: the script creates an "aligned" directory in the current directory and deletes it with rm -rf at the end, so an existing directory with that name would be lost. Feel free to improve it to make it safer.

newbot