I'm working with Python and Django and want to merge several PDFs into a single one.

I've seen several answers to this question, like this and this, which I've tried to apply but I'm getting errors.

First, this is the PDF data I'm working with. In my example, I have these two PDFs:

type(pdf1) = <class 'bytes'>
type(pdf2) = <class 'bytes'>

And if I print() them, they look like this:

b'%PDF-1.5\n%\xb5\xed\xae\xfb\n4 0 obj\n<< /Length 5 0 R\n   /Filter /FlateDecode\n>>\nstream\nx\x9c\x85Vmo\xdb6\x10\xfe\xae_q(VT\x06f\x99\x94\xac7 \x0b\x90\xa6.\x96\x01u\xdbD\xdf\x82\xc1\x90%\xca\xe6&\x93.E95\x86\xfd\xf7\x92\xa2$S\xa9\x87\xc1\xb0$\x92\x0f\x1f\xde=w<\x12\x03R\xbf9V\x8fd\x89\xa188\xdf\x1c\xd4\xf5\x89\x1d,r\x04\xbb\xc6y\x9f9\xd8\xefq>\x84\x91\xe7\x87\x10\xc5^\x90.\xc3\x00Cvp\x16\xd5\x1c\xcd\x11\xa8\xef\xcaq\xffy\x0b\r\x91p\xccwd#\xa9\xac\t\xfc\x06o\xd6\\\xd2\x82\x00\xaf\xe0\x91P\xd6\xc8\\\x92\x03a\xf2\r\xbc\xfd\x17\xcc\x84Y\xf6\x97\xa3\x17\xf0p\x14\xa2 \x85\xact\xdc:o\xe4F\x90\x82\xd0\x13)7\xf9\x81\xb7Lj\xb6_B\xe4!dM\x86R\x11\x8eH\r\xc1x\x8e\xc2\xb9\x8fp\xa2a3\xe7\n{?\xb5\xd9\xf3\x97M^\x14\x9a\\\xbd%=QyV\x14R\xb4\xc4Z\xa1\x835t\xc7r\xd9\n\xb2\xd9\x93\xbc\xecAW\xc9\xcdD\xca\x8a\xba-\t\xbc\xfb@\xaa\xbc\xad%\xfc\xae\xa6\x11\xf1N\xf3j\x87S/L\xd3$\\\x1aq\xc7\x96\x8f#\xc5\xb5D\x91\x0f8\x89\xbd\x08\xc78Iz\xa1\xf1 \xf4\xb3\xbb\xfe\x9c=\xdc\xaf\xe0\xf3Gx\\=\xac\x9f\xb2Y\x18\xbaw\xfa\x91\xad>\xad\xd6\x19\xcc\xfe\xcc\xfe\xf89r>\x8a=\x1d8\xf5\xf9*t\xcf\xee5\x93\xbf\xcc\x96\xd8\xe55-\xce\xf0\xc0*nl\xd7\xcc\xb1\x97 \xe4\xa7\x81\xe1\x1f[\xfd*a\xe8a\x1c\xf8I\x97 \xcfn\xb6\xa7\xcdl\x1e\xc5.t\xcf\x17Z\xd7c3q\x1b"N\xc4\x1a\xcem,\xebrG\x7f:=\\\xeesi\x01\xe4~2w\xcb-\xae\xc4\xadi#Ii\x01\x8e\x9d33\xe7\xd2C\x9b\xa6\xb5 \x89\xbb=[x\xde\n\xabU\xf0\xc31gg\x0b\xbcW\xd6Zd[B\x98\x85\x17C\xbe\x97\xaf\xfdK\x06\xfe\xaa\x1f\xd1\xa2N#\xdc\xa10^\xf8h\xa1Sy@X\x11\xeb\x1d`\xdd\x90I\xc2x\x99\x04\x91N\xc2gW\xa7\xb5(s\xa6\xe5\x0b;\xacR^\xee\xfbFh\x94\xeb\x07$\x11\x87f\x1c\xc9Y9|;\xcagVRI9\xd3f\x9bqm\xb4M\xd2w\xf7\xda\xe2 p\xbd\x919\x1bVq\xd4S\xd8\xfb\x7f\x84\xd0\xcb\xca\xc4x\xf9\xa4T\x86\x9b\x1bX\xdc\x15\xb2\xcd\xeb\x8c|\x97pS\x91\xaaB(\x8a\xf4\xff\x16no\xe1\xfd\x87{#\x086\x82\xdc 
\x84\xf0\xad\xdaY\xabO\xf7\xaf\x94"zk_\xdc\xe5\xcc|^\x13=\xfc_\xcd\'\x02\xb9/\xf9\xc5|+\xdeWCR\xaa\x821\xc7A\x17|\x1c\xa4\xae\xe4C\xf3\x9a%z\xe4n\xd7)\xa5\xc1`z\x1e\xc9\xb7\x96\x8a~\x85\xa9m\xde\x84\xdc\xec\xba\xb1#\x19\xf7\x92E\xa6#i5\xa7\xf1ql\xb6\xfcx\xac)\x99\x12\x1a\xf3\xc7\xd9].XS\x04\xa9\x88 *\xff\xca)U\x9f\'\x13\xd3\xda\xc3\x96\x88\xa9i\xac>O\xe8:\xd1U\xdb\x19\x10%7\xf6\xa4\x96\x83\x96\xee\xd12\x

When I try merging them like this:

from PyPDF2 import PdfFileMerger

pdfs = [pdf1, pdf2]

merger = PdfFileMerger()

for pdf in pdfs:
   merger.append(pdf)

merger.write("result.pdf")
merger.close()

I get this error: embedded null byte

And when I try merging them like this:

from functools import partial

with open(fpath, 'rb') as f, open(target_fpath, 'wb') as target_f: 
    for _bytes in iter(partial(f.read, 1024), ''):
        target_f.write(_bytes)

I get this other error: 'bytes' object has no attribute 'seek'

I think these errors are caused by the format of the data I'm handling. Should I decode the binary data before being able to apply the solutions I linked?

Xar
  • Possible duplicate of [Mysterious "embedded null byte" error](https://stackoverflow.com/questions/38731132/mysterious-embedded-null-byte-error) – GSazheniuk Nov 04 '19 at 17:13
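
Both error messages quoted in the question can be reproduced with the standard library alone, which suggests the cause is passing raw `bytes` where either a file path or a file-like object is expected, rather than anything that decoding would fix. The following is a minimal sketch of that idea; `fake_pdf` is a made-up placeholder and not a valid PDF, and the sketch does not claim to show which PyPDF2 code path raised each error in the original snippets.

```python
import io

# Hypothetical stand-in for pdf1: arbitrary binary data containing a null
# byte, as compressed PDF streams usually do. It is NOT a valid PDF.
fake_pdf = b"%PDF-1.5\n\x00binary stream data"

# 1) bytes used where a path is expected: open() treats the bytes as a
#    file name and rejects the null byte inside it.
try:
    open(fake_pdf, "rb")
except ValueError as exc:
    path_error = str(exc)

# 2) bytes used where a file object is expected: bytes has no seek().
try:
    fake_pdf.seek(0)
except AttributeError as exc:
    seek_error = str(exc)

# io.BytesIO wraps the same data in a seekable, readable file-like object
# without decoding or altering it.
stream = io.BytesIO(fake_pdf)
stream.seek(0)
same_data = stream.read() == fake_pdf

print(path_error)
print(seek_error)
print(same_data)
```

Decoding the data to `str` would not help here, since PDF content is binary; wrapping it in `io.BytesIO` keeps the bytes intact while providing `read()` and `seek()`.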

1 Answer

It might be a problem with how you loaded the PDF. The way I did it was:

from PyPDF2 import PdfFileReader

pdf1 = PdfFileReader("path\\to\\file.pdf")
type(pdf1)

and I got

PyPDF2.pdf.PdfFileReader
Jack Hanson
  • Due to how the rest of the codebase is working, I only have access to the binary data, not to the path where they are saved. So I would like to merge them, operating with the binary data if possible. – Xar Nov 04 '19 at 17:34
  • I see. I found this thread (https://stackoverflow.com/questions/43019315/how-do-i-create-a-pdf-file-from-a-binary-code-using-python), so maybe you could write the binary data to a PDF file, then open that file in binary mode with ```open("file.pdf", "rb")``` and merge? – Jack Hanson Nov 04 '19 at 17:44
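
Since only the binary data is available, one option that avoids temporary files is to wrap each `bytes` object in `io.BytesIO` before handing it to the merger, and to write the result into another `io.BytesIO`. The sketch below assumes the pre-3.0 PyPDF2 API used in the question (`PdfFileMerger`); newer releases renamed that class, so the name may need adjusting. The helper name `merge_pdf_bytes` and its `merger_factory` parameter are hypothetical, introduced only for illustration and to make the helper easy to exercise without real PDFs.

```python
import io


def merge_pdf_bytes(pdf_blobs, merger_factory=None):
    """Merge PDFs given as bytes objects and return the merged PDF as bytes.

    merger_factory is a hypothetical hook: any callable returning an object
    with append(), write() and close(). It defaults to PyPDF2's PdfFileMerger.
    """
    if merger_factory is None:
        # Assumption: a pre-3.0 PyPDF2 is installed, as in the question.
        from PyPDF2 import PdfFileMerger
        merger_factory = PdfFileMerger

    merger = merger_factory()
    try:
        for blob in pdf_blobs:
            # BytesIO gives the merger the seek()/read() interface it expects.
            merger.append(io.BytesIO(blob))
        output = io.BytesIO()
        merger.write(output)  # write to memory instead of a path on disk
    finally:
        merger.close()
    return output.getvalue()
```

With the two objects from the question this would be called as `merged = merge_pdf_bytes([pdf1, pdf2])`. In a Django view, the resulting bytes could then be returned with `HttpResponse(merged, content_type="application/pdf")`.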