16

I have a PDF as a base64 string and I need to write it to a file using Python. I tried this:

import base64

base64String = "data:application/pdf;base64,JVBERi0xLjQKJeHp69MKMSAwIG9iago8PC9Qcm9kdWNlciAoU2tpYS9..."

with open('temp.pdf', 'wb') as theFile:
  theFile.write(base64.b64decode(base64String))

But it didn't create a valid PDF file. What am I missing?

Abdulrahman Bres
Rafael Miller

6 Answers

11

From my understanding, b64decode only accepts a base64 string, and it looks like your string has a header at the front that is not base64-encoded.

I would remove "data:application/pdf;base64,"

Check out the docs here: https://docs.python.org/2/library/base64.html

When I've used it in the past, I have only used the encoded string.
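
As a minimal sketch of that suggestion: the helper name `decode_pdf_payload` and the stand-in sample bytes are made up for illustration and are not from the question. It drops everything up to the first comma when the string starts with `data:`, then decodes the rest.

```python
import base64


def decode_pdf_payload(text: str) -> bytes:
    """Decode a base64 string, dropping a leading data-URI header if present."""
    if text.startswith("data:"):
        # Keep only what follows the first comma; the header is not base64.
        _, _, text = text.partition(",")
    return base64.b64decode(text)


# Stand-in content (assumption): just enough bytes to carry the PDF signature.
sample_bytes = b"%PDF-1.4\n% stand-in content, not a real PDF\n"
payload = base64.b64encode(sample_bytes).decode("ascii")
data_uri = "data:application/pdf;base64," + payload

pdf_bytes = decode_pdf_payload(data_uri)

with open("temp.pdf", "wb") as the_file:
    the_file.write(pdf_bytes)
```

The same helper also accepts a bare base64 string with no header, so it should be safe to call whether or not the prefix is present.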

Mark
9

Does writing it with the codecs.decode function work? Also, as Mark stated, you can try removing the data:application/pdf;base64, portion of the string, since that part is not base64-encoded and should not be decoded:

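import codecs

# Base64 payload only, without the "data:application/pdf;base64," header
base64String = "JVBERi0xLjQKJeHp69MKMSAwIG9iago8PC9Qcm9kdWNlciAoU2tpYS9..."

with open("test.pdf", "wb") as f:
    # In Python 3 the "base64" codec expects bytes, so encode the str first
    f.write(codecs.decode(base64String.encode("ascii"), "base64"))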
Jebby
5

Extending @Jebby's answer using Base64 (had the same issue as @SmartManoj)

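import base64

# Base64 payload only, without the data-URI header
base64String = "JVBERi0xLjQKJeHp69MKMSAwIG9iago8PC9Qcm9kdWNlciAoU2tpYS9..."

with open("test.pdf", "wb") as f:
    # The name must match the variable defined above (Python is case-sensitive)
    f.write(base64.b64decode(base64String))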
Dfranc3373
3

This is not just base64-encoded data; it is a data URI:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs

There is another post on Stack Overflow asking how to parse such strings in Python:

How to parse data-uri in python?

The gist of it is to remove the header (everything up to and including the first comma):

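theFile.write(base64.b64decode(base64String.split(",", 1)[-1]))

NOTE: I use split(",", 1)[-1] instead of [1] because it won't throw an IndexError if the string has no header (no comma); the whole string is then decoded as base64, and an empty payload after the comma decodes to empty bytes. A slice such as [1:2] would not work here, because it returns a list and b64decode does not accept a list.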
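
The header-stripping idea can be taken a step further by reading the header instead of discarding it. The sketch below is one possible reading of the `data:[<mediatype>][;base64],<data>` layout; `parse_data_uri` is a hypothetical helper, not a standard-library function, and it does not validate the media type.

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes


def parse_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a data URI into (media type, decoded bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header, the rest is the data.
    header, sep, data = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no comma between header and data")
    is_base64 = header.endswith(";base64")
    media_type = header[: -len(";base64")] if is_base64 else header
    # Without the ";base64" flag, the data part is percent-encoded text.
    raw = base64.b64decode(data) if is_base64 else unquote_to_bytes(data)
    # An empty media type falls back to the data-URI default.
    return media_type or "text/plain;charset=US-ASCII", raw


media_type, pdf_bytes = parse_data_uri(
    "data:application/pdf;base64,"
    + base64.b64encode(b"%PDF-1.4\n").decode("ascii")
)
```

Returning the media type alongside the bytes lets the caller check that it really is `application/pdf` before writing the file.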

Forest Darling
0

Here is my solution:

from base64 import b64decode

def base64_to_pdf(file):
    file_bytes = b64decode(file, validate=True)
    if file_bytes[0:4] != b"%PDF":
        raise ValueError("Missing the PDF file signature")

    with open("file.pdf", "wb") as f:
        return f.write(file_bytes)
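
One thing to keep in mind with validate=True is that it rejects any character outside the base64 alphabet, which includes a data-URI header and also line breaks in a wrapped string. The sketch below illustrates both cases; `strict_decode` and the stand-in sample bytes are assumptions made for the example, not part of the answer above.

```python
import base64
import binascii

# Stand-in content (assumption): only carries the PDF signature.
sample_bytes = b"%PDF-1.4\n% stand-in content\n"
payload = base64.b64encode(sample_bytes).decode("ascii")


def strict_decode(text: str) -> bytes:
    """Remove whitespace (e.g. line wrapping), then decode strictly."""
    cleaned = "".join(text.split())
    return base64.b64decode(cleaned, validate=True)


# A data-URI header contains ':' , ';' and ',' so strict decoding refuses it.
try:
    base64.b64decode("data:application/pdf;base64," + payload, validate=True)
    header_rejected = False
except binascii.Error:
    header_rejected = True

# A payload wrapped across lines decodes once the whitespace is removed.
wrapped = payload[:20] + "\n" + payload[20:]
decoded = strict_decode(wrapped)
```

If strict decoding reports a non-base64 character, checking for a leftover header or embedded line breaks is a reasonable first step.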
0
For some reason the code above didn't work for me, but the code below did.

import base64

base64String = "JVBERi0xLjQKJeHp69MKMSAwIG9iago8PC9Qcm9kdWNlciAoU2tpYS9..."
# Decode first; the name must match the variable defined above
data = base64.b64decode(base64String)

with open("test.pdf", "wb") as f:
    f.write(data)
  • Yeah, if I use f.write(base64.b64decode(base64string)) I don't get any error, but I get an empty PDF file. Adding an extra step to decode into another variable and then writing that to the file works fine. If I use b64decode(file, validate=True), I get an error about a non-base64 character in the string, probably because the PDF has an embedded image. – Ritesh Bawaskar Jun 05 '23 at 03:39