Overview

Handle decoding of base64 encoded image files in a concise and (relatively) comprehensive way.

Goals

Primary Goal

  • Use Python 3.6 to take an arbitrary base64 string, whether as an 'in-memory' string or an external file, and generate a valid image file from it.

Secondary Goal

  • Use libmagic (or some other python library) to determine, as best as is possible, the mime type of an encoded file for directing the correct filetype output.
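One dependency-free way to approach this goal is to inspect the first few decoded bytes for well-known image signatures. This is not libmagic, only a rough stand-in; the `SIGNATURES` table and the `sniff_image_type` helper are hypothetical names introduced for illustration, and the table covers just three common formats.

```python
import base64

# Hypothetical signature table: leading "magic" bytes -> MIME type, extension.
SIGNATURES = [
    (b'\xff\xd8\xff', 'image/jpeg', '.jpg'),
    (b'\x89PNG\r\n\x1a\n', 'image/png', '.png'),
    (b'GIF87a', 'image/gif', '.gif'),
    (b'GIF89a', 'image/gif', '.gif'),
]


def sniff_image_type(data):
    """Return (mime, extension) for a known signature, else (None, None)."""
    for magic_bytes, mime, ext in SIGNATURES:
        if data.startswith(magic_bytes):
            return mime, ext
    return None, None


# The first eight characters of the encoded test image decode to six bytes.
header = base64.b64decode('/9j/4QAY')
print(sniff_image_type(header))
```

The returned extension could then be used to pick the output filename instead of hard-coding `.jpg`.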

Preliminary Assumptions

Encoding is Valid

  • A small JPG image was encoded in base64. I have verified that the encoding is good by viewing the encoded data in a web browser, using the browser's native support for base64-encoded images. See below.

This is Doable in Python 3.6

  • Seemingly the base64 package is made for this, unless I'm way off base.
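As a sanity check on that assumption, here is a minimal round trip with the standard `base64` module: encode some bytes, decode them again, and write the result to a file opened in binary mode. The sample bytes and the `decoded.bin` filename are placeholders, not the actual image.

```python
import base64
import os
import tempfile

raw = bytes(range(256))                          # stand-in for image data
encoded = base64.b64encode(raw).decode('ascii')  # text form, as in a data URI
decoded = base64.b64decode(encoded)              # back to bytes

# Write the decoded bytes to a scratch file; 'wb' matters, since the data is binary.
out_path = os.path.join(tempfile.mkdtemp(), 'decoded.bin')
with open(out_path, 'wb') as f:
    f.write(decoded)

with open(out_path, 'rb') as f:
    round_tripped = f.read()

print(round_tripped == raw)
```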

Attempted Solution

Explanation

I created a python class named Converter to handle the task, with the following features

Constructor

  • __init__(self, file="none", str_object="none")
    • Takes 2 string arguments. Only one is to be specified to replace the default depending on desired usage.
    • file="none" - Path to file as a string for file-based decoding.
    • str_object="none" - Naked encoded string.

Instance Methods

  • convert_ascii_to_byte_stream(self)
    • No arguments
    • Returns the decoded data as a bytes object
  • convert_byte_stream_to_jpg(self)
    • No arguments
    • Opens a new file, attempts to write a byte stream (using convert_ascii_to_byte_stream), then closes the file.
    • Returns nothing
  • @staticmethod strip_all_whitespace(s)
    • Takes a single string of arbitrary length.
    • Returns a string with all whitespace, newlines, tabs, and carriage returns removed.
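A compact way to get the same effect as `strip_all_whitespace` is `''.join(s.split())`, since `str.split()` with no arguments splits on any run of whitespace. The sketch below uses a made-up wrapped string (not the real image data) and also shows that `base64.b64decode`, with its default `validate=False`, discards characters outside the base64 alphabet, so embedded newlines and spaces are tolerated either way.

```python
import base64


def strip_all_whitespace(s):
    # split() with no arguments breaks on spaces, tabs, newlines, and carriage returns.
    return ''.join(s.split())


# 'hello world' encoded, then wrapped the way the HTML source was:
# a newline at the end of each line and a space at the start of the next.
wrapped = 'aGVs\n bG8g\r\n\td29y bGQ='
clean = strip_all_whitespace(wrapped)

print(clean)
print(base64.b64decode(clean))
print(base64.b64decode(wrapped))  # whitespace is ignored by default
```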

Usage

__main__.py

from Converter import *

c = Converter('ascii_image.txt', 'none')  # create Converter instance
# print(c.get_encoding_type()) # attempt to get encoding
c.convert_byte_stream_to_jpg()  # output the decoded data to image file

Observations and Questions

  • If I invoke the constructor Converter('none', 'XyZxYzXyZxYzXyZxYz ...) then the conversion happens flawlessly, so the problem seems to lie in how the file is handled.
  • I understand that the prefix to the <img> source attribute, data:image/jpg;base64, must be stripped prior to decoding. Therefore the test file I am using, ascii_image.txt, has been edited to reflect this.
  • In the original source there are newlines at the end of each line, and a single blank space at the start of each subsequent line. Not sure if this makes a difference, especially considering that I am stripping ALL whitespace prior to attempting decoding.
  • I've tried a bunch of convoluted ways of using libmagic, python-libmagic, python-magic-bin, file-magic, and various other extended packages with much confusion and little progress. Most of them just created a spaghetti stream of error messages about missing dependencies, compiler errors, and other problems. Any suggestions would be appreciated.
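Rather than editing the test file by hand, the data URI prefix could be removed in code. The helper below is a hypothetical sketch (`split_data_uri` is not part of the Converter class); it assumes the input is either a bare base64 string or a `data:<mime>;base64,<payload>` URI.

```python
def split_data_uri(text):
    """Return (mime_or_None, base64_payload) for a data URI or a bare string."""
    text = text.strip()
    if text.startswith('data:') and ',' in text:
        header, payload = text.split(',', 1)
        # header looks like 'data:image/jpg;base64'; keep only the MIME part.
        mime = header[len('data:'):].split(';', 1)[0] or None
        return mime, payload
    return None, text


mime, payload = split_data_uri('data:image/jpg;base64,/9j/4QAY')
print(mime, payload)
print(split_data_uri('/9j/4QAY'))
```

The MIME type declared in the prefix is only a hint supplied by whoever wrote the tag, so it may be worth cross-checking against the decoded bytes.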

Relevant Code and Error Messages

Converter Python Class

from io import *
import base64
# import magic


class Converter:

    def __init__(self, file="none", str_object="none"):
        self.file = file
        self.str_object = str_object
        # `file == "none" or ""` only ever tested the first condition,
        # because the empty string on the right of `or` is always falsy.
        if file in ("none", ""):
            self.isFromStringInput = True
            self.asciiString = str_object
        else:
            self.isFromStringInput = False
            # f.read(file.__len__()) read only as many characters as the
            # file *name* has, which was the source of the pad error.
            with open(file, 'r') as f:
                self.asciiString = f.read()

    def convert_byte_stream_to_jpg(self):
        # Decode first, so a decoding error does not leave an empty file behind.
        data = self.convert_ascii_to_byte_stream()
        with open('ascii_image.jpg', 'wb') as f:
            f.write(data)

    def convert_ascii_to_byte_stream(self):
        return base64.b64decode(self.strip_all_whitespace(self.asciiString))

    @staticmethod
    def strip_all_whitespace(s):
        # split() with no arguments splits on any run of whitespace.
        return ''.join(s.split())

    # def get_encoding_type(self):
    #     m = magic.MAGIC_MIME
    #     m.from_bytes(self.convert_ascii_to_byte_stream(), 'little')
    #     return m.file('./' + self.file)

Error Message

Traceback (most recent call last):
  File "/Users/auser/PycharmProjects/btobin/__main__.py", line 5, in <module>
    c.convert_byte_stream_to_jpg()  # output the decoded data to image file
  File "/Users/auser/PycharmProjects/btobin/Converter.py", line 22, in convert_byte_stream_to_jpg
    f.write(self.convert_ascii_to_byte_stream())
  File "/Users/auser/PycharmProjects/btobin/Converter.py", line 27, in convert_ascii_to_byte_stream
    return base64.b64decode(self.strip_all_whitespace(self.asciiString))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
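The traceback can be reproduced without the image. The sketch below assumes the cause reported in the comments: calling `f.read(len(file))` reads only as many characters as the file name has (15 for 'ascii_image.txt'), and a 15-character base64 fragment is not a multiple of 4 characters long. An `io.StringIO` stands in for the real file, and the payload is arbitrary sample data.

```python
import base64
import binascii
import io

payload = bytes(range(64))                           # arbitrary sample data
encoded = base64.b64encode(payload).decode('ascii')  # 88 characters
path = 'ascii_image.txt'                             # 15 characters long

fake_file = io.StringIO(encoded)                     # stands in for open(path, 'r')

# Buggy read: the argument limits how many characters are read.
partial = fake_file.read(len(path))
try:
    base64.b64decode(partial)
    error_message = None
except binascii.Error as exc:
    error_message = str(exc)
print(len(partial), error_message)

# Fixed read: no argument reads the whole file.
fake_file.seek(0)
full = fake_file.read()
print(base64.b64decode(full) == payload)
```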

Correct Encoding Verified

If one places the following <img> tag in an HTML file and opens it in a browser, the properly decoded image is displayed.

<img src="data:image/jpg;base64,/9j/4QAYRXhpZgAASUkqAAgAAAAAAAAAAAAAAP/sABFEdWNr
 eQABAAQAAAAeAAD/4QOPaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wLwA8P3hwYWNrZXQgYm
 VnaW49Iu+7vyIgaWQ9Ilc1TTBNcENlaGlIenJlU3pOVGN6a2M5ZCI/PiA8eDp4bXBtZXRhIHht
 bG5zOng9ImFkb2JlOm5zOm1ldGEvIiB4OnhtcHRrPSJBZG9iZSBYTVAgQ29yZSA1LjYtYzAxNC
 A3OS4xNTY3OTcsIDIwMTQvMDgvMjAtMDk6NTM6MDIgICAgICAgICI+IDxyZGY6UkRGIHhtbG5z
 OnJkZj0iaHR0cDovL3d3dy53My5vcmcvMTk5OS8wMi8yMi1yZGYtc3ludGF4LW5zIyI+IDxyZG
 Y6RGVzY3JpcHRpb24gcmRmOmFib3V0PSIiIHhtbG5zOnhtcE1NPSJodHRwOi8vbnMuYWRvYmUu
 Y29tL3hhcC8xLjAvbW0vIiB4bWxuczpzdFJlZj0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS
 4wL3NUeXBlL1Jlc291cmNlUmVmIyIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hh
 cC8xLjAvIiB4bXBNTTpPcmlnaW5hbERvY3VtZW50SUQ9InhtcC5kaWQ6ZTg0NGJjNjUtYjAzZS
 00ODZiLThlYTctZDFjZTY4OGU5YTc2IiB4bXBNTTpEb2N1bWVudElEPSJ4bXAuZGlkOjM4NTBE
 Njg5Mzg2MzExRTZCOEZERTBFNjU0NTg5RUZDIiB4bXBNTTpJbnN0YW5jZUlEPSJ4bXAuaWlkOj
 M4NTBENjg4Mzg2MzExRTZCOEZERTBFNjU0NTg5RUZDIiB4bXA6Q3JlYXRvclRvb2w9IkFkb2Jl
 IFBob3Rvc2hvcCBDQyAyMDE0IChNYWNpbnRvc2gpIj4gPHhtcE1NOkRlcml2ZWRGcm9tIHN0Um
 VmOmluc3RhbmNlSUQ9InhtcC5paWQ6MjU4YjBiOTEtNGJhMC00NjI0LTg5NTUtYjU2ODg0OWIw
 OWFhIiBzdFJlZjpkb2N1bWVudElEPSJhZG9iZTpkb2NpZDpwaG90b3Nob3A6ZTllYjAwMGQtOD
 A0My0xMTc5LThhODktZjZmMjZkYTVhZGU1Ii8+IDwvcmRmOkRlc2NyaXB0aW9uPiA8L3JkZjpS
 REY+IDwveDp4bXBtZXRhPiA8P3hwYWNrZXQgZW5kPSJyIj8+/+4ADkFkb2JlAGTAAAAAAf/bAI
 QAEAsLCwwLEAwMEBcPDQ8XGxQQEBQbHxcXFxcXHx4XGhoaGhceHiMlJyUjHi8vMzMvL0BAQEBA
 QEBAQEBAQEBAQAERDw8RExEVEhIVFBEUERQaFBYWFBomGhocGhomMCMeHh4eIzArLicnJy4rNT
 UwMDU1QEA/QEBAQEBAQEBAQEBA/8AAEQgASwBLAwEiAAIRAQMRAf/EAH0AAAICAwEAAAAAAAAA
 AAAAAAAGBAUCAwcBAQEAAAAAAAAAAAAAAAAAAAAAEAABAgQDBQUDCgMJAAAAAAABAgMAEQQFIR
 IGMUFRcRNhgZEiMqFCFLHRUmJygpIjM0PBgxWywrMkRIQ1RWURAQAAAAAAAAAAAAAAAAAAAAD/
 2gAMAwEAAhEDEQA/AOgQQRUX6/N2lpLbaevXP4U7AxJJwzKljKfjAT6yvo6BrrVjyWW9xUcSeA
 G090UC9YOVSy3Z7e9WEfuEFKfYD7ZQW/TDtY6LjqFZqalWKacn8tscDL5Bhzhib+HaAZayNhOC
 W0yTLkkQC8K7WzgzIoGGx9FShP8AxI8N81TSGdZaQ42PUpgkmX3SuGeCAo7dq21VywytRpKjZ0
 3vLjwCtkXcV9zsdtuqCKpodSXleT5XB97fyMULVXctK1CKW4KNVaHDlZqJEqb7N/4fw8IBvgjF
 txDqEuNqC0LAUlQMwQcQRGUBoratqhpHat4ybZSVHiZbAOZwhf0xQO1ry9RXEZqmpJ+GSdjbey
 Y+QdnOMtYrXUfAWhsyNa+M/wBhJH8TPuhiabQ02hpsZUNpCUpG4JEgIBK1NqSrcqn7bSK6NO0e
 m64kkOLUPUMw2J3QsyE5+9x3+MT74ytm81qFggl5SxPelfnB9sQYBm0je6tNai21Dinad4ENFZ
 mptaRmkFHGRAh3jmWnwo3yhy7er7AlU4btY19RR2xCaZZaXUOBtTicFBMioyO6coCfUX6z0rpZ
 frGkOJwUnNMg9spyjYsW+8US2gpFTTOjKooIPgRsIjl0vnMWul6tylvdOlskIqVdJ1G5QIJST2
 gwF/pyoftdxe05WKzBE3KJw+8g+aQ5jHnmhphX1k2aZVDemsHaR0JUeKD5gPYR3ww/GMfS/a63
 8vjAUNzAc1na0K2IZWsc/wAz5oZYWNQn4XUVnrlGTalFhR4ZjL+/DPAc21Ol1N+qw6oqJKSgnc
 gpGUDsGMVcPOtLaupoUVrcs1FmU4DtLapZpcpThGgJVsYeqLlSssEpdU6khSZzSEnMpWHACHHW
 4b/o6SoEq6yOmRsBxnm7JTiLoWhKWqi4qIIdPRQkbQGzNRPMmL+725u5W92lWJlQzN4yk4nFB8
 YDl8SrW8Ke6Ub5SV5HkeRO05jlw8YjLQ40tTTySh1BKXEHalQ2iGDRtsFXXqrHkFTNJItqnIdb
 dzypx8IBh1ehK9P1U/dyKHMLTFV8S5ln/wCFm780on63f6dkLKT56hxCEp4yOc/2Y3f0deyX/W
 /B/egM9U21VxtDiWhOoYPWZltJRtA5icRKPWFtFqYqKtw/FFOVxhAKllacCZcDtxhjhD1VYFUL
 y7hSpnROnM6kfsrJxP2FHwMBEvmoam8KSjJ0KZskpbCiVKmJfmEYHlFTBBAT7Pd37RVGoaSHUr
 TkcaUSAUznNMth7oZH9Zs1FIU0SQxVqEv8yciEz3pWkKBPCcoTIIDdVNVba+pVhZW6Zl5UlBwn
 fnTNJMTLJe37PUFYSXadYk4xmyieHnG7MJSiFT1L1NMNK/LV+oyoZmljgtBwPywNtOVVSGaVol
 x5UmmUkmXZM7hxMAziqGqL/SBpCk0FCnrOBYxzz2HmQAO+HGKyw2Zu0UQZmF1Dhz1Dg95fAfVT
 sEWcARipKVpKFgKSoSUk4gg7jGUEAoXfRU1KftKgmeJpVmSf5at3Iwr1VHWUaslWw4weK0nL3K
 9Ptjq8a3v0lenZ7/p74DkudJ2KHjAFBRkk5lbgMT4CHio/UP8Aw239z1RY2r1f6H/Z7YBMt2mb
 vXkENGmZO154FOH1Ueow62ew0VobPRBcfWJOVC/WrsH0R2CLOCAIIIID/9k=" />
  • This is related... https://stackoverflow.com/a/49690539/2836621 – Mark Setchell Jun 11 '18 at 20:31
  • @MarkSetchell Thanks for that. – bencodesall Jun 11 '18 at 21:15
  • @MarkSetchell I invoked the constructor using the "naked string" method, passing the encoded data as a single-line string, and voilà, ... that worked. Must be missing something in reading from the file. – bencodesall Jun 11 '18 at 21:20
  • Ahh. Ok, so I made an error in assuming that file.read() was only going to read line by line, and so I added the `file.__len__()` as an argument. Removing that fixed the file pad error. – bencodesall Jun 11 '18 at 21:31
