Overview
Handle decoding of base64-encoded image files in a concise and (relatively) comprehensive way.
Goals
Primary Goal
- Use Python 3.6 to take an arbitrary base64 string, whether as an 'in-memory' string or an external file, and generate a valid image file from it.
Secondary Goal
- Use libmagic (or some other Python library) to determine, as best as possible, the MIME type of an encoded file, so that the output can be written with the correct file type.
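As a fallback for the secondary goal, the type of a decoded image can often be guessed from its first few bytes without any third-party package. The following is only a minimal sketch: the SIGNATURES table and the guess_image_type helper are hypothetical names introduced here, and the table covers just a handful of common formats rather than everything libmagic recognizes.

```python
import base64

# Leading "magic" bytes for a few common image formats (not exhaustive).
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", "image/png", ".png"),
    (b"GIF87a", "image/gif", ".gif"),
    (b"GIF89a", "image/gif", ".gif"),
    (b"BM", "image/bmp", ".bmp"),
]


def guess_image_type(data):
    """Return (mime_type, extension) for raw bytes, or (None, None) if unknown."""
    for magic, mime, ext in SIGNATURES:
        if data.startswith(magic):
            return mime, ext
    return None, None


# Usage: decode a base64 payload, then inspect the decoded bytes.
encoded = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode("ascii")
print(guess_image_type(base64.b64decode(encoded)))  # ('image/png', '.png')
```

The returned extension could then be used to name the output file instead of hard-coding .jpg.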
Preliminary Assumptions
Encoding is Valid
- A small JPG image was encoded in base64. I have verified that the encoding is good by viewing the encoded data in a web browser, using the browser's native decoding of embedded images. See below.
This is Doable in Python 3.6
- The base64 package seems to be made for this, unless I'm way off base.
Attempted Solution
Explanation
I created a Python class named Converter to handle the task, with the following features:
Constructor
__init__(self, file="none", str_object="none")
- Takes 2 string arguments. Only one should be specified, replacing its default, depending on the desired usage.
file="none"
- Path to file as a string for file-based decoding.
str_object="none"
- Naked encoded string.
Instance Methods
convert_ascii_to_byte_stream(self)
- No arguments
- Returns the decoded data as a bytes object
convert_byte_stream_to_jpg(self)
- No arguments
- Opens a new file, attempts to write a byte stream (using convert_ascii_to_byte_stream), then closes the file.
- Returns nothing
@staticmethod strip_all_whitespace(s)
- Takes a single string of arbitrary length.
- Returns a string with all whitespace, newlines, tabs, and carriage returns removed.
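For illustration, the behaviour described for strip_all_whitespace can also be had from str.split() with no arguments, which splits on any run of whitespace. This is only a sketch of an equivalent helper, not the version used in the class.

```python
def strip_all_whitespace(s):
    """Remove every whitespace character (spaces, tabs, newlines, carriage returns)."""
    # split() with no arguments breaks the string on any run of whitespace,
    # so joining the pieces back together leaves none behind.
    return "".join(s.split())


sample = "QUJD\n REVG\r\n\tR0hJ"
print(strip_all_whitespace(sample))  # QUJDREVGR0hJ
```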
Usage
__main__.py
from Converter import *
c = Converter('ascii_image.txt', 'none') # create Converter instance
# print(c.get_encoding_type()) # attempt to get encoding
c.convert_byte_stream_to_jpg() # output the decoded data to image file
Observations and Questions
- If I invoke the constructor as Converter('none', 'XyZxYzXyZxYzXyZxYz ...) then the conversion happens flawlessly, so the problem evidently lies in the file-based path.
- I understand that the prefix to the <img> source attribute, data:image/jpg;base64, must be stripped prior to decoding. The test file I am using, ascii_image.txt, has therefore been edited to reflect this.
- In the original source there are newlines at the end of each line, and a single blank space at the start of each subsequent line. I'm not sure whether this makes a difference, especially considering that I am stripping ALL whitespace prior to attempting decoding.
- I've tried a bunch of convoluted ways of using libmagic, python-libmagic, python-magic-bin, file-magic, and various other extended packages, with much confusion and little progress. Most of them just produced a spaghetti stream of error messages about missing dependencies, compiler errors, and other problems. Any suggestions would be appreciated.
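On the prefix and whitespace points, here is a minimal sketch of decoding text that may or may not still carry the data: header. The decode_image_payload name is hypothetical. It relies on the fact that base64.b64decode, with its default validate=False, discards characters outside the base64 alphabet (including newlines and spaces) before checking the padding, so embedded line breaks should not matter on their own.

```python
import base64


def decode_image_payload(text):
    """Decode base64 text, dropping a leading data-URI header if one is present."""
    text = text.strip()
    if text.startswith("data:"):
        # Everything up to and including the first comma is the header,
        # e.g. "data:image/jpg;base64,"
        header, sep, text = text.partition(",")
        if not sep:
            raise ValueError("data URI has no comma between header and payload")
    return base64.b64decode(text)


# The payload is split over two lines with a leading space, as in the source file.
wrapped = "data:image/jpg;base64,QUJD\n REVG"
print(decode_image_payload(wrapped))       # b'ABCDEF'
print(decode_image_payload("QUJDREVG"))    # b'ABCDEF'
```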
Relevant Code and Error Messages
Converter Python Class
from io import *
import base64
# import magic


class Converter:
    def __init__(self, file="none", str_object="none"):
        self.file = file
        self.str_object = str_object
        # "none" or an empty path means: decode the string passed in str_object
        if file in ("none", ""):
            self.isFromStringInput = True
            self.asciiString = str_object
        else:
            f = open(file, 'r')
            self.isFromStringInput = False
            # this was the source of the pad error
            # self.asciiString = f.read(file.__len__())
            # changed to ... duh!
            self.asciiString = f.read()
            f.close()

    def convert_byte_stream_to_jpg(self):
        # write the decoded bytes to a fixed output file
        f = open('ascii_image.jpg', 'wb')
        f.write(self.convert_ascii_to_byte_stream())
        f.close()
        return

    def convert_ascii_to_byte_stream(self):
        return base64.b64decode(self.strip_all_whitespace(self.asciiString))

    @staticmethod
    def strip_all_whitespace(s):
        return s.replace('\n', '').replace('\r', '').replace('\t', '').replace(' ', '')

    # def get_encoding_type(self):
    #     m = magic.MAGIC_MIME
    #     m.from_bytes(self.convert_ascii_to_byte_stream(), 'little')
    #     return m.file('./' + self.file)
Error Message
Traceback (most recent call last):
  File "/Users/auser/PycharmProjects/btobin/__main__.py", line 5, in <module>
    c.convert_byte_stream_to_jpg() # output the decoded data to image file
  File "/Users/auser/PycharmProjects/btobin/Converter.py", line 22, in convert_byte_stream_to_jpg
    f.write(self.convert_ascii_to_byte_stream())
  File "/Users/auser/PycharmProjects/btobin/Converter.py", line 27, in convert_ascii_to_byte_stream
    return base64.b64decode(self.strip_all_whitespace(self.asciiString))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
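This error is consistent with the commented-out f.read(file.__len__()) call in the constructor: passing the length of the file name to read() returns at most that many characters (15 for 'ascii_image.txt'), and a base64 string cut off at a length that is not a multiple of 4 cannot be decoded. The sketch below reproduces the effect on made-up data. The try_decode helper and the sample payload are illustrative assumptions, not part of the original code.

```python
import base64
import binascii

# 48 arbitrary bytes encode to exactly 64 base64 characters, with no '=' padding.
encoded = base64.b64encode(bytes(range(48))).decode("ascii")


def try_decode(text):
    """Return the decoded length, or a description of the decoding error."""
    try:
        return len(base64.b64decode(text))
    except binascii.Error as exc:
        return "binascii.Error: {}".format(exc)


path = "ascii_image.txt"
print(try_decode(encoded))               # whole string decodes: 48
print(try_decode(encoded[:len(path)]))   # only the first 15 characters: fails
```

Reading the whole file with f.read() avoids the truncation, which matches the change noted in the class comments.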
Correct Encoding Verified
If one places the following <img> tag in an HTML file and opens it with a browser, they will see the properly decoded image.
Here is an example of the Decoded Image
<img src="
eQABAAQAAAAeAAD/4QOPaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wLwA8P3hwYWNrZXQgYm
VnaW49Iu+7vyIgaWQ9Ilc1TTBNcENlaGlIenJlU3pOVGN6a2M5ZCI/PiA8eDp4bXBtZXRhIHht
bG5zOng9ImFkb2JlOm5zOm1ldGEvIiB4OnhtcHRrPSJBZG9iZSBYTVAgQ29yZSA1LjYtYzAxNC
A3OS4xNTY3OTcsIDIwMTQvMDgvMjAtMDk6NTM6MDIgICAgICAgICI+IDxyZGY6UkRGIHhtbG5z
OnJkZj0iaHR0cDovL3d3dy53My5vcmcvMTk5OS8wMi8yMi1yZGYtc3ludGF4LW5zIyI+IDxyZG
Y6RGVzY3JpcHRpb24gcmRmOmFib3V0PSIiIHhtbG5zOnhtcE1NPSJodHRwOi8vbnMuYWRvYmUu
Y29tL3hhcC8xLjAvbW0vIiB4bWxuczpzdFJlZj0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS
4wL3NUeXBlL1Jlc291cmNlUmVmIyIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hh
cC8xLjAvIiB4bXBNTTpPcmlnaW5hbERvY3VtZW50SUQ9InhtcC5kaWQ6ZTg0NGJjNjUtYjAzZS
00ODZiLThlYTctZDFjZTY4OGU5YTc2IiB4bXBNTTpEb2N1bWVudElEPSJ4bXAuZGlkOjM4NTBE
Njg5Mzg2MzExRTZCOEZERTBFNjU0NTg5RUZDIiB4bXBNTTpJbnN0YW5jZUlEPSJ4bXAuaWlkOj
M4NTBENjg4Mzg2MzExRTZCOEZERTBFNjU0NTg5RUZDIiB4bXA6Q3JlYXRvclRvb2w9IkFkb2Jl
IFBob3Rvc2hvcCBDQyAyMDE0IChNYWNpbnRvc2gpIj4gPHhtcE1NOkRlcml2ZWRGcm9tIHN0Um
VmOmluc3RhbmNlSUQ9InhtcC5paWQ6MjU4YjBiOTEtNGJhMC00NjI0LTg5NTUtYjU2ODg0OWIw
OWFhIiBzdFJlZjpkb2N1bWVudElEPSJhZG9iZTpkb2NpZDpwaG90b3Nob3A6ZTllYjAwMGQtOD
A0My0xMTc5LThhODktZjZmMjZkYTVhZGU1Ii8+IDwvcmRmOkRlc2NyaXB0aW9uPiA8L3JkZjpS
REY+IDwveDp4bXBtZXRhPiA8P3hwYWNrZXQgZW5kPSJyIj8+/+4ADkFkb2JlAGTAAAAAAf/bAI
QAEAsLCwwLEAwMEBcPDQ8XGxQQEBQbHxcXFxcXHx4XGhoaGhceHiMlJyUjHi8vMzMvL0BAQEBA
QEBAQEBAQEBAQAERDw8RExEVEhIVFBEUERQaFBYWFBomGhocGhomMCMeHh4eIzArLicnJy4rNT
UwMDU1QEA/QEBAQEBAQEBAQEBA/8AAEQgASwBLAwEiAAIRAQMRAf/EAH0AAAICAwEAAAAAAAAA
AAAAAAAGBAUCAwcBAQEAAAAAAAAAAAAAAAAAAAAAEAABAgQDBQUDCgMJAAAAAAABAgMAEQQFIR
IGMUFRcRNhgZEiMqFCFLHRUmJygpIjM0PBgxWywrMkRIQ1RWURAQAAAAAAAAAAAAAAAAAAAAD/
2gAMAwEAAhEDEQA/AOgQQRUX6/N2lpLbaevXP4U7AxJJwzKljKfjAT6yvo6BrrVjyWW9xUcSeA
G090UC9YOVSy3Z7e9WEfuEFKfYD7ZQW/TDtY6LjqFZqalWKacn8tscDL5Bhzhib+HaAZayNhOC
W0yTLkkQC8K7WzgzIoGGx9FShP8AxI8N81TSGdZaQ42PUpgkmX3SuGeCAo7dq21VywytRpKjZ0
3vLjwCtkXcV9zsdtuqCKpodSXleT5XB97fyMULVXctK1CKW4KNVaHDlZqJEqb7N/4fw8IBvgjF
txDqEuNqC0LAUlQMwQcQRGUBoratqhpHat4ybZSVHiZbAOZwhf0xQO1ry9RXEZqmpJ+GSdjbey
Y+QdnOMtYrXUfAWhsyNa+M/wBhJH8TPuhiabQ02hpsZUNpCUpG4JEgIBK1NqSrcqn7bSK6NO0e
m64kkOLUPUMw2J3QsyE5+9x3+MT74ytm81qFggl5SxPelfnB9sQYBm0je6tNai21Dinad4ENFZ
mptaRmkFHGRAh3jmWnwo3yhy7er7AlU4btY19RR2xCaZZaXUOBtTicFBMioyO6coCfUX6z0rpZ
frGkOJwUnNMg9spyjYsW+8US2gpFTTOjKooIPgRsIjl0vnMWul6tylvdOlskIqVdJ1G5QIJST2
gwF/pyoftdxe05WKzBE3KJw+8g+aQ5jHnmhphX1k2aZVDemsHaR0JUeKD5gPYR3ww/GMfS/a63
8vjAUNzAc1na0K2IZWsc/wAz5oZYWNQn4XUVnrlGTalFhR4ZjL+/DPAc21Ol1N+qw6oqJKSgnc
gpGUDsGMVcPOtLaupoUVrcs1FmU4DtLapZpcpThGgJVsYeqLlSssEpdU6khSZzSEnMpWHACHHW
4b/o6SoEq6yOmRsBxnm7JTiLoWhKWqi4qIIdPRQkbQGzNRPMmL+725u5W92lWJlQzN4yk4nFB8
YDl8SrW8Ke6Ub5SV5HkeRO05jlw8YjLQ40tTTySh1BKXEHalQ2iGDRtsFXXqrHkFTNJItqnIdb
dzypx8IBh1ehK9P1U/dyKHMLTFV8S5ln/wCFm780on63f6dkLKT56hxCEp4yOc/2Y3f0deyX/W
/B/egM9U21VxtDiWhOoYPWZltJRtA5icRKPWFtFqYqKtw/FFOVxhAKllacCZcDtxhjhD1VYFUL
y7hSpnROnM6kfsrJxP2FHwMBEvmoam8KSjJ0KZskpbCiVKmJfmEYHlFTBBAT7Pd37RVGoaSHUr
TkcaUSAUznNMth7oZH9Zs1FIU0SQxVqEv8yciEz3pWkKBPCcoTIIDdVNVba+pVhZW6Zl5UlBwn
fnTNJMTLJe37PUFYSXadYk4xmyieHnG7MJSiFT1L1NMNK/LV+oyoZmljgtBwPywNtOVVSGaVol
x5UmmUkmXZM7hxMAziqGqL/SBpCk0FCnrOBYxzz2HmQAO+HGKyw2Zu0UQZmF1Dhz1Dg95fAfVT
sEWcARipKVpKFgKSoSUk4gg7jGUEAoXfRU1KftKgmeJpVmSf5at3Iwr1VHWUaslWw4weK0nL3K
9Ptjq8a3v0lenZ7/p74DkudJ2KHjAFBRkk5lbgMT4CHio/UP8Aw239z1RY2r1f6H/Z7YBMt2mb
vXkENGmZO154FOH1Ueow62ew0VobPRBcfWJOVC/WrsH0R2CLOCAIIIID/9k=" />