17

I basically need to do this but in Python instead of JavaScript. I receive a base64-encoded string over a socketio connection, convert it to a uint8 array and work on it, and then need to convert it back to a base64 string so I can send it back.

So, up to this point I've got this (I'm getting the data dictionary from the socketio server):

import base64
from io import BytesIO

import numpy as np
from PIL import Image

base64_image_string = data["image"]
image = Image.open(BytesIO(base64.b64decode(base64_image_string)))
img = np.array(image)

How do I reverse this process to get from img back to base64_image_string?

UPDATE:
I have solved this in the following manner (continuing from the code snippet above):

pil_img = Image.fromarray(img)
buff = BytesIO()
pil_img.save(buff, format="JPEG")
new_image_string = base64.b64encode(buff.getvalue()).decode("utf-8")

Somewhat confusingly, new_image_string is not identical to base64_image_string but the image rendered from new_image_string looks the same so I'm satisfied!
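
A likely reason for the mismatch is that JPEG is lossy, so re-encoding rarely reproduces the original bytes (and the incoming image may not have been a JPEG at all). The following is a minimal sketch, assuming Pillow and NumPy are installed; the helper names `array_to_base64` and `base64_to_array` are hypothetical, and a synthetic array stands in for the real image. It shows that a lossless format such as PNG preserves the pixel values across a round trip, even though the base64 text itself can still differ between encoders.

```python
import base64
from io import BytesIO

import numpy as np
from PIL import Image


def array_to_base64(arr, fmt="PNG"):
    """Encode an image array as a base64 string in the given image format."""
    buff = BytesIO()
    Image.fromarray(arr).save(buff, format=fmt)
    return base64.b64encode(buff.getvalue()).decode("ascii")


def base64_to_array(b64_string):
    """Decode a base64 image string back into a NumPy array."""
    return np.array(Image.open(BytesIO(base64.b64decode(b64_string))))


# Synthetic 8x8 RGB test image (a stand-in for the real data).
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)

png_roundtrip = base64_to_array(array_to_base64(original, "PNG"))
jpeg_roundtrip = base64_to_array(array_to_base64(original, "JPEG"))

# PNG is lossless, so the pixels come back unchanged.
pixels_match_png = np.array_equal(original, png_roundtrip)
# JPEG is lossy, so this comparison is usually False.
pixels_match_jpeg = np.array_equal(original, jpeg_roundtrip)
```

If exact pixel values matter on the receiving side, PNG is the safer choice; JPEG trades exactness for a smaller payload.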

Ryan Keenan

3 Answers

12

Since NumPy arrays support the buffer protocol, I believe you just need the following:

processed_string = base64.b64encode(img)

So, for example:

>>> encoded = b"aGVsbG8sIHdvcmxk"
>>> img = np.frombuffer(base64.b64decode(encoded), np.uint8)
>>> img
array([104, 101, 108, 108, 111,  44,  32, 119, 111, 114, 108, 100], dtype=uint8)
>>> img.tobytes()
b'hello, world'
>>> base64.b64encode(img)
b'aGVsbG8sIHdvcmxk'
>>>
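
Note that this encodes the raw array buffer rather than an image file, so the shape and dtype are not part of the encoded string. A minimal sketch, assuming a small synthetic 3-D uint8 array and that the receiver already knows the shape and dtype, of how the round trip might look for a multi-dimensional array:

```python
import base64

import numpy as np

# A small 3-D uint8 array standing in for an image (height, width, channels).
img = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)

# tobytes() returns the data in C order regardless of the array's memory layout.
encoded = base64.b64encode(img.tobytes())

# Decoding yields a flat 1-D array; the shape must be supplied separately.
flat = np.frombuffer(base64.b64decode(encoded), dtype=np.uint8)
restored = flat.reshape(img.shape)
```

The array returned by `np.frombuffer` is read-only; call `.copy()` on it if the data needs to be modified.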
juanpa.arrivillaga
  • 1
    This is amazing! BUT, when I deserialize, my 3-D `numpy` array comes out as 1-D only - must I `reshape` using values for the original shape carried with the array, or is there a way that circumvents `reshape`? Thx! – jtlz2 Aug 16 '18 at 14:54
  • It's quite ugly but I ended up doing: `payload = ('nparr',img.shape,base64.b64encode(img.copy(order='C')))` and deserializing correspondingly. Is there a way around this? Huge thanks! – jtlz2 Aug 16 '18 at 16:32
  • 1
    @jtlz2 no, but have you considered another serialization method like [`numpy.save`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html)? – juanpa.arrivillaga Aug 16 '18 at 16:33
  • Thanks, I tried that route (see https://stackoverflow.com/a/43925741/1021819) - but in the end I thought it safer to use base64 over the wire than a byte string. Do you have a view on which is preferable? – jtlz2 Aug 16 '18 at 16:36
  • 1
    @jtlz2 sorry, but I don't really follow. Base64 produces a byte string. And it is safer in what sense? `numpy.save` is much more portable across platforms, taking care of things like byte order along with shape out of the box. If I was choosing with no arbitrary restrictions, I would undoubtedly use `numpy.save` over base64 encoding. It's probably faster, and definitely less memory intensive – juanpa.arrivillaga Aug 16 '18 at 16:41
  • Thanks! Went down this route in the end - https://stackoverflow.com/a/27948073/1021819 – jtlz2 Aug 17 '18 at 10:16
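
The two approaches discussed in these comments can also be combined. The following is a minimal sketch, assuming the payload still has to travel as text: `numpy.save` writes the array (with its shape and dtype) into an in-memory buffer, and the resulting bytes are then base64-encoded. The helper names `array_to_b64_npy` and `b64_npy_to_array` are hypothetical.

```python
import base64
from io import BytesIO

import numpy as np


def array_to_b64_npy(arr):
    """Serialize an array in .npy format and return it as base64 text."""
    buff = BytesIO()
    np.save(buff, arr, allow_pickle=False)
    return base64.b64encode(buff.getvalue()).decode("ascii")


def b64_npy_to_array(b64_string):
    """Rebuild the array, including shape and dtype, from base64 text."""
    buff = BytesIO(base64.b64decode(b64_string))
    return np.load(buff, allow_pickle=False)


original = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
restored = b64_npy_to_array(array_to_b64_npy(original))
```

No manual `reshape` is needed here because the `.npy` header carries the shape and dtype along with the data.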
0

I had the same problem. After some searching and experimenting, my final solution is almost the same as yours.

The only difference is that my base64-encoded string contains PNG data, so I need to convert the image from RGBA to RGB before converting it to an np.array:

image = image.convert("RGB")
img = np.array(image)

In the reverse process you save the data as JPEG. Maybe this is why new_image_string is not identical to base64_image_string?
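
As a minimal sketch of the channel conversion described above, assuming Pillow and NumPy are installed and using a small synthetic RGBA image in place of a decoded PNG:

```python
import numpy as np
from PIL import Image

# Synthetic 4x2 (width x height) semi-transparent red image.
rgba = Image.new("RGBA", (4, 2), (255, 0, 0, 128))

rgba_array = np.array(rgba)                 # four channels: R, G, B, A
rgb_array = np.array(rgba.convert("RGB"))   # alpha channel dropped
```

The array shape is (height, width, channels), so the last dimension goes from 4 to 3 after the conversion.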

Ce Ge
-1

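From http://www.programcreek.com/2013/09/convert-image-to-string-in-python/ :

import base64

# Open the file in binary mode ("rb") so the raw bytes are read unchanged.
with open("t.png", "rb") as image_file:
    encoded = base64.b64encode(image_file.read())
    print(encoded)

Here "rb" means the file is read in binary mode.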

https://docs.python.org/2/library/base64.html

ralf htp
  • 1
    Wait, why would you have to convert the `numpy` array to utf-8? – juanpa.arrivillaga Apr 09 '17 at 20:03
  • the base64 alphabet is "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=" and, translated from a German description, *Base64 is a method for encoding 8-bit binary data (e.g. executable programs, ZIP files or images)* – ralf htp Apr 09 '17 at 20:09
  • 1
    ... yes, so what? – juanpa.arrivillaga Apr 09 '17 at 20:10
  • it's an array of bytes (uint8). See the example I posted. In any event, I'm not sure why you think converting to *utf8* would help... – juanpa.arrivillaga Apr 09 '17 at 20:14
  • ok nice i did not know that numpy arrays are structured on bytes because normally i use them with composed datatypes like float, integer,... – ralf htp Apr 09 '17 at 20:18
  • well... a 32-bit float or integer is just a sequence of 4-byte chunks... a 64-bit integer or float is a sequence of 8-byte chunks... an 8-bit integer is a sequence of 1-byte chunks... Numpy arrays are basically object-oriented wrappers around C arrays with a bunch of handy routines built-in. – juanpa.arrivillaga Apr 09 '17 at 20:19
  • yes, i know this and exactly this is the reason why i think that conversion to utf-8 would help. Because if the image in the numpy array is not binary coded but instead integer, float,.. then the base64 encoded string would contain wrong data because of the binary read... – ralf htp Apr 09 '17 at 20:22
  • however normally images are encoded in RGB or other **byte-oriented** standards and so you are right... – ralf htp Apr 09 '17 at 20:25
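
The point made in this thread can be illustrated with a minimal sketch, assuming a small float32 array as sample data: base64 operates on the underlying bytes of any dtype, so nothing is lost, but the receiver has to interpret those bytes with the same dtype. Since `b64encode` returns `bytes`, decoding the result as ASCII only turns the already-encoded payload into a `str`.

```python
import base64

import numpy as np

values = np.array([1.5, -2.0, 3.25], dtype=np.float32)

# base64 works on the raw bytes; 3 float32 values occupy 12 bytes.
encoded_bytes = base64.b64encode(values.tobytes())
encoded_text = encoded_bytes.decode("ascii")  # plain str, e.g. for JSON

raw = base64.b64decode(encoded_text)
as_float32 = np.frombuffer(raw, dtype=np.float32)  # original values
as_uint8 = np.frombuffer(raw, dtype=np.uint8)      # same bytes, read as uint8
```

Reading the bytes back with the wrong dtype does not raise an error; it simply yields different numbers, which is why the dtype should travel with the data.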