
I need to encode an image into binary, send it to a server and decode it back into an image again. The decoding method is:

import base64

import cv2
import numpy as np


def decode_from_bin(bin_data):
    bin_data = base64.b64decode(bin_data)            # base64 string -> raw JPEG bytes
    image = np.asarray(bytearray(bin_data), dtype=np.uint8)
    img = cv2.imdecode(image, cv2.IMREAD_COLOR)      # JPEG bytes -> BGR image array

    return img

We use OpenCV to encode the image:

def encode_from_cv2(img_name):
    img = cv2.imread(img_name, cv2.IMREAD_COLOR)  # adjust with EXIF
    bin = cv2.imencode('.jpg', img)[1]
    return str(base64.b64encode(bin))[2:-1]  # raises an error if I remove [2:-1]

You can run with:

raw_img_name = ${SOME_IMG_NAME}

encode_image = encode_from_cv2(raw_img_name)
decode_image = decode_from_bin(encode_image)

cv2.imshow('Decode', decode_image)
cv2.waitKey(0)

My question is: why do we have to strip the first two characters (and the last one) from the base64 encoding?

Tengerye

1 Answer


Let's analyse what happens inside encode_from_cv2.

The output of base64.b64encode(bin) is a bytes object. When you pass it to str in str(base64.b64encode(bin)), the str function creates a "nicely printable" version of the bytes object (see this answer).

In practice, str represents the bytes object as you see it when you print it, i.e. with a leading b' and a trailing '. For example:

>>> base64.b64encode(bin)
b'/9j/4AAQSkZJRgABAQAAAQABAAD'
>>> str(base64.b64encode(bin))
"b'/9j/4AAQSkZJRgABAQAAAQABAAD'"

That's why you need to remove these characters to obtain the encoded string.
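A minimal, self-contained sketch of this behaviour, using a small literal bytes value as a stand-in for the real JPEG buffer:

```python
import base64

payload = b"hello"                   # stand-in for the raw JPEG buffer
encoded = base64.b64encode(payload)  # b'aGVsbG8=' -- a bytes object

as_str = str(encoded)                # "b'aGVsbG8='" -- the printable repr
print(as_str)                        # prints: b'aGVsbG8='
print(as_str[2:-1])                  # prints: aGVsbG8=
```

The slice [2:-1] removes exactly the b' prefix and the ' suffix of the repr, which is why the code in the question fails without it.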

In general, this is not the best way to convert a bytes object to a string: called with a single argument, str does not decode the bytes at all, it just returns their printable representation. To interpret the bytes as characters you need to pass an encoding explicitly.

As explained in this answer, you can replace str(base64.b64encode(bin))[2:-1] with str(base64.b64encode(bin), "utf-8") (or equivalently base64.b64encode(bin).decode("utf-8")) to get rid of the slice.
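A quick sketch (again with a literal bytes stand-in for the JPEG buffer) showing that the slice-based workaround and proper decoding produce the same string:

```python
import base64

payload = b"hello"                # stand-in for the encoded JPEG buffer
encoded = base64.b64encode(payload)

sliced  = str(encoded)[2:-1]      # repr-based workaround from the question
decoded = str(encoded, "utf-8")   # proper decoding; same as encoded.decode("utf-8")

assert sliced == decoded == "aGVsbG8="
```

Base64 output only ever contains ASCII characters, so decoding with "utf-8" (or "ascii") is always safe here.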

LGrementieri