I am working on image processing in Python and facing an issue with extracting the original file extension from base64 image data. Initially, I open the image and read it as binary using the following code:

with open(img_path_input, 'rb') as img:
    img_bin = img.read()

After obtaining the binary data (img_bin), I convert it to base64 format using the base64 module like this:

img_base64 = base64.b64encode(img_bin).decode()

However, I need to recover the original file extension later without storing it or passing it along from img_path_input. Is there a way to do this with the base64 module, or is the extension implicitly encoded in the base64 string?

Any suggestions or alternatives to solve this problem would be greatly appreciated.

nekovolta
  • 1
    Also, recognize what you want to do in terms of individual steps, and search for each of those, then piece the steps together. – Mike 'Pomax' Kamermans Jul 23 '23 at 20:53
  • 3
    Whatever you have in base64 is just encoded bytes. It depends on what these bytes represent. If they are the contents of the original image file, then you can probably determine the format. If they are some binary representation of the image after decoding of the format, then it would be very hard to determine what lossy format was used (like .jpg) and impossible to tell what lossless format was used. What have you tried yourself? What problem did you run into? If you're just asking "is this possible?" the question doesn't belong here (and yes/no probably doesn't help anyway) – Grismar Jul 23 '23 at 20:56
  • 1
    Have a look at [how to ask](https://stackoverflow.com/help/how-to-ask) – Grismar Jul 23 '23 at 20:57
  • 1
    Thank you for your comments. In the question "is there any way..." it is implicit "how to do so...". What I am trying to solve is: I have a long string which represents an image code in base64, after it was loaded in binary format. I would edit my question to be more concise. – nekovolta Jul 23 '23 at 20:59
  • 4
    The original extension is not something that's preserved through the encode/decode process, it's metadata external to that. You can usually "guess" the type based on headers, PNG files will be instantly recognizable, same with JPEG, but in the case of JPEG there are two common file extensions, `.jpg` and `.jpeg`, as well as other wacky cousins like `.jfif` etc. There's no way to know which was used unless you pass that in parallel as metadata. – tadman Jul 23 '23 at 21:17
  • Yes, answer is here https://stackoverflow.com/a/49690539/2836621 – Mark Setchell Jul 23 '23 at 21:28

1 Answer

To retrieve the original image format from a base64-encoded image in Python, you can use the imghdr module. The imghdr module provides a function called what() that can determine the image format based on the content of the image data.

import base64
import imghdr

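def get_image_extension_from_base64(base64_str: str):
    # Strip a data-URI prefix such as "data:image/png;base64," if present.
    if base64_str.startswith("data:image/"):
        base64_str = base64_str.split(";base64,", 1)[1]

    # validate=True raises binascii.Error on non-base64 characters.
    image_data = base64.b64decode(base64_str, validate=True)

    # With h given, imghdr inspects these bytes and ignores the file argument.
    extension = imghdr.what(None, h=image_data)
    return extension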


base64_image = "your base64-encoded image"

extension = get_image_extension_from_base64(base64_image)
print("Image extension:", extension)

Note that imghdr.what() returns None when it cannot recognize the data, for example when the image is corrupted or the format is not one that imghdr supports. It also reports a format name such as 'jpeg' rather than the original extension, so it cannot tell .jpg from .jpeg. The imghdr module was deprecated in Python 3.11 and removed in Python 3.13.
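
On interpreters where imghdr is not available, the same idea can be approximated by checking the leading "magic bytes" of the decoded data yourself. The following is only a minimal sketch: the function name sniff_image_format and the signature table are illustrative assumptions, not part of any library, and the table covers just a handful of common formats.

```python
import base64

# Leading byte signatures for a few common formats (deliberately not exhaustive).
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]


def sniff_image_format(base64_str: str):
    """Guess a format name from the first bytes of the data, or return None."""
    # Strip an optional data-URI prefix such as "data:image/png;base64,".
    if base64_str.startswith("data:"):
        base64_str = base64_str.split(",", 1)[1]

    # Only the first few bytes are needed for the comparison.
    header = base64.b64decode(base64_str)[:16]

    for magic, name in _SIGNATURES:
        if header.startswith(magic):
            return name

    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"

    return None
```

Like imghdr, this returns a format name rather than the original extension, so a caller still has to pick an extension (for example .jpg for 'jpeg') by convention.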
