
I'm sending a request for a set of images to one of my APIs. The API returns these images in JSON format. Each object contains data about the resource together with a single property that holds the image encoded as Base64.

An example of the JSON being returned:

{
    "id": 548613,
    "filename": "00548613.png",
    "pictureTaken": "2020-03-30T11:38:21.003",
    "isVisible": true,
    "lotcode": 23,
    "company": "05",
    "concern": "46",
    "base64": "..."
}
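A minimal sketch of what r.json() followed by Base64 decoding is expected to do with a body of this shape, using only the standard library. The field names come from the example above; the "base64" value is a made-up placeholder (the bytes b"hello"), not real image data.

```python
import base64
import json

# Trimmed copy of the response shape above; the "base64" value is a
# placeholder that encodes b"hello", not real image data.
body = """{
    "id": 548613,
    "filename": "00548613.png",
    "base64": "aGVsbG8="
}"""

record = json.loads(body)
# The Base64 field is an ordinary JSON string; decoding it yields raw bytes.
image_bytes = base64.b64decode(record["base64"])
print(record["filename"], len(image_bytes))  # 00548613.png 5
```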

(Images omitted: the correct content of the Base64; the incorrectly parsed Base64.)

This is done with the Python 3 requests library. When I receive a successful response from the API, I attempt to decode the body to JSON using:

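import requests

# Body of a method on my API client class; self.__url__ and
# self.__headers__ are my own helpers.
url = self.__url__(f"/rest/all/V1/products/{sku}/images")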
headers = self.__headers__()
r = requests.get(url=url, headers=headers)
if r.status_code == 200:
    return r.json()
elif r.status_code == 404:
    return None
else:
    raise IOError(
        f"Error retrieving product '{sku}', got {r.status_code}: '{r.text}'")
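One way to detect the corruption early is to check the value returned by r.json() against the Base64 alphabet before decoding it. This is a sketch: is_valid_base64 is a hypothetical helper, and it assumes the image field is called "base64" as in the example JSON.

```python
import base64


def is_valid_base64(text: str) -> bool:
    """Return True if text is well-formed standard Base64."""
    try:
        # validate=True raises on characters outside the Base64 alphabet
        # instead of silently discarding them (the default behaviour).
        base64.b64decode(text, validate=True)
    except ValueError:
        # binascii.Error (bad digit, bad padding) is a subclass of ValueError;
        # a str containing non-ASCII characters raises ValueError directly.
        return False
    return True


# Hypothetical usage, assuming r.json() returns a single object shaped like
# the example above:
#     if not is_valid_base64(r.json()["base64"]):
#         raise IOError("Base64 payload is corrupted")
```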

Calling .json() corrupts the Base64 content: some parts are missing and others are replaced with different characters. After seeing this post, I tried decoding the content manually with r.content.decode(), using the utf-8 and ascii options, to check whether the encoding was the problem. Sadly, this didn't work. I know the response from the server is correct: it works in Postman, and print(r.content) shows a JSON document containing the valid Base64.
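Because print(r.content) looks right but r.json() does not, a quick diagnostic is to compare the value the json module produces with the value as it literally appears in the raw bytes. This is a sketch with hypothetical helper names (first_difference, compare_raw_and_parsed), assuming a single JSON object with a "base64" field. A result of -1 means both agree; a non-negative index points at the first character where they diverge (for example, a JSON escape such as \/ appears in the raw text but not in the parsed value).

```python
import json
import re


def first_difference(a: str, b: str) -> int:
    """Return the index of the first differing character, or -1 if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other.
        return min(len(a), len(b))
    return -1


def compare_raw_and_parsed(raw: bytes) -> int:
    # Value as parsed by the json module (the same parser r.json() relies on).
    parsed = json.loads(raw)["base64"]
    # Value as it literally appears in the raw bytes (what print(r.content) shows).
    match = re.search(rb'"base64"\s*:\s*"([^"]*)"', raw)
    # Base64 should be pure ASCII; a UnicodeDecodeError here is itself a clue.
    literal = match.group(1).decode("ascii")
    return first_difference(parsed, literal)


# Hypothetical usage: print(compare_raw_and_parsed(r.content))
```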

How would I go about deserializing the response from the API to get the valid Base64?

Harjan
  • @Trenton I assume you mean the Base64; sadly, I cannot share it because I do not own the serialized resources. – Harjan Jun 15 '20 at 18:00
  • @Harjan Take a random image of a duck. Convert it to base64. Put that base64 in a request like the one you provided and see if the problem arises. If yes, post that request so we can try. – Bakuriu Jun 15 '20 at 20:05
  • @Trenton I have added some Base64; when decoded correctly, it should be a 1024x1024 picture of a pink and white box. – Harjan Jun 16 '20 at 07:10
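Bakuriu's suggestion can be tried without the real images: serialize some arbitrary bytes into a body shaped like the API response and check that the JSON-then-Base64 path returns them unchanged. A sketch; build_test_body and round_trips are hypothetical helpers, os.urandom stands in for "a random image of a duck", and the id/filename values are placeholders. If this round trip succeeds locally but the real response does not, the problem is more likely in what the server sends than in the json module.

```python
import base64
import json
import os


def build_test_body(image_bytes: bytes) -> bytes:
    """Serialize bytes into a JSON body shaped like the API response."""
    record = {
        "id": 1,                # placeholder
        "filename": "test.png",  # placeholder
        # b64encode returns bytes; JSON needs a str, and Base64 is pure ASCII.
        "base64": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(record).encode("utf-8")


def round_trips(image_bytes: bytes) -> bool:
    body = build_test_body(image_bytes)
    # Same path as r.json(): JSON text -> dict -> Base64 string -> bytes.
    decoded = base64.b64decode(json.loads(body)["base64"])
    return decoded == image_bytes


# Random bytes stand in for a real image.
sample = os.urandom(1024)
print(round_trips(sample))  # True
```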

1 Answer

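import base64
import re

...  # request code from the question, which produces the response r
# Pull the value of the "base64" field straight out of the raw response bytes.
match = re.search(rb'"base64"\s*:\s*"(?P<base>[^"]*)"', r.content)
b64text = match.group("base")
# The decoded value is binary PNG data, so keep it as bytes (no text decoding).
image_bytes = base64.b64decode(b64text)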

Since you're saying "calling print(r.content) results in the valid Base64", it's just a matter of extracting the base64 value from the raw content and decoding it.
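Since the example filename ends in .png, one sanity check after decoding is to confirm the bytes start with the 8-byte PNG signature before writing them to disk. A sketch; decode_png is a hypothetical helper, and it assumes the images really are PNGs.

```python
import base64

# Every PNG file begins with these 8 bytes, as defined by the PNG specification.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64text) -> bytes:
    """Decode a Base64 str/bytes value and check that the result looks like a PNG."""
    data = base64.b64decode(b64text, validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return data


# Hypothetical usage with the b64text extracted above:
#     with open("00548613.png", "wb") as f:
#         f.write(decode_png(b64text))
```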

qedk
  • Good suggestion. I think this might have worked if only Base64 were being returned, but calling this on my content results in the entire JSON response being decoded from Base64. – Harjan Jun 15 '20 at 17:58
  • @Harjan then it's just a matter of extracting the base64 data from the text directly, see my answer for an example implementation. – qedk Jun 15 '20 at 20:00
  • I tried your edited solution, but calling `r.content` or `r.text` gives the same corrupted Base64. Extracting works, but decoding is not possible because the value still contains the illegal characters. – Harjan Jun 16 '20 at 07:43
  • @Harjan Check your Content-Type and charset. requests chooses the encoding for r.text from the Content-Type header, which may not match what your API actually uses; set the appropriate value using r.encoding and retry. Have you tried reproducing this behaviour with urllib? – qedk Jun 16 '20 at 10:21
  • See https://stackoverflow.com/questions/37225035/serialize-in-json-a-base64-encoded-data – PruthviRaj Reddy Jun 16 '20 at 22:02
  • @PruthviRajReddy That's for serializing, not deserializing. Deserializing is a bit more difficult, since you don't want to b64-decode the whole response; the correct approach is to take the response data as bytes (hopefully stored as valid base64) and decode it chunk by chunk using base64 decode. – qedk Jun 17 '20 at 07:30
  • @qedk Correct me if I'm wrong: the original response has both metadata and image content. I was pointing to string-encoding (serializing) the image before sending it over the network, so that the received response can be parsed directly with requests' json() and the base64 value then decoded (deserialized). – PruthviRaj Reddy Jun 17 '20 at 15:47
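On the charset suggestion in the comments: the Base64 alphabet is pure ASCII (A-Z, a-z, 0-9, +, / and = for padding), so decoding the body with any ASCII-compatible charset should leave the Base64 field unchanged, even where other non-ASCII fields would be garbled. The sketch below checks this for a few charsets on a locally built, made-up body. It suggests, but does not prove, that a charset mismatch alone is unlikely to explain the corruption described in the question.

```python
import base64
import json

# Locally built body; the Base64 value encodes all 256 byte values.
payload = base64.b64encode(bytes(range(256))).decode("ascii")
raw = json.dumps({"filename": "test.png", "base64": payload}).encode("utf-8")

# Decode the same bytes with several ASCII-compatible charsets, then parse.
values = {
    enc: json.loads(raw.decode(enc))["base64"]
    for enc in ("ascii", "utf-8", "latin-1", "cp1252")
}

# Every charset yields the identical Base64 string.
print(len(set(values.values())))  # 1
```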