
I have a JSON file that I can easily load into a dict using Python. The value of one of the elements is a Base64-encoded string, but after JSON loading the type of this value is str. For example:

'a': 'BAAAAAAAAj5FEMB=='

This example is not a valid string, by the way; the real one is proprietary.

I've tried to decode this str in a number of ways without success, because every example I've seen assumes the Base64 data is already a bytes object rather than a str. Base64 decoding throws an error, and if I Base64-encode and then decode it I get the same string back (as expected). I need a way to convert the str into something Base64 can decode, without encoding it a second time.

Help would be appreciated.

EDIT:

I tried the following:

import base64

x = {'a':'kjuhdfxkjahgfdhj'}
y = x['a'].encode('utf-8')
z = base64.b64decode(y)

z is just a bytes object that looks something like:

b'\x04\x00\x00\x00AJ ...........'

Nick Elias
  • Sounds like it's just base 64 encoded binary data. – fredrik Jun 17 '21 at 20:33
  • That may be correct. Ultimately, this string is supposed to be converted to another dict. Is there a way to convert that binary string to a dict? – Nick Elias Jun 17 '21 at 20:36
  • You have to know the binary format used. – fredrik Jun 17 '21 at 20:37
  • So base64 isn't the binary format used? – Nick Elias Jun 17 '21 at 20:39
  • No, base64 is a text representation of binary data, base 64 instead of base 2 – fredrik Jun 17 '21 at 21:04
  • from the latest updates, the data is likely _both_ encoded in base64 and also a binary format inside (which could be _anything_); this is fairly common because base64-ing data makes it easier to transfer over http, but the reader must know what the data is in order to read it .. perhaps you could try writing it to a file and use [file](https://man7.org/linux/man-pages/man1/file.1.html) to figure out what it is, but finding out from the source author is likely the ideal route – ti7 Jun 17 '21 at 21:04
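The suggestion in the comments (decode, then inspect the raw bytes or hand them to file) could be sketched roughly as below. The helper name dump_payload, the output file name payload.bin, and the sample payload are all made up for illustration; the real data and its inner binary format are unknown.

```python
import base64
import tempfile
from pathlib import Path


def dump_payload(encoded_text: str, out_path: Path) -> bytes:
    """Hypothetical helper: Base64-decode a str and write the raw bytes to disk."""
    raw = base64.b64decode(encoded_text)
    out_path.write_bytes(raw)
    return raw


# Stand-in for the proprietary value: a made-up payload, Base64-encoded to a str.
sample = base64.b64encode(b"\x04\x00\x00\x00AJ example").decode("ascii")

out = Path(tempfile.gettempdir()) / "payload.bin"
raw = dump_payload(sample, out)

# The leading bytes often hint at the inner format (magic numbers, length prefixes).
print(raw[:8].hex())
```

Running file on the written payload.bin may then identify the format, although asking the author of the producing code is likely more reliable.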

1 Answer


You can use .encode() and .decode() to convert between str and bytes:

>>> import base64
>>> s = "test string"
>>> base64.b64encode(s.encode())
b'dGVzdCBzdHJpbmc='
>>> base64.b64decode(b'dGVzdCBzdHJpbmc=')
b'test string'
>>> base64.b64decode(b'dGVzdCBzdHJpbmc=').decode("utf-8")
'test string'
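For the case where the value arrives as a str straight from JSON, here is a minimal sketch. It relies on the documented behaviour that base64.b64decode accepts an ASCII-only str as well as bytes; the JSON document and the key 'a' are only an illustration modelled on the question.

```python
import base64
import json

# A str value as json.loads would return it (illustrative data, not the real file).
record = json.loads('{"a": "dGVzdCBzdHJpbmc="}')
encoded_text = record["a"]

# b64decode takes an ASCII-only str directly; no b'' literal is required.
raw = base64.b64decode(encoded_text)
print(raw)  # b'test string'

# Encoding the str to bytes first gives the same result.
same = base64.b64decode(encoded_text.encode("ascii"))
```

The result is always bytes; whether a further .decode("utf-8") makes sense depends on whether the payload is text or some other binary format.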
ti7
  • You have highlighted my problem. The value of the string is already encoded as base64 but the python type after loading the json is str. In other words, I have s = 'dGVzdCBzdHJpbmc=' in str (not binary) format. – Nick Elias Jun 17 '21 at 20:14
  • Closer, but still not correct. I don't have a b'' string, just ''. I've tried converting to bytes but it doesn't help base64 to decode it. – Nick Elias Jun 17 '21 at 20:19
  • Is it possible your string isn't really base64-encoded, but [something else](https://en.wikipedia.org/wiki/Binary-to-text_encoding#Encoding_standards) or truncated? – ti7 Jun 17 '21 at 20:22
  • The code that writes it is in scala, but it's definitely base64. – Nick Elias Jun 17 '21 at 20:25
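If the string really is Base64 but decoding still raises an error, two common causes are missing '=' padding and the URL-safe alphabet ('-' and '_' instead of '+' and '/'). Whether either applies to the Scala writer here is an assumption, not something the thread confirms. The helper decode_lenient below is hypothetical and only a sketch of how both cases could be tolerated.

```python
import base64


def decode_lenient(text: str) -> bytes:
    """Hypothetical helper: decode standard or URL-safe Base64, padded or not."""
    # Drop any whitespace or line breaks embedded in the value.
    cleaned = "".join(text.split())
    # Map the URL-safe alphabet onto the standard one.
    cleaned = cleaned.replace("-", "+").replace("_", "/")
    # Remove whatever padding is present, then restore the correct amount.
    cleaned = cleaned.rstrip("=")
    cleaned += "=" * (-len(cleaned) % 4)
    # validate=True rejects characters outside the Base64 alphabet
    # instead of silently discarding them.
    return base64.b64decode(cleaned, validate=True)


print(decode_lenient("dGVzdCBzdHJpbmc"))  # padding omitted -> b'test string'
```

A string that still fails after this is probably truncated or not Base64 at all, which matches the question raised in the comments.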