I'm new to the site, so please let me know if I need to change anything about this question! Likewise, I'm rather inexperienced with base 64 in general, so please bear with me!

In Python, I have a short program that simply decodes a base 64 string:

import base64

def decodeBase64(string):

    # Pad with '=' until the length is a multiple of 4, as Base64 requires.
    decodeableString = string + '=' * (-len(string) % 4)

    return base64.b64decode(decodeableString)

When trying to decode:

0J3QuNC20LUg0L/RgNC40LLQtdC00LXQvSDQutC+0LQg0LTQvtGB0YLRg9C/0LAg0Log0LfQtNCw0L3QuNGOIFvQo9CU0JDQm9CV0J3Qnl06Ck9WSzhZTFggLyAo0JjQnNCvIC8g0JrQm9Cu0KcpCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09ID09PT09PQrQkdCw0LfQsCAzNg==

as part of a challenge, the decoded data turned out to contain Russian characters, which my program didn't know how to handle, so it just returned the raw bytes:

b'\xd0\x9d\xd0\xb8\xd0\xb6\xd0\xb5 \xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd0\xb4\xd0\xb5\xd0\xbd \xd0\xba\xd0\xbe\xd0\xb4 \xd0\xb4\xd0\xbe\xd1\x81\xd1\x82\xd1\x83\xd0\xbf\xd0\xb0 \xd0\xba \xd0\xb7\xd0\xb4\xd0\xb0\xd0\xbd\xd0\xb8\xd1\x8e [\xd0\xa3\xd0\x94\xd0\x90\xd0\x9b\xd0\x95\xd0\x9d\xd0\x9e]:\nOVK8YLX / (\xd0\x98\xd0\x9c\xd0\xaf / \xd0\x9a\xd0\x9b\xd0\xae\xd0\xa7)\n================================================== ======\n\xd0\x91\xd0\xb0\xd0\xb7\xd0\xb0 36'

Using a different decoder online, I learned this contains Russian characters. Is there any relatively simple way to have my program check whether a decoded base 64 string contains non-ASCII characters, and then decode it into readable text accordingly?

Geode890
  • The question is somewhat misleading, as the core of the question is not about Base64 at all. It's more about determining the encoding of a text string; whether it was decoded from Base64 or comes from elsewhere is irrelevant. – JohnLM Aug 07 '18 at 06:59
  • @JohnLM Thanks for letting me know. I wasn't entirely sure if this was a base 64 exclusive question, as I wasn't quite sure what to make of the output. – Geode890 Aug 08 '18 at 02:54

1 Answer


In your particular case the string is UTF-8 encoded.

In Python 3.x, base64.b64decode returns bytes, so you have to decode them to str. Assuming the decoded bytes are in x:

>>> x.decode('utf-8')
'Ниже приведен код доступа к зданию [УДАЛЕНО]:\nOVK8YLX / (ИМЯ / КЛЮЧ)\n================================================== ======\nБаза 36'
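Putting the two steps together, a minimal sketch might look like the following. The helper name `decode_base64_text` and its `encoding` parameter are made up for illustration, and it assumes the payload really is text in that encoding:

```python
import base64

def decode_base64_text(string, encoding='utf-8'):
    # Hypothetical helper: pad the input to a multiple of 4 characters,
    # Base64-decode it to bytes, then decode those bytes to str.
    padded = string + '=' * (-len(string) % 4)
    return base64.b64decode(padded).decode(encoding)

# First 12 characters of the string from the question.
print(decode_base64_text('0J3QuNC20LUg'))
```

If the bytes are not valid in the assumed encoding, `.decode` raises `UnicodeDecodeError` rather than returning garbage, which is usually what you want.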

However, in the general case you can only guess the encoding. See this and related questions.
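One simple way to do that guessing is to try a few candidate encodings in order and keep the first one that decodes without error. The function name and the candidate list below are assumptions for illustration, not a standard recipe. Note that a single-byte codec such as cp1251 accepts almost any byte sequence, so it only makes sense as a last resort:

```python
def guess_decode(data, candidates=('ascii', 'utf-8', 'cp1251')):
    # Hypothetical helper: return (text, encoding) for the first candidate
    # encoding that decodes the bytes without raising an error.
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing matched: return None so the caller can keep the raw bytes.
    return None, None
```

Trying `'ascii'` first also answers the "does it contain non-ASCII characters" part: if the result's encoding is anything other than `'ascii'`, the data contained at least one non-ASCII byte.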

JohnLM