I have a file that I want to convert into a custom base (base 86, for example, with a custom alphabet).

I tried converting the file with hexlify and then into my custom base, but it's too slow: 8 seconds for 60 KB.

def HexToBase(Hexa, AlphabetList, OccurList, threshold=10):
    number = int(Hexa, 16)  # parse the hex string into an integer
    alphabet = GetAlphabet(AlphabetList, OccurList, threshold)
    # GetAlphabet returns a list of all chars that occur more than threshold times

    b_nbr = len(alphabet) #get the base
    out = ''
    while number > 0:
        out = alphabet[(number % b_nbr)] + out
        number = number // b_nbr
    return out

import binascii

with open("File.jpg", "rb") as file:  # the file is closed automatically
    binary_data = file.read()
HexToBase(binascii.hexlify(binary_data), ['a', 'b'], [23, 54])

Could anyone help me find the right solution?

Sorry for my poor English, I'm French. Thanks for your help!

Mathix420
  • When you say _"from binary to custom base"_, do you mean converting each byte in the source file to an arbitrary base, or converting the value of the whole file into an arbitrary base (and if so, what's the byte order, i.e. big endian, little endian, custom encoding...)? – zwer May 16 '18 at 10:05
  • If you do not show your current code, and the problems it has, we cannot help you to improve it... – Serge Ballesta May 16 '18 at 10:13
  • I want the same output as hexlify, so I think it's converting the value of the whole file into an arbitrary base. – Mathix420 May 16 '18 at 10:15
  • Byte order is normally little endian – Mathix420 May 16 '18 at 10:28

1 Answer

First, you can replace:

int(binascii.hexlify(binary_data), 16) # timeit: 14.349809918712538

with:

int.from_bytes(binary_data, byteorder='little') # timeit: 3.3330371951720164
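
One thing worth checking before swapping these two lines: int(binascii.hexlify(data), 16) reads the bytes most-significant-first, so it gives the same integer as byteorder='big', not byteorder='little'. Either order works for encoding as long as decoding uses the same one, but the two orders produce different output. A small check, using made-up sample bytes that are not from the question:

```python
import binascii

data = b"\x01\x02\x03"  # hypothetical sample bytes, for illustration only

via_hex = int(binascii.hexlify(data), 16)
big = int.from_bytes(data, byteorder="big")
little = int.from_bytes(data, byteorder="little")

print(via_hex == big)     # True: hexlify keeps the first byte most significant
print(via_hex == little)  # False: little-endian reverses the byte significance
```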

Second, you can use the built-in divmod function to speed up the loop:

out = ""
while number > 0:
    number, m = divmod(number, b_nbr)
    out = alphabet[m] + out

# timeit: 3.8345545611298126 (divmod) vs 7.472579440019706 (% and //)

For a comparison of divmod with % and // on large numbers, see "Is divmod() faster than using the % and // operators?".

(Remark: I expected that building a list and then making a string with "".join would be faster than out = ... + out, but that was not the case with CPython 3.6.)
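
For anyone who wants to repeat that comparison on their own interpreter, here is a minimal sketch of the two variants side by side. The function names, the alphabet and the test input are assumptions made for illustration; timings will vary by machine and Python version, so none are claimed here.

```python
import timeit

ALPHABET = "0123456789abcdef"  # hypothetical alphabet, for illustration only

def encode_prepend(number, alphabet):
    # Variant from the answer: prepend each digit to the output string.
    base = len(alphabet)
    out = ""
    while number > 0:
        number, m = divmod(number, base)
        out = alphabet[m] + out
    return out

def encode_join(number, alphabet):
    # Alternative: collect digits in a list, reverse once, then join.
    base = len(alphabet)
    digits = []
    while number > 0:
        number, m = divmod(number, base)
        digits.append(alphabet[m])
    digits.reverse()
    return "".join(digits)

# Made-up 1 KiB input; both variants must agree on the result.
n = int.from_bytes(bytes(range(256)) * 4, byteorder="big")
assert encode_prepend(n, ALPHABET) == encode_join(n, ALPHABET)

for fn in (encode_prepend, encode_join):
    print(fn.__name__, timeit.timeit(lambda: fn(n, ALPHABET), number=5))
```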

Everything put together gave me a speedup of about a factor of 6.
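
Put together, the two changes could look roughly like the sketch below. The names bytes_to_base and base_to_bytes and the 86-character alphabet are assumptions for illustration (the question builds its alphabet with GetAlphabet, which is not shown). The decoder is not part of the original answer; it is only there to check the round trip. Note that the integer conversion drops zero bytes at the most significant end, so the decoder needs the original length.

```python
import string

# Hypothetical 86-character alphabet; any sequence of unique characters works.
ALPHABET = (string.ascii_letters + string.digits + string.punctuation)[:86]

def bytes_to_base(data, alphabet=ALPHABET, byteorder="little"):
    # Convert the bytes to one big integer without going through hexlify.
    number = int.from_bytes(data, byteorder=byteorder)
    base = len(alphabet)
    out = ""
    while number > 0:
        # divmod returns quotient and remainder in a single operation.
        number, m = divmod(number, base)
        out = alphabet[m] + out
    return out

def base_to_bytes(text, length, alphabet=ALPHABET, byteorder="little"):
    # Inverse of bytes_to_base; length restores zero bytes lost in the integer.
    index = {ch: i for i, ch in enumerate(alphabet)}
    base = len(alphabet)
    number = 0
    for ch in text:
        number = number * base + index[ch]
    return number.to_bytes(length, byteorder=byteorder)

data = b"\x00hello world\x00"  # made-up sample with zero bytes at both ends
encoded = bytes_to_base(data)
assert base_to_bytes(encoded, len(data)) == data
```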

jferard
  • Thanks, that really helped me. I changed all my code following your tips, and now I can encrypt a 60 KB file with a custom base and custom cypher in less than a second! If you're curious: https://pypi.org/project/NimingCypher/ – Mathix420 May 23 '18 at 13:29