2

-------------------------- Update -----------------------------

Let me add more detail:

The actual situation is that I have this LONG STRING in environment A, and I need to copy and paste it into environment B.

UNFORTUNATELY, environments A and B are not connected (no mutual access), so I'm looking for a way to encode the string into a shorter form and decode it again. Otherwise, for every additional file I have to type the string in by hand, which is slow and not reproducible.

Any suggestions or tool recommendations? Many thanks!


I'm facing a weird problem: encoding a SUPER LONG binary string into a simple form, such as a few digits.

Say there is a long string consisting only of 1s and 0s, e.g. "110...011", with a length of 1,000 to 100,000 digits or even more, and I would like to encode this STRING into something that has fewer digits/chars. Then I need to decode it back to the original STRING.

Currently I'm trying the hex / int methods in Python to 'compress' this string and 'decompress' it back to its original form.

An example would be:

Input string: '110011110110011'

'''

def Bi_to_Hex_Int(input_str, method):
    """Encode a string of 0s and 1s as a hex string or a decimal integer."""
    input_two = str(input_str)

    if method == 'hex':    # base 2 -> base 16
        result = hex(int(input_two, 2))
    elif method == 'int':  # base 2 -> base 10
        result = int(input_two, 2)
    else:
        raise ValueError("method must be 'hex' or 'int'")

    print("input_bi length", len(input_two),
          "\n output length", len(str(result)),
          "\n method: {}".format(method))
    return result


gene = '110011110110011'  # the input string from above

res_16 = Bi_to_Hex_Int(gene, 'hex')
# res_16 == '0x67b3'

res_10 = Bi_to_Hex_Int(gene, 'int')
# res_10 == 26547

'''

Then I can reverse it back:

'''

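def HexInt_to_bi(input_str, method):
    """Decode a hex string or a decimal integer back to a string of 0s and 1s."""
    if method == 'hex':    # base 16 -> base 2
        value = int(input_str, 16)
    elif method == 'int':  # base 10 -> base 2
        value = int(input_str)
    else:
        raise ValueError("method must be 'hex' or 'int'")

    # bin() prefixes its result with '0b', so strip the first two characters.
    # Note: leading zeros of the original string are not recovered this way.
    back_two = bin(value)[2:]

    print("input length", len(str(input_str)),
          "\n output bi length", len(back_two))
    return back_two


hexback_two = HexInt_to_bi(res_16, 'hex')
intback_two = HexInt_to_bi(res_10, 'int')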

'''

BUT this has a problem: I tried a string of around 500 digits, 101010...0001 (500 digits), and the best 'compressed' result is around 127 characters, using hex.

So is there a better way to 'compress' the string further, to fewer digits?

**Say, a 5,000-digit string consisting of 1s and 0s, compressed to something like 50 or 100 digits/chars (or even fewer)?**

leveygao
  • Are you saying that you need to manually enter the data regardless? So you want to compress it to have less to type? – Mark Adler Jan 14 '21 at 02:44
  • Sadly yes... basically that's the situation for now: get the code from A and manually type it into B. – leveygao Jan 14 '21 at 06:01
  • There is no way a human will be able to type thousands of characters correctly. At least not on the first try. You will need to break it up into smaller pieces, and do error detection on each piece. – Mark Adler Jan 14 '21 at 06:30
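
The last comment's suggestion (small pieces, each with its own error detection) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `split_with_checks` and `join_with_checks`, the chunk size of 20, and the `piece-checksum` layout are all made up here, and CRC-32 from `zlib` is just one possible check value.

```python
import zlib


def split_with_checks(text, size=20):
    """Split text into chunks, each followed by '-' and an 8-hex-digit CRC-32."""
    chunks = []
    for i in range(0, len(text), size):
        piece = text[i:i + size]
        check = zlib.crc32(piece.encode())
        chunks.append("{}-{:08x}".format(piece, check))
    return chunks


def join_with_checks(chunks):
    """Verify every chunk and rebuild the text; raise on the first bad chunk."""
    pieces = []
    for number, chunk in enumerate(chunks, start=1):
        # The check value is whatever follows the last '-' in the chunk.
        piece, _, check = chunk.rpartition("-")
        if zlib.crc32(piece.encode()) != int(check, 16):
            raise ValueError("chunk {} was mistyped".format(number))
        pieces.append(piece)
    return "".join(pieces)


chunks = split_with_checks("0x67b3" * 10, size=16)
assert join_with_checks(chunks) == "0x67b3" * 10
```

A shorter check value would mean less typing per chunk, at the cost of weaker error detection.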

4 Answers

2

If you want it that simple: 1 hex character represents 4 binary characters (2 ^ 4 = 16). The compression ratio you want is about 50 to 100 times. For a ratio of 50 you need 50 binary characters to be compressed into 1 character, which means you require 2 ^ 50 different characters to encode every combination. That is quite a lot.
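
The counting argument above can be checked numerically. A small sketch (the function name `min_chars` is made up for illustration): an alphabet of `alphabet_size` symbols carries at most log2(alphabet_size) bits per character, so an arbitrary string of n binary digits needs at least ceil(n / log2(alphabet_size)) characters.

```python
import math


def min_chars(n_bits, alphabet_size):
    """Fewest characters that can represent every possible n_bits-digit binary string."""
    return math.ceil(n_bits / math.log2(alphabet_size))


# Alphabet sizes chosen for illustration only.
for name, size in [("hex", 16), ("base64", 64), ("printable ASCII", 95)]:
    print(name, min_chars(5000, size))

# Number of distinct symbols needed to fit 50 binary digits in one character.
print(2 ** 50)
```

For 5,000 digits this gives 1250 characters for hex and 834 for base64, so a plain change of base cannot get anywhere near 50 or 100 characters.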

If you accept a lower ratio, you may try base64 as described here. Its compression ratio is 6 to 1, since each base64 character encodes 6 binary characters.
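
A minimal sketch of the base64 route, assuming a non-empty input string. The helper names `bits_to_b64` and `b64_to_bits` and the `length:payload` layout are made up for illustration; the length prefix is there so that leading zeros survive the round trip.

```python
import base64


def bits_to_b64(bits):
    """Encode a non-empty string of 0s and 1s as '<length>:<base64 text>'."""
    n_bytes = (len(bits) + 7) // 8
    raw = int(bits, 2).to_bytes(n_bytes, "big")
    return "{}:{}".format(len(bits), base64.b64encode(raw).decode("ascii"))


def b64_to_bits(text):
    """Invert bits_to_b64, restoring leading zeros from the stored length."""
    length, _, payload = text.partition(":")
    raw = base64.b64decode(payload)
    return bin(int.from_bytes(raw, "big"))[2:].zfill(int(length))


original = "0011" + "10" * 250
assert b64_to_bits(bits_to_b64(original)) == original
```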

Otherwise you have to come up with a more complex algorithm: splitting your string into blocks, looking for similar blocks among them, encoding them with different symbols, building a map of those symbols, etc.

It is probably easier to compress your string with an archiver, then return a base64 representation of the result.

If the task allows, you may store the whole strings somewhere and give them short unique names, so that instead of compressing and decompressing you store and retrieve the strings by name.

Som-1
1

This probably doesn't produce the absolutely shortest string you can get, but it's trivially easy using the facilities built into Python. No need to convert the characters into a binary format, the zlib compression will convert an input with only 2 different characters into something optimal.

Encoding:

import zlib
import base64
result = base64.b64encode(zlib.compress(input_str.encode()))
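
Decoding is the same steps in reverse. A minimal round-trip sketch (the wrapper names `compress_bits` and `decompress_bits` are made up for illustration):

```python
import base64
import zlib


def compress_bits(input_str):
    """zlib-compress the string, then make the result printable with base64."""
    return base64.b64encode(zlib.compress(input_str.encode())).decode("ascii")


def decompress_bits(encoded):
    """Undo compress_bits: base64-decode first, then zlib-decompress."""
    return zlib.decompress(base64.b64decode(encoded)).decode()


sample = "10" * 2500  # 5,000 digits, highly repetitive
packed = compress_bits(sample)
assert decompress_bits(packed) == sample
print(len(sample), "->", len(packed))
```

How much this shrinks depends heavily on the input: a repetitive string like the sample compresses very well, while a string that looks random will not.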
Mark Ransom
1

If the counts of 0s and 1s are significantly different, then you can use enumerative coding to get the shortest representation.
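
A rough sketch of what enumerative coding could look like here, assuming Python 3.8+ for `math.comb`. The function names are made up for illustration. The idea: a string of length n with k ones is replaced by (n, k, rank), where rank is its position in lexicographic order among all strings with that length and that number of ones.

```python
from math import comb  # Python 3.8+


def enum_encode(bits):
    """Return (n, k, rank) for a string of 0s and 1s."""
    n, k = len(bits), bits.count("1")
    rank, ones_left = 0, k
    for i, bit in enumerate(bits):
        if bit == "1":
            # Skip all strings that have '0' here and fit ones_left ones afterwards.
            rank += comb(n - i - 1, ones_left)
            ones_left -= 1
    return n, k, rank


def enum_decode(n, k, rank):
    """Rebuild the string from (n, k, rank)."""
    out, ones_left = [], k
    for i in range(n):
        below = comb(n - i - 1, ones_left)
        if ones_left and rank >= below:
            out.append("1")
            rank -= below
            ones_left -= 1
        else:
            out.append("0")
    return "".join(out)


s = "0000100000000100"
assert enum_decode(*enum_encode(s)) == s
```

The rank is always smaller than `comb(n, k)`, so it needs far fewer than n bits when ones (or zeros) are rare; it can then be written in hex or base64 like any other integer.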

TTho Einthausend
0

If the string consists only of 0 and 1 digits, then you can pack eight digits into one byte. You will also need to keep track of how many digits there are past the last multiple of eight, since the last byte may be representing fewer than eight digits.
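
A minimal sketch of that packing. The function names and the choice to store the leftover-digit count in an extra leading byte are assumptions made for illustration, not part of any standard format.

```python
def pack_bits(bits):
    """Pack a string of 0s and 1s into bytes; byte 0 holds len(bits) % 8."""
    remainder = len(bits) % 8
    padded = bits + "0" * (-len(bits) % 8)  # pad the last group to eight digits
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes([remainder]) + body


def unpack_bits(data):
    """Invert pack_bits, dropping the padding digits from the last byte."""
    remainder, body = data[0], data[1:]
    bits = "".join("{:08b}".format(byte) for byte in body)
    if remainder:
        bits = bits[:len(bits) - (8 - remainder)]
    return bits


packed = pack_bits("110011110110011")  # 15 digits -> 1 count byte + 2 data bytes
assert unpack_bits(packed) == "110011110110011"
```

The resulting bytes are not typeable as they are, so they would still need a text encoding such as hex or base64 on top.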

Mark Adler