
I'm working on a simple compression algorithm for binary files. I scan the file and fill a list with each character and the number of times it repeats consecutively. However, the printed list makes the compressed result larger because of all the brackets and commas, and I need to get rid of these. I have tried several methods of removing them, but nothing has worked. Here is the encode algorithm:

def encode(inputString):
    characterCount = 1
    previousCharacter = ''
    List = []
    for character in inputString:
        if character != previousCharacter:
            if previousCharacter:
                listEntry = (previousCharacter, characterCount)
                List.append(listEntry)
                #print lst
            characterCount = 1
            previousCharacter = character
        else:
            characterCount += 1
    else:
        try:
            listEntry = (character, characterCount)
            List.append(listEntry)
            return (List, 0)
        except Exception as e:
            print("Exception encountered {e}".format(e=e))
            return (e, 1)

And here is where I print the list. The commented-out lines are the methods I have already tried, with no luck.

value = encode(binaryfile)
if value[1] == 0:
    print(value[0])

#flattened = [val for sublist in value for val in sublist]
#print(flattened)
#values = value[0]

#print(*value[0], sep='')
#print (''.join(map(str, value)))
#print(int("".join(str(x) for x in value[0])))

And here is the output.

[('1', 2), ('0', 1), ('1', 1), ('0', 4), ('1', 2), ('0', 2), ('1', 4), ('0', 3), ('1', 1), ('0', 3), ('1', 4), ('0', 5), ('1', 1), ('0', 1), ('1', 1), ('0', 4), ('1', 2), ('0', 1), ('1', 2), ('0', 3), ('1', 1), ('0', 3), ('1', 2), ('0', 1), ('1', 1), ('0', 1), ('1', 3), ('0', 4), ('1', 1), ('0', 130), ('1', 5), ('0', 15), ('1', 2), ('0', 8), ('1', 7), ('0', 1), ('1', 8), ('0', 4), ('1', 1), ('0', 2), ('1', 1), ('0', 13), ('1', 2), ('0', 96), ('1', 1), ('0', 26), ('1', 3), ('0', 70), ('1', 1), ('0', 22), ('1', 3), ('0', 1), ('1', 1), ('0', 32), ('1', 1), ('0', 24), ('1', 7), ('0', 1), ('1', 24), ('0', 34), ('1', 2), ('0', 1), ('1', 3), ('0', 24), ('1', 3459), ('0', 1), ('1', 2), ('0', 2), ('1', 1), ('0', 1), ('1', 1), ('0', 2), ('1', 1), ('0', 1), ('1', 3), ('0', 5), ('1', 1), ('0', 10), ('1', 1), ('0', 2), ('1', 3), ('0', 1), ('1', 2), ('0', 9), ('1', 1), ('0', 2), ('1', 1), ('0', 5), ('1', 1), ('0', 18), ('1', 4), ('0', 7), ('1', 1), ('0', 2), ('1', 1), ('0', 1), ('1', 1),

Any help is greatly appreciated

  • Possible duplicate? https://stackoverflow.com/questions/17757450/how-to-print-a-list-with-integers-without-the-brackets-commas-and-no-quotes/17757544 and https://stackoverflow.com/questions/11178061/print-list-without-brackets-in-a-single-row – n8-da-gr8 Dec 04 '18 at 17:10

2 Answers


So you're trying to get 1201110412 and so on? From your list of tuples you can use itertools.chain:

from itertools import chain

value = [('1', 2), ('0', 1), ('1', 1), ('0', 4), ('1', 2), ('0', 2), ('1', 4), ('0', 3), ('1', 1), ('0', 3), ('1', 4), ('0', 5), ('1', 1), ('0', 1), ('1', 1), ('0', 4), ('1', 2), ('0', 1), ('1', 2), ('0', 3), ('1', 1), ('0', 3), ('1', 2), ('0', 1), ('1', 1), ('0', 1), ('1', 3), ('0', 4), ('1', 1), ('0', 130), ('1', 5), ('0', 15), ('1', 2), ('0', 8), ('1', 7), ('0', 1), ('1', 8), ('0', 4), ('1', 1), ('0', 2), ('1', 1), ('0', 13), ('1', 2), ('0', 96), ('1', 1), ('0', 26), ('1', 3), ('0', 70), ('1', 1), ('0', 22), ('1', 3), ('0', 1), ('1', 1), ('0', 32), ('1', 1), ('0', 24), ('1', 7), ('0', 1), ('1', 24), ('0', 34), ('1', 2), ('0', 1), ('1', 3), ('0', 24), ('1', 3459), ('0', 1), ('1', 2), ('0', 2), ('1', 1), ('0', 1), ('1', 1), ('0', 2), ('1', 1), ('0', 1), ('1', 3), ('0', 5), ('1', 1), ('0', 10), ('1', 1), ('0', 2), ('1', 3), ('0', 1), ('1', 2), ('0', 9), ('1', 1), ('0', 2), ('1', 1), ('0', 5), ('1', 1), ('0', 18), ('1', 4), ('0', 7), ('1', 1), ('0', 2), ('1', 1), ('0', 1), ('1', 1)]

print(''.join(map(str, chain.from_iterable(value))))
# 12011104120214031103140511011104120112031103120111011304110130150151208170118041102110131209611026130701102213011103211024170112403412011302413459011202110111021101130511010110213011209110211051101814071102110111
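
To see what chain.from_iterable is doing here, a minimal sketch on a short list may help. The names pairs, flat and compact are made up for illustration, and the sample is just the first five entries of the list above:

```python
from itertools import chain

# Hypothetical short sample in the same (character, count) shape as above.
pairs = [('1', 2), ('0', 1), ('1', 1), ('0', 4), ('1', 2)]

# chain.from_iterable walks each tuple in turn, yielding '1', 2, '0', 1, ...
flat = list(chain.from_iterable(pairs))

# The counts are ints, so they must be converted with str before joining.
compact = ''.join(map(str, flat))

print(flat)
print(compact)
```

The map(str, ...) step matters because ''.join only accepts strings, which is likely why joining the tuples directly did not work.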

Or, if you're starting from a string like 1101000011, you can use itertools.groupby:

from itertools import groupby

inputString = '11010000110011110001000111100000101000011011000100011010111000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111110000000000000001100000000111111101111111100001001000000000000011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000111000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000111010000000000000000000000000000000010000000000000000000000001111111011111111111111111111111100000000000000000000000000000000001101110000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110110010100101110000010000000000100111011000000000100100000100000000000000000011110000000100101'

print(''.join([k + str(sum(1 for _ in g)) for k, g in groupby(inputString)]))
# 12011104120214031103140511011104120112031103120111011304110130150151208170118041102110131209611026130701102213011103211024170112403412011302413459011202110111021101130511010110213011209110211051101814071102110111
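
The same groupby idea can be wrapped in a small helper and tried on the short string mentioned above. This is only a sketch: rle_encode and its sep parameter are hypothetical names, not part of itertools.

```python
from itertools import groupby

def rle_encode(bits, sep=''):
    """Run-length encode a string as character+count tokens joined by sep."""
    # groupby yields (key, group_iterator) for each run of equal characters.
    # The group iterator has no len(), so its items are counted with sum().
    return sep.join(k + str(sum(1 for _ in g)) for k, g in groupby(bits))

print(rle_encode('1101000011'))       # tokens run together
print(rle_encode('1101000011', ','))  # tokens separated by commas
```

One design point to keep in mind: with no separator at all, the digits of a count can be confused with the characters themselves. For example, 1101 could mean ('1', 1), ('0', 1) or ('1', 101), so a separator (or a fixed-width count) is probably needed if the result has to be decoded again.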
Patrick Haugh

It sounds like you essentially want a string instead of a list. Build the string directly instead of appending tuples to a list.

Instead of

List = []
for character in inputString: 
    if character != previousCharacter:
        if previousCharacter:
            listEntry = (previousCharacter, characterCount) 
            List.append(listEntry) 

Use this

string = ''
for character in inputString: 
    if character != previousCharacter:
        if previousCharacter:
            string += previousCharacter + str(characterCount)

Alternatively, you can keep your list and convert it to a string at the end, but it is simpler to build the string from the start instead of making the list first:

''.join(x[0] + str(x[1]) for x in List)
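
Putting the first suggestion together, a complete version of the loop might look like the sketch below. The name encode_to_string is hypothetical, and the error-code tuple from the original encode function is left out for brevity:

```python
def encode_to_string(inputString):
    """Run-length encode inputString into a string such as '1201110412'."""
    result = ''
    characterCount = 0
    previousCharacter = ''
    for character in inputString:
        if character == previousCharacter:
            # Still inside the same run.
            characterCount += 1
        else:
            # A new run starts: write out the previous run, if there was one.
            if previousCharacter:
                result += previousCharacter + str(characterCount)
            previousCharacter = character
            characterCount = 1
    # Write out the final run; this is skipped for empty input.
    if previousCharacter:
        result += previousCharacter + str(characterCount)
    return result

print(encode_to_string('1101000011'))
```

Unlike the original for/else version, this sketch returns an empty string for empty input rather than relying on an exception.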
Paritosh Singh