
When a string is passed to the function run_length_encoder, repeated letters should be compressed. For example, when aaabbac is the input, the output should be ['a','a',3,'b','b',2,'a','c'], but my code isn't compressing it.

def run_length_encoder(string):
#def compress(string):

    res = []

    count = 1

    #Add in first character
    res.append(string[0])

    #Iterate through loop, skipping last one
    for i in range(len(string)-1):
        if(string[i] == string[i+1]):
            count+=1
            res.append(string[i+1])
        else:
            if(count > 1):
                #Ignore if no repeats
                res.append(count)
            res.append(string[i+1])
            count = 1
    #print last one
    if(count > 1):
        res.append(str(count))
    return res

For example, when abbbbaa is the input, the output is supposed to be ['a', 'b', 'b', 4, 'a', 'a', 2], but instead I am getting ['a', 'b', 'b', 'b', 'b', 4, 'a', 'a', '2'].

  • Can you please explain a little more about the encoding method? For example, why is `'aaa'` encoded as `['a','a',3]` instead of `['a',3]`? Another way would be to hold a list of dictionaries, but that depends heavily on how much repetition there is in your strings. – anishtain4 Jun 12 '18 at 02:05
  • You should not append the number on match, only increment the counter. – import random Jun 12 '18 at 02:09
  • If a letter repeats, it should be given twice, followed by the number of times it repeats. For example, aaa should be aa3 and aa should be aa2, while a single a should stay a. – Anvesh Sunkara Jun 12 '18 at 10:25
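
Putting the two comments above together (only increment the counter on a match; a repeated letter appears twice, followed by its count), a minimal sketch of a corrected loop might look like this. The name `run_length_encoder_fixed` is hypothetical, and returning an empty list for empty input is an assumption that the question does not cover.

```python
def run_length_encoder_fixed(string):
    # Hypothetical name; returning [] for empty input is an assumption.
    res = []
    if not string:
        return res

    count = 1
    # Run one step past the end so the final run is flushed by the same code.
    for i in range(1, len(string) + 1):
        # Still inside the same run: only count, append nothing yet.
        if i < len(string) and string[i] == string[i - 1]:
            count += 1
            continue
        # The run ending at index i-1 is complete: emit it.
        res.append(string[i - 1])
        if count > 1:
            # Repeated letter: emit it a second time, then the integer count.
            res.extend([string[i - 1], count])
        count = 1
    return res


print(run_length_encoder_fixed("aaabbac"))  # ['a', 'a', 3, 'b', 'b', 2, 'a', 'c']
print(run_length_encoder_fixed("abbbbaa"))  # ['a', 'b', 'b', 4, 'a', 'a', 2]
```

The key difference from the code in the question is that nothing is appended while a run is still growing, and the count is appended as an integer rather than a string.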

5 Answers


You could also do something like this:

def run_length_encoder(str_):
    compressedString = ''
    countConsecutive = 0
    strLen = len(str_)
    for i in range(strLen):
        countConsecutive += 1
        if i + 1 >= strLen or str_[i] != str_[i + 1]:
            compressedString += '' + str_[i] + str(countConsecutive)
            countConsecutive = 0

    return compressedString

sample = 'aaabbac'
result = list(run_length_encoder(sample))
print(result)
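
One caveat with calling list(...) on the compressed string: it splits the string into single characters, so counts come back as strings, and a count of 10 or more is split into separate digits. A small sketch of the same loop that appends to a list directly, assuming integer counts are wanted (the name `run_length_pairs` is hypothetical):

```python
def run_length_pairs(str_):
    # Hypothetical helper: same loop as above, but it builds a list
    # so that each count stays a single integer element.
    result = []
    count = 0
    for i, ch in enumerate(str_):
        count += 1
        # End of input or end of the current run: record (char, count).
        if i + 1 >= len(str_) or ch != str_[i + 1]:
            result.extend([ch, count])
            count = 0
    return result


print(list("a12"))                 # ['a', '1', '2']
print(run_length_pairs("a" * 12))  # ['a', 12]
```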
Sudhir Bastakoti
  • If a letter repeats, it should be given twice, followed by the number of times it repeats. For example, aaa should be aa3 and aa should be aa2, while a single a should stay a. – Anvesh Sunkara Jun 12 '18 at 10:34

Itertools loves you and wants you to be happy:

from itertools import chain, groupby

def run_length_encoder(src):
    return list(
        # chain.from_iterable flattens the series of tuples we make inside the
        # loop into a single list.
        chain.from_iterable(
            # groupby returns an iterable (item, group) where group is an
            # iterable that yields a copy of `item` as many times as that item
            # appears consecutively in the input. Therefore, if you take the
            # length of `group`, you get the run length of `item`. This
            # whole expression then returns a series of (letter, count)
            # tuples.
            (letter, len(list(group))) for letter, group in groupby(src)
        )
    )


print(run_length_encoder("aajjjjiiiiohhkkkkkkkkhkkkk"))
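
The function above returns flat (letter, count) pairs. If the format from the question is wanted instead (a repeated letter given twice followed by its count, a single letter left alone), the same groupby idea could be adapted roughly as follows. The name `run_length_encoder_doubled` is hypothetical and not part of itertools.

```python
from itertools import groupby


def run_length_encoder_doubled(src):
    # Hypothetical name: a groupby variant for the format in the question.
    result = []
    for letter, group in groupby(src):
        # Count the items in this run without building a list.
        n = sum(1 for _ in group)
        if n == 1:
            # A single letter is emitted as-is, with no count.
            result.append(letter)
        else:
            # A repeated letter is emitted twice, followed by its run length.
            result.extend([letter, letter, n])
    return result


print(run_length_encoder_doubled("aaabbac"))  # ['a', 'a', 3, 'b', 'b', 2, 'a', 'c']
```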
Kirk Strauser

Your logic needs fixing. Fixed edit to handle even and odd end cases.

def run_length_encoder(string):
    # Returns [char, count, char, count, ...], one pair per run of equal characters.
    res = []
    if not string:
        return res  # empty input: nothing to encode

    current = string[0]
    count = 1
    for ch in string[1:]:
        if ch == current:
            count += 1
        else:
            # The run ended: record it and start counting the new character.
            res.append(current)
            res.append(count)
            current = ch
            count = 1

    # Record the final run, which the loop itself never flushes.
    res.append(current)
    res.append(count)
    return res

Tested on strings:

string = "aaabbaadddaad"
OUTPUT: ['a', 3, 'b', 2, 'a', 2, 'd', 3, 'a', 2, 'd', 1]

string = "aaabbaaddd"
OUTPUT: ['a', 3, 'b', 2, 'a', 2, 'd', 3]

string = "aabccdd"
OUTPUT: ['a', 2, 'b', 1, 'c', 2, 'd', 2]

Jesse
  • If a letter repeats, it should be given twice, followed by the number of times it repeats. For example, aaa should be aa3 and aa should be aa2, while a single a should stay a. – Anvesh Sunkara Jun 12 '18 at 10:34

If you want it simple and clean, you can do as explained in this answer, with a little tweaking for list output:

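def occurrence(str_):
    result = []
    if not str_:
        return result  # empty input: nothing to encode
    count = 1
    for i in range(1, len(str_)):
        if str_[i-1] == str_[i]:
            count += 1
        else:
            result.append(str_[i-1])
            if count > 1:  # repeated: add the letter a second time, then the count
                result.extend([str_[i-1], count])
            count = 1
    # Flush the final run; str_[-1] also works for one-character input,
    # where the loop never runs and i would be undefined.
    result.append(str_[-1])
    if count > 1:
        result.extend([str_[-1], count])
    return result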

Test

>>> string = 'aajjjjiiiiohhkkkkkkkkhkkkk'
>>> occurrence(string)
['a', 'a', 2, 'j', 'j', 4, 'i', 'i', 4, 'o', 'h', 'h', 2, 'k', 'k', 8, 'h', 'k', 'k', 4]
>>> string = 'aaabbac'
>>> occurrence(string)
['a', 'a', 3, 'b', 'b', 2, 'a', 'c']
Bijoy
  • If a letter repeats, it should be given twice, followed by the number of times it repeats. For example, aaa should be aa3 and aa should be aa2, while a single a should stay a. – Anvesh Sunkara Jun 12 '18 at 10:34

A nice way to do this is to combine a list comprehension with itertools. The whole encoding then fits in a single expression:

from itertools import groupby
string = 'aajjjjiiiiohhkkkkkkkkhkkkkaaabsbbbbssssssssssbbaa'
result = list(sum([(k,sum(1 for i in g)) for k,g in groupby(string)], ()))

This results in:

['a', 2, 'j', 4, 'i', 4, 'o', 1, 'h', 2, 'k', 8,
 'h', 1, 'k', 4, 'a', 3, 'b', 1, 's', 1, 'b', 4,
 's', 10, 'b', 2, 'a', 2]

You could use a function as follows:

def run_length_encoding(string):
    return list(sum([(k,sum(1 for i in g)) for k,g in groupby(string)], ()))

result = run_length_encoding('aabbbccccddddd')

Explanation:

  1. groupby(string) groups consecutive identical characters. Each group g is an iterator, so sum(1 for i in g) adds 1 per element to count the length of the run. This yields tuples such as ('a', 2) ...
  2. list(sum(...., ())) concatenates those tuples into one flat tuple and converts it into a list. So [('a',2), ('b',4) ... ] becomes ['a',2,'b',4...], which is a flat list of letter and count pairs.
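
The two steps above can be inspected separately. This is only an illustrative sketch: the variable names are made up, and the chain.from_iterable line is an alternative way to flatten, not part of the one-liner above.

```python
from itertools import chain, groupby

text = "aabbbb"

# Step 1: one (character, run length) tuple per run of equal characters.
pairs = [(k, sum(1 for _ in g)) for k, g in groupby(text)]
print(pairs)  # [('a', 2), ('b', 4)]

# Step 2: starting from the empty tuple (), sum concatenates the tuples.
flat_sum = list(sum(pairs, ()))
print(flat_sum)  # ['a', 2, 'b', 4]

# Alternative flattening that avoids building a new tuple at every step.
flat_chain = list(chain.from_iterable(pairs))
print(flat_chain)  # ['a', 2, 'b', 4]
```

Because sum creates a fresh tuple for each addition, chain.from_iterable is likely the better choice for long inputs; for short strings the difference should not matter.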
Sudheesh Singanamalla