6

I have a more basic Run Length Encoding question compared to many of the questions about this topic that have already been answered. Essentially, I'm trying to take the string

string = 'aabccccaaa'

and have it return

a2b1c4a3

I thought that if I can manage to get all the information into a list like I have illustrated below, I would easily be able to return a2b1c4a3

test = [['a','a'], ['b'], ['c','c','c','c'], ['a','a','a']]

I came up with the following code so far, but was wondering if someone would be able to help me figure out how to make it create the output I illustrated above.

def string_compression():
    for i in xrange(len(string)):
        prev_item, current_item = string[i-1], string[i]
        print prev_item, current_item
        if prev_item == current_item:
            <HELP>

If anyone has any additional comments regarding more efficient ways to go about solving a question like this I am all ears!

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
g.humpkins
  • 331
  • 6
  • 15
  • Possible duplicate of [Run length encoding in Python](https://stackoverflow.com/questions/18948382/run-length-encoding-in-python) – pylang Jul 24 '18 at 15:20

3 Answers3

9

You can use itertools.groupby():

from itertools import groupby

grouped = [list(g) for k, g in groupby(string)]

This will produce your per-letter groups as a list of lists.

You can turn that into a RLE in one step:

rle = ''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])

Each k is the letter being grouped, each g an iterator producing N times the same letter; the sum(1 for _ in g) expression counts those in the most efficient way possible.

Demo:

>>> from itertools import groupby
>>> string = 'aabccccaaa'
>>> [list(g) for k, g in groupby(string)]
[['a', 'a'], ['b'], ['c', 'c', 'c', 'c'], ['a', 'a', 'a']]
>>> ''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])
'a2b1c4a3'
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • Thanks works perfectly! I understand that how the code [list(g) for k, g in groupby(string)] works, but I'm getting caught on the first part. What you're doing is saying I've got a string of length 0 and to that string I want to join the sum of the smaller lists within the larger lists. Where I'm lost is at the piece join(['{}{}'.format(k, sum(1 for _ in g)) and I was wondering if you could explain in a little greater detail how that works? – g.humpkins Aug 29 '14 at 19:38
  • 1
    @ADT: the empty string is the joiner, what is placed between the elements of the list produced. Try out `' - '.join(']['foo', 'bar', 'spam'])` for example and vary that joiner. Also try out what *just* the `['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)]` list comprehension produces. – Martijn Pieters Aug 29 '14 at 20:19
1

Consider using the more_itertools.run_length tool.

Demo

import more_itertools as mit


iterable = "aabccccaaa"
list(mit.run_length.encode(iterable))
# [('a', 2), ('b', 1), ('c', 4), ('a', 3)]

Code

"".join(f"{x[0]}{x[1]}" for x in mit.run_length.encode(iterable))  # python 3.6
# 'a2b1c4a3'

"".join(x[0] + str(x[1]) for x in mit.run_length.encode(iterable))
# 'a2b1c4a3'

Alternative itertools/functional style:

"".join(map(str, it.chain.from_iterable(x for x in mit.run_length.encode(iterable))))
# 'a2b1c4a3'

Note: more_itertools is a third-party library that installable via pip install more_itertools.

pylang
  • 40,867
  • 14
  • 129
  • 121
0

I'm a Python beginner and this is what I wrote for RLE.

s = 'aabccccaaa'
grouped_d = [(k, len(list(g))) for k, g in groupby(s)]

result = ''
for key, count in grouped_d:
    result += key + str(count)

print(f'result = {result}')

user2761895
  • 1,431
  • 4
  • 22
  • 38