Create a compress function in Python?

Question

I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.

Ex:

>>> compress("ddaaaff")
'd2a3f2'


 def compress(s):
     count=0

     for i in range(0,len(s)):
         if s[i] == s[i-1]:
             count += 1
         c = s.count(s[i])

     return str(s[i]) + str(c)

What have you tried code wise? This looks pretty lazy without any effort put into it. — Jesse Glover, Sep 30 '15 at 00:30

score 19 · Accepted Answer · answered Sep 30 '15 at 02:42

Here is a short Python implementation of a compression function:

def compress(string):
    # An empty string has no first character to seed the result
    if not string:
        return ""

    # Start with the first character
    res = string[0]
    count = 1

    # Compare each character with the next one, stopping before the last
    for i in range(len(string)-1):
        if string[i] == string[i+1]:
            count += 1
        else:
            # The run ended: append its count only if it repeated
            if count > 1:
                res += str(count)
            res += string[i+1]
            count = 1
    # Append the count of the final run
    if count > 1:
        res += str(count)
    return res

Here are a few examples:

>>> compress("ddaaaff")
'd2a3f2'
>>> compress("daaaafffyy")
'da4f3y2'
>>> compress("mississippi")
'mis2is2ip2i'

answered Sep 30 '15 at 02:42

Patrick Yu

Thank you so much, the second step is where I was confused. – Cero Sep 30 '15 at 03:13
Glad to hear that :-) – Patrick Yu Sep 30 '15 at 03:21
Is there a way to do the reverse procedure? – Cero Sep 30 '15 at 03:22
By "reverse procedure", do you mean uncompressing the text? If so, you can iterate over the compressed string and, every time you hit an integer _n_, print the last character _n_ times. – Patrick Yu Sep 30 '15 at 03:27
Yes, what would the notation be for checking the integer in a string? – Cero Sep 30 '15 at 03:32
You can use `isinstance(n, int )`, which returns True if _n_ is an integer. Reference: http://stackoverflow.com/questions/3501382/checking-whether-a-variable-is-an-integer-or-not – Patrick Yu Sep 30 '15 at 03:40
Sorry I keep bothering you but can you provide an example of use. I'm a newb :(, and lack knowledge of proper usage of methods. – Cero Sep 30 '15 at 03:50
Here is my code for uncompression: http://ideone.com/x6BWX0 Note that I used a different integer checking method instead of isinstance, because I just found that you can't check for ints inside strings. So, I adapted my integer check code from http://stackoverflow.com/questions/1265665/python-check-if-a-string-represents-an-int-without-using-try-except. Also, try printing different uncompresses, like uncompress("mis2is2ip2i"). – Patrick Yu Sep 30 '15 at 04:36
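
The linked ideone code is not reproduced in this thread, so the following is only a minimal sketch of the reverse procedure described in the comments above, not Patrick Yu's actual code. The name `uncompress` follows the comment; the handling of multi-digit counts and the assumption that the original text contained no digits are mine.

```python
def uncompress(compressed):
    """Expand run-length text such as 'd2a3f2' back to 'ddaaaff'.

    Assumption: the original text contained no digits, so every digit
    in the input belongs to a repeat count.
    """
    pieces = []      # expanded runs, joined once at the end
    last_char = ""   # most recent non-digit character seen
    digits = ""      # count digits collected after last_char

    for ch in compressed:
        if ch.isdecimal():
            digits += ch  # counts may span several digits, e.g. 'a12'
        else:
            if last_char:
                # Flush the previous run; no digits means it occurred once.
                pieces.append(last_char * (int(digits) if digits else 1))
            last_char = ch
            digits = ""

    if last_char:
        # Flush the final run, which no later character triggers.
        pieces.append(last_char * (int(digits) if digits else 1))
    return "".join(pieces)


print(uncompress("d2a3f2"))       # ddaaaff
print(uncompress("mis2is2ip2i"))  # mississippi
```

Checking each character with `str.isdecimal()` sidesteps the `isinstance` problem mentioned above, because every element of a string is itself a string and never an `int`.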

score 9 · Answer 2 · edited May 12 '20 at 08:55

Short version with generators:

from itertools import groupby
import re
def compress(string):
    return re.sub(r'(?<![0-9])[1](?![0-9])', '', ''.join('%s%s' % (char, sum(1 for _ in group)) for char, group in groupby(string)))

(1) Grouping by chars with groupby(string)

(2) Counting length of group with sum(1 for _ in group) (because no len on group is possible)

(3) Joining into proper format

(4) Removing the 1 that follows a single item, but only when that 1 has no digit before or after it (so counts such as 11 or 21 are left alone)
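
The four steps can be made visible by running them one at a time. This is an illustrative sketch only: the variable names `runs`, `joined`, and `short` are mine, and the last step swaps the regex clean-up for a conditional inside the join, which is a different technique from the one in the answer.

```python
from itertools import groupby

text = "daaaafffyy"

# (1) groupby yields one (char, run) pair per block of consecutive equal characters.
# (2) Each run is an iterator without len(), so it is counted by exhausting it.
runs = [(char, sum(1 for _ in group)) for char, group in groupby(text)]
print(runs)    # [('d', 1), ('a', 4), ('f', 3), ('y', 2)]

# (3) Join each pair as "<char><count>".
joined = "".join("%s%s" % (char, count) for char, count in runs)
print(joined)  # d1a4f3y2

# (4) Alternative to the regex: leave the count out while joining when it is 1.
short = "".join(char + (str(count) if count > 1 else "") for char, count in runs)
print(short)   # da4f3y2
```

Dropping the count during the join avoids having to tell a lone 1 apart from the digits of a larger count afterwards.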

edited May 12 '20 at 08:55

Jagrut Trivedi

answered Apr 05 '17 at 08:41

Alexandre Fedosov

This will not work with 11m character. Expected output should be m11 but it will give only m. – Jagrut Trivedi May 09 '20 at 13:13

score 4 · Answer 3 · answered Sep 30 '15 at 00:48

There are several reasons why this doesn't work. You really need to try debugging this yourself first. Put in a few print statements to trace the execution. For instance:

def compress(s):
    count=0

    for i in range(0, len(s)):
        # Trace which character is being examined
        print("Checking character", i, s[i])
        if s[i] == s[i-1]:
            count += 1
        c = s.count(s[i])
        # Trace what s.count() reported for it
        print("Found", s[i], c, "times")

    return str(s[i]) + str(c)

print(compress("ddaaaff"))

Here's the output:

Checking character 0 d
Found d 2 times
Checking character 1 d
Found d 2 times
Checking character 2 a
Found a 3 times
Checking character 3 a
Found a 3 times
Checking character 4 a
Found a 3 times
Checking character 5 f
Found f 2 times
Checking character 6 f
Found f 2 times
f2

Process finished with exit code 0

(1) You throw away the results of all but the last letter's search. (2) You count all occurrences, not merely the consecutive ones. (3) You cast a string to a string -- redundant.

Try working through this example with pencil and paper. Write down the steps you use, as a human being, to parse the string. Work on translating those to Python.
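
As a small illustration of point (2), and of what `s[i-1]` does when `i` is 0, the snippet below contrasts `str.count` with counting a single run. It is a sketch for experimenting, not a full solution; the sample string and the name `run_length` are mine.

```python
s = "aabaa"

# str.count tallies every occurrence in the whole string, consecutive or not.
print(s.count("a"))  # 4

# At i == 0, s[i-1] is s[-1], which Python reads as the LAST character.
print(s[0 - 1])      # a

# Counting only the run that starts at index 0 means stopping at the first mismatch.
run_length = 1
while run_length < len(s) and s[run_length] == s[0]:
    run_length += 1
print(run_length)    # 2
```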

answered Sep 30 '15 at 00:48

Prune

How would I count all occurrences? – Cero Sep 30 '15 at 01:26
You *did* count all occurrences with the "count" function. Reworking the algorithm from here is your job; we'll help with code that's already written. – Prune Sep 30 '15 at 01:28
What would I do once I got the result of the first character? – Cero Sep 30 '15 at 01:31
Did you work through this with pencil & paper? What did you do there? [These are leading questions; give it a try.] – Prune Sep 30 '15 at 01:33
More specifically, *how* did you go about turning "ddaaaff" into "d2a3f2" when you worked on paper? Note the areas of the paper where you kept partial results: those are your variables. Write down the steps you used, and where you repeated things (loops). – Prune Sep 30 '15 at 01:45
I'm done, you're no help. Going through the process only made me less sure of what my translation to Python should be. Frustration beyond belief, thanks though for your attempt. – Cero Sep 30 '15 at 01:53
Ah! You have something to translate into Python? You have something that works in another language or application? That would help a lot -- can you post it? – Prune Sep 30 '15 at 02:02

score 1 · Answer 4 · answered Jun 06 '18 at 06:48

x="mississippi"
res = ""
count = 0
while (len(x) > 0):
    count = 1
    res= ""
    for j in range(1, len(x)):
        if x[0]==x[j]:
            count= count + 1
        else:
            res = res + x[j]
    print(x[0], count, end=" ")
    x=res

answered Jun 06 '18 at 06:48

Puneet Chaudhary

Output: m 1 i 4 s 4 p 2 – Puneet Chaudhary Jun 06 '18 at 06:48

score 1 · Answer 5 · answered Jan 02 '19 at 18:11

Another simple way to do this:

def compress(str1):
    if not str1:
        # Nothing to compress
        print('')
        return
    initial = str1[0]
    output = initial
    count = 1
    for item in str1[1:]:
        if item == initial:
            count = count + 1
        else:
            # Run ended: emit its count only if it repeated
            if count > 1:
                output = output + str(count)
            count = 1
            initial = item
            output = output + item
    # Emit the count of the final run, which the loop never closes
    if count > 1:
        output = output + str(count)
    print(output)

Which gives the required output, for example:

>> compress("aaaaaaaccddddeehhyiiiuuo")
a7c2d4e2h2yi3u2o

>> compress("lllhhjuuuirrdtt")
l3h2ju3ir2dt2

>> compress("mississippi")
mis2is2ip2i

score 1 · Answer 6 · answered May 13 '20 at 03:41

from collections import Counter
def string_compression(string):
    counter = Counter(string)
    result = ''
    for k, v in counter.items():
        result = result + k + str(v)
    print(result)

answered May 13 '20 at 03:41

Gaganpreet Sran

DO NOT ANSWER CODE ONLY. Describe what you changed. Describe how it affects the outcome and what it improves. That greatly improves your answer's quality ^^ – finnmglas May 13 '20 at 04:45

score 0 · Answer 7 · answered Aug 14 '18 at 17:33

text = "mississippi"
count = 1
for i in range(1, len(text) + 1):
    if i == len(text):
        # Past the last character: flush the final run
        print(text[i - 1] + str(count), end="")
        break
    else:
        if text[i - 1] == text[i]:
            count += 1
        else:
            # Run ended: print it and start counting the next one
            print(text[i - 1] + str(count), end="")
            count = 1

Output : m1i1s2i1s2i1p2i1

answered Aug 14 '18 at 17:33

tanz

I think the author wanted a number for repeated characters only, so it should be "mis2is2ip2i" – Baptiste Mille-Mathias Aug 14 '18 at 17:38

score 0 · Answer 8 · answered Sep 03 '18 at 17:22

s=input("Enter the string:")
temp={}
result=""
for x in s:
    if x in temp:
        temp[x]=temp[x]+1
    else:
        temp[x]=1
for key,value in temp.items():
    result+=str(key)+str(value)

print(result)

answered Sep 03 '18 at 17:22

Syed Imad

indentation of the last line is wrong, so explanation for the person asking the question would be great. – Baptiste Mille-Mathias Sep 03 '18 at 17:33

score 0 · Answer 9 · answered Sep 23 '19 at 18:39

Here is something I wrote.

def stringCompression(str1):
  counter=0
  prevChar = str1[0]
  str2=""
  charChanged = False
  loopCounter = 0

  for char in str1:
      if(char==prevChar):
          counter+=1
          charChanged = False
      else:
          str2 += prevChar + str(counter)
          counter=1
          prevChar = char
          if(loopCounter == len(str1) - 1):
              str2 += prevChar + str(counter)
          charChanged = True
      loopCounter+=1
  if(not charChanged):
      str2+= prevChar + str(counter)

  return str2

Not the best code I guess. But works well.

a -> a1

aaabbbccc -> a3b3c3

score 0 · Answer 10 · answered Oct 14 '19 at 11:12

This is a solution to the problem. But keep in mind that this method only effectively works if there's a lot of repetition, specifically if consecutive characters are repetitive. Otherwise, it will only worsen the situation.

e.g.,
AABCD --> A2B1C1D1
BcDG ---> B1c1D1G1

def compress_string(s):
    result = [""] * len(s)
    visited = None

    index = -1  # the first character bumps this to 0, so result never overflows
    count = 1

    for c in s:
        if c == visited:
            count += 1
            result[index] = f"{c}{count}"
        else:
            count = 1
            index += 1
            result[index] = f"{c}{count}"
            visited = c

    return "".join(result)

score 0 · Answer 11 · edited Dec 12 '19 at 10:06

You can simply achieve that by:

gstr="aaabbccccdddee"
last=gstr[0]
count=0
rstr=""
for i in gstr:
    if i==last:
        count=count+1
    elif i!=last:
        rstr=rstr+last+str(count)
        count=1
        last=i
rstr=rstr+last+str(count)
print ("Required string for given string {} after conversion is {}.".format(gstr,rstr))

edited Dec 12 '19 at 10:06

Pedram Parsian

answered Dec 12 '19 at 09:06

vikashkmr3188

Hi Vikas - It's recommended to add textual details; code-only answers are not highly appreciated on this forum. – www.hybriscx.com Dec 12 '19 at 09:31

score 0 · Answer 12 · edited Jan 04 '20 at 15:21

Here is a short python implementation of a compression function:

#d=compress('xxcccdex')
#print(d)

def compress(word):
    list1=[]
    for i in range(len(word)):
        list1.append(word[i].lower())
    num=0
    dict1={}
    for i in range(len(list1)):
        if(list1[i] in list(dict1.keys())):
            dict1[list1[i]]=dict1[list1[i]]+1
        else:
            dict1[list1[i]]=1

    s=list(dict1.keys())
    v=list(dict1.values())
    word=''
    for i in range(len(s)):
        word=word+s[i]+str(v[i])
    return word

score 0 · Answer 13 · answered Jan 25 '20 at 09:18

The logic below works without any extra data structure, groupby, set, or other compression helper. It ignores letter case and counts a character's repeats even when they are not sequential.

def fstrComp_1(stng):
    sRes = ""
    # Lower-case once, up front, so 'B' and 'b' are counted together
    stng = stng.lower()
    for i in range(len(stng)):
        # Handle each distinct character only at its first occurrence
        if not stng[i] in sRes:
            n = stng.count(stng[i])
            if n > 1:
                sRes += stng[i] + str(n)
            else:
                sRes += stng[i]

    print(sRes)

fstrComp_1("aB*b?cC&")

score 0 · Answer 14 · answered Mar 03 '20 at 03:21

I wanted to do it by partitioning the string. So aabbcc would become: ['aa', 'bb', 'cc']

This is how I did it:

def compression(string):

    # Creating a partitioned list
    alist = list(string)
    master = []
    n = len(alist)

    for i in range(n):
        # Guard i > 0 so the first character is never compared with the last one
        if i > 0 and alist[i] == alist[i-1]:
            master[-1] += alist[i]
        else:
            master.append(alist[i])


    # Adding the partitions together in a new string
    newString = "" 
    for i in master:
        newString += i[0] + str(len(i))

    # If the newString is longer than the old string, return old string (you've not 
    # compressed it in length)
    if len(newString) > n:
        return string
    return newString



string = 'aabbcc'
print(compression(string))

score 0 · Answer 15 · answered Apr 30 '20 at 23:34

string = 'aabccccd'
# Expected output: '2a1b4c1d'

new_string = ""
count = 1
for i in range(len(string)-1):
    if string[i] == string[i+1]:
        count = count + 1
    else:
        new_string = new_string + str(count) + string[i]
        count = 1
# Flush the final run; string[-1] also works for a one-character string
new_string = new_string + str(count) + string[-1]
print(new_string)

tobych · Answer 16 · 2020-05-18T07:48:35.027

For a coding interview, where it was about the algorithm, and not about my knowledge of Python, its internal representation of data structures, or the time complexity of operations such as string concatenation:

def compress(message: str) -> str:
    output = ""
    length = 0
    previous: str = None
    for char in message:
        if previous is None or char == previous:
            length += 1
        else:
            output += previous
            if length > 1:
                output += str(length)
            length = 1
        previous = char
    if previous is not None:
        output += previous
        if length > 1:
            output += str(length)
    return output

For code I'd actually use in production, not reinventing any wheels, being more testable, using iterators until the last step for space efficiency, and using join() instead of string concatenation for time efficiency:

from itertools import groupby
from typing import Iterator


def compressed_groups(message: str) -> Iterator[str]:
    for char, group in groupby(message):
        length = sum(1 for _ in group)
        yield char + (str(length) if length > 1 else "")


def compress(message: str) -> str:
    return "".join(compressed_groups(message))

Taking things a step further, for even more testability:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


def segments(message: str) -> Iterator[Segment]:
    for char, group in groupby(message):
        yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return "".join(str(s) for s in segments(message))
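
As one hedged illustration of what the extra testability buys, the intermediate `Segment` values can be checked directly instead of only the final string. The block repeats the definitions above so it runs on its own; the assertions are hypothetical checks of mine, not part of the original answer.

```python
from collections import namedtuple
from itertools import groupby
from typing import Iterator


class Segment(namedtuple("Segment", ["char", "length"])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


def segments(message: str) -> Iterator[Segment]:
    for char, group in groupby(message):
        yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return "".join(str(s) for s in segments(message))


# The run detection can be tested without looking at any formatting...
assert list(segments("ddaaaff")) == [Segment("d", 2), Segment("a", 3), Segment("f", 2)]
# ...and the formatting rule can be tested without any run detection.
assert str(Segment("m", 1)) == "m"
assert str(Segment("s", 2)) == "s2"
# End-to-end behaviour, including the empty string.
assert compress("mississippi") == "mis2is2ip2i"
assert compress("") == ""
```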

Going all-out and providing a Value Object CompressedString:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


class CompressedString(str):

    @classmethod
    def compress(cls, message: str) -> "CompressedString":
        return cls("".join(str(s) for s in cls._segments(message)))

    @staticmethod
    def _segments(message: str) -> Iterator[Segment]:
        for char, group in groupby(message):
            yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return CompressedString.compress(message)

score 0 · Answer 17 · answered May 29 '20 at 02:40

def compress(val):
    print(len(val))
    end=0
    count=1
    result=""
    for i in range(0,len(val)-1):
        #print(val[i],val[i+1])
        if val[i]==val[i+1]:
            count=count+1
            #print(count,val[i])
        elif val[i]!=val[i+1]:
            #print(end,i)
            result=result+val[end]+str(count)
            end=i+1
            count=1
    result=result+val[-1]+str(count)
    return result
res=compress("I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.")
print(len(res))

harryghgim · Answer 18 · 2020-08-28T04:34:13.663

Use the re module from Python's standard library.

def compress(string):
    import re
    p=r'(\w+?)\1+' # non-greedy group 1, followed by one or more repeats of it
    sub_str=string
    for m in re.finditer(p,string):
        num=m[0].count(m[1])
        sub_str=re.sub(m[0],f'{m[1]}{num}',sub_str)
    return sub_str

string='aaaaaaaabbbbbbbbbcccccccckkkkkkkkkkkppp'
string2='ababcdcd'
string3='abcdabcd' 
string4='ababcdabcdefabcdcd' 

print(compress(string))
print(compress(string2))
print(compress(string3))
print(compress(string4))

Result:

a8b9c8k11p3
ab2cd2
abcd2
ab2cdabcdefabcd2

Suraj Vijay · Answer 19 · 2022-04-05T15:03:40.160

This is a modification of Patrick Yu's code, which fails for the test cases below.

SAMPLE INPUT:
c
aaaaaaaaaabcdefgh

EXPECTED OUTPUT:
c1
a10b1c1d1e1f1g1h1

OUTPUT OF Patrick's Code:
c
a10bcdefgh

Below is the modified code:


def Compress(S):
    Ans = S[0]
    count = 1
    for i in range(len(S)-1):
        if S[i] == S[i+1]:
            count += 1
        else:
            if count >= 1:
                Ans += str(count)
            Ans += S[i+1]
            count = 1
    if count>=1:
        Ans += str(count)
    return Ans

Only the condition needs to change, from greater than (">") to greater than or equal to (">="), when comparing the count with 1.
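
Since the two conventions differ only in whether a count of 1 is written out, one possible sketch is a single function with a switch. The names `compress_runs` and `always_count` are hypothetical, and the use of `itertools.groupby` is my substitution for the index loop above.

```python
from itertools import groupby


def compress_runs(text, always_count=False):
    """Run-length encode text; always_count=True writes 'c1' instead of 'c'."""
    parts = []
    for char, group in groupby(text):
        length = sum(1 for _ in group)  # size of this run of equal characters
        if always_count or length > 1:
            parts.append(char + str(length))
        else:
            parts.append(char)
    return "".join(parts)


sample = "a" * 10 + "bcdefgh"
print(compress_runs(sample))                     # a10bcdefgh
print(compress_runs(sample, always_count=True))  # a10b1c1d1e1f1g1h1
print(compress_runs("c", always_count=True))     # c1
```

With `always_count=True` the output can be longer than the input, as the `c` to `c1` case shows.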

score 0 · Answer 20 · edited Mar 03 '21 at 16:12

Using generators:

input = "aaaddddffwwqqaattttttteeeeeee"

from itertools import groupby
print(''.join(([char+str(len(list(group))) for char, group in groupby(input)])))

edited Mar 03 '21 at 16:12

Tomerikoo

answered Mar 03 '21 at 16:08

Malaya Raj Lenka

score 0 · Answer 21 · answered Mar 21 '21 at 22:23

def compress(string):

    # taking out unique characters from the string

    unique_chars = []
    for c in string:
        if not c in unique_chars:
            unique_chars.append(c)

    # Now count the characters

    res = ""

    for i in range(len(unique_chars)):
        count = string.count(unique_chars[i])
        res += unique_chars[i]+str(count)

    return res


string = 'aabccccd'
print(compress(string))

score 0 · Answer 22 · edited Aug 08 '21 at 18:36

from collections import Counter

def char_count(input_str):
    my_dict = Counter(input_str)
    print(my_dict)
    output_str = ""
    for i in input_str:
        if i not in output_str:
            output_str += i
            output_str += str(my_dict[i])
    return output_str

result = char_count("zddaaaffccc")
print(result)

edited Aug 08 '21 at 18:36

Tomerikoo

answered Aug 08 '21 at 18:02

srijitha

score -1 · Answer 23 · answered Apr 12 '23 at 22:24

str_input = 'aabbccabca'
output = 'a2b2c2a1b1c1a1'
temp = str_input[0]
new_str = ''
tmp_dict = {}
for i in list(str_input):
    if temp == i:
        if i in tmp_dict.keys():
            tmp_dict[i]=tmp_dict[i]+1
        else:
            tmp_dict.update({i:1})
    else:
        for key in tmp_dict:
            new_str+=key+str(tmp_dict[key])
        tmp_dict = {}
        tmp_dict.update({i:1})
        temp = i
for key in tmp_dict:
    new_str+=key+str(tmp_dict[key])
print(new_str)

answered Apr 12 '23 at 22:24

Hemangi Santoki

This question already has dozens of answers. Dumping code without any explanation isn't very helpful at this point. How does this improve upon what is already here? Why should we use this instead of one of the other answers? – ChrisGPT was on strike Apr 16 '23 at 19:36
