10

I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.

Ex:

>>> compress("ddaaaff")
'd2a3f2'


 def compress(s):
     count=0

     for i in range(0,len(s)):
         if s[i] == s[i-1]:
             count += 1
         c = s.count(s[i])

     return str(s[i]) + str(c)
Cero
  • 205
  • 1
  • 3
  • 10

23 Answers23

19

Here is a short python implementation of a compression function:

def compress(string):

    res = ""

    count = 1

    #Add in first character
    res += string[0]

    #Iterate through loop, skipping last one
    for i in range(len(string)-1):
        if(string[i] == string[i+1]):
            count+=1
        else:
            if(count > 1):
                #Ignore if no repeats
                res += str(count)
            res += string[i+1]
            count = 1
    #print last one
    if(count > 1):
        res += str(count)
    return res

Here are a few examples:

>>> compress("ddaaaff")
'd2a3f2'
>>> compress("daaaafffyy")
'da4f3y2'
>>> compress("mississippi")
'mis2is2ip2i'
Patrick Yu
  • 972
  • 1
  • 7
  • 19
  • Thank you so much, the second step is where I was confused. – Cero Sep 30 '15 at 03:13
  • Glad to hear that :-) – Patrick Yu Sep 30 '15 at 03:21
  • Is there a way to do the reverse procedure? – Cero Sep 30 '15 at 03:22
  • Do you mean by "reverse procedure" as to uncompress text? If that's what you want, you can just iterate over the compressed string, and everytime you hit an integer _n_, you can print the last character _n_ times. – Patrick Yu Sep 30 '15 at 03:27
  • Yes, what would the notation be for checking the integer in a string? – Cero Sep 30 '15 at 03:32
  • You can use `isinstance(n, int )`, which returns True if _n_ is an integer. Reference: http://stackoverflow.com/questions/3501382/checking-whether-a-variable-is-an-integer-or-not – Patrick Yu Sep 30 '15 at 03:40
  • Sorry I keep bothering you but can you provide an example of use. I'm a newb :(, and lack knowledge of proper usage of methods. – Cero Sep 30 '15 at 03:50
  • Here is my code for uncompression: http://ideone.com/x6BWX0 Note that I used a different integer checking method instead of isinstance, because I just found that you can't check for ints inside strings. So, I adapted my integer check code from http://stackoverflow.com/questions/1265665/python-check-if-a-string-represents-an-int-without-using-try-except. Also, try printing different uncompresses, like uncompress("mis2is2ip2i"). – Patrick Yu Sep 30 '15 at 04:36
9

Short version with generators:

from itertools import groupby
import re
def compress(string):
    return re.sub(r'(?<![0-9])[1](?![0-9])', '', ''.join('%s%s' % (char, sum(1 for _ in group)) for char, group in groupby(string)))

(1) Grouping by chars with groupby(string)

(2) Counting length of group with sum(1 for _ in group) (because no len on group is possible)

(3) Joining into proper format

(4) Removing 1 chars for single items when there is a no digit before and after 1

Jagrut Trivedi
  • 1,271
  • 14
  • 18
4

There are several reasons why this doesn't work. You really need to try debugging this yourself first. Put in a few print statements to trace the execution. For instance:

def compress(s):
    count=0

    for i in range(0, len(s)):
        print "Checking character", i, s[i]
        if s[i] == s[i-1]:
            count += 1
        c = s.count(s[i])
        print "Found", s[i], c, "times"

    return str(s[i]) + str(c)

print compress("ddaaaff")

Here's the output:

Checking character 0 d
Found d 2 times
Checking character 1 d
Found d 2 times
Checking character 2 a
Found a 3 times
Checking character 3 a
Found a 3 times
Checking character 4 a
Found a 3 times
Checking character 5 f
Found f 2 times
Checking character 6 f
Found f 2 times
f2

Process finished with exit code 0

(1) You throw away the results of all but the last letter's search. (2) You count all occurrences, not merely the consecutive ones. (3) You cast a string to a string -- redundant.

Try working through this example with pencil and paper. Write down the steps you use, as a human being, to parse the string. Work on translating those to Python.

Prune
  • 76,765
  • 14
  • 60
  • 81
  • 1
    How would I count all occurrences? – Cero Sep 30 '15 at 01:26
  • 1
    You *did* count all occurrences with the "count" function. Reworking the algorithm from here is your job; we'll help with code that's already written. – Prune Sep 30 '15 at 01:28
  • 1
    What would I do once I got the result of the first character? – Cero Sep 30 '15 at 01:31
  • 1
    Did you work through this with pencil & paper? What did you do there? [These are leading questions; give it a try.] – Prune Sep 30 '15 at 01:33
  • 1
    More specifically, *how* did you go about turning "ddaaaff" into "d2a3f2" when you worked on paper? Note the areas of the paper where you kept partial results: those are your variables. Write down the steps you used, and where you repeated things (loops). – Prune Sep 30 '15 at 01:45
  • 1
    I'm done, your no help. going through the process only made me less sure on what my translation to python should be. Frustration beyond belief, thanks though for your attempt. – Cero Sep 30 '15 at 01:53
  • 1
    Ah! You have something to translate into Python? You have something that works in another language or application? That would help a lot -- can you post it? – Prune Sep 30 '15 at 02:02
1
x="mississippi"
res = ""
count = 0
while (len(x) > 0):
    count = 1
    res= ""
    for j in range(1, len(x)):
        if x[0]==x[j]:
            count= count + 1
        else:
            res = res + x[j]
    print(x[0], count, end=" ")
    x=res
1

Just another simplest way to perform this:

def compress(str1):
    output = ''
    initial = str1[0]
    output = output + initial
    count = 1
    for item in str1[1:]:
        if item == initial:
            count = count + 1
        else:
            if count == 1:
                count = ''
            output = output + str(count)
            count = 1
            initial = item
            output = output + item
    print (output)

Which gives the output as required, examples:

>> compress("aaaaaaaccddddeehhyiiiuuo")
a7c2d4e2h2yi3u2o

>> compress("lllhhjuuuirrdtt")
l3h2ju3ir2dt

>> compress("mississippi")
mis2is2ip2i
Viv
  • 185
  • 1
  • 1
  • 8
1
from collections import Counter
def string_compression(string):
    counter = Counter(string)
    result = ''
    for k, v in counter.items():
        result = result + k + str(v)
    print(result)
  • 3
    DO NOT ANSWER CODE ONLY. Describe what you changed. Describe how it effects the outcome and what it improves. That greatly improves you answers quality ^^ – finnmglas May 13 '20 at 04:45
0
input = "mississippi"
count = 1
for i in range(1, len(input) + 1):
    if i == len(input):
        print(input[i - 1] + str(count), end="")
        break
    else:
        if input[i - 1] == input[i]:
            count += 1
    else:
            print(input[i - 1] + str(count), end="")
            count = 1

Output : m1i1s2i1s2i1p2i1

tanz
  • 1
0
s=input("Enter the string:")
temp={}
result=" "
for x in s:
    if x in temp:
        temp[x]=temp[x]+1
    else:
        temp[x]=1
for key,value in temp.items():
    result+=str(key)+str(value)

print(result)

0

Here is something I wrote.

def stringCompression(str1):
  counter=0
  prevChar = str1[0]
  str2=""
  charChanged = False
  loopCounter = 0

  for char in str1:
      if(char==prevChar):
          counter+=1
          charChanged = False
      else:
          str2 += prevChar + str(counter)
          counter=1
          prevChar = char
          if(loopCounter == len(str1) - 1):
              str2 += prevChar + str(counter)
          charChanged = True
      loopCounter+=1
  if(not charChanged):
      str2+= prevChar + str(counter)

  return str2

Not the best code I guess. But works well.

a -> a1

aaabbbccc -> a3b3c3

vramazing
  • 94
  • 1
  • 1
  • 6
0

This is a solution to the problem. But keep in mind that this method only effectively works if there's a lot of repetition, specifically if consecutive characters are repetitive. Otherwise, it will only worsen the situation.

e.g.,
AABCD --> A2B1C1D1
BcDG ---> B1c1D1G1

def compress_string(s):
    result = [""] * len(s)
    visited = None

    index = 0
    count = 1

    for c in s:
        if c == visited:
            count += 1
            result[index] = f"{c}{count}"
        else:
            count = 1
            index += 1
            result[index] = f"{c}{count}"
            visited = c

    return "".join(result)
ishmam
  • 96
  • 2
0

You can simply achieve that by:

gstr="aaabbccccdddee"
last=gstr[0]
count=0
rstr=""
for i in gstr:
    if i==last:
        count=count+1
    elif i!=last:
        rstr=rstr+last+str(count)
        count=1
        last=i
rstr=rstr+last+str(count)
print ("Required string for given string {} after conversion is {}.".format(gstr,rstr))
Pedram Parsian
  • 3,750
  • 3
  • 19
  • 34
0

Here is a short python implementation of a compression function:

#d=compress('xxcccdex')
#print(d)

def compress(word):
    list1=[]
    for i in range(len(word)):
        list1.append(word[i].lower())
    num=0
    dict1={}
    for i in range(len(list1)):
        if(list1[i] in list(dict1.keys())):
            dict1[list1[i]]=dict1[list1[i]]+1
        else:
            dict1[list1[i]]=1

    s=list(dict1.keys())
    v=list(dict1.values())
    word=''
    for i in range(len(s)):
        word=word+s[i]+str(v[i])
    return word
Julian
  • 33,915
  • 22
  • 119
  • 174
0

Below logic will work irrespective of

  1. Data structure
  2. Group By OR Set or any sort of compression logic
  3. Capital or non-capital characters
  4. Character repeat if not sequential

    def fstrComp_1(stng):
    sRes = ""
    cont = 1        
    for i in range(len(stng)):
    
     if not stng[i] in sRes:
        stng = stng.lower()
        n = stng.count(stng[i])
        if  n > 1: 
            cont = n
            sRes += stng[i] + str(cont)
        else:
            sRes += stng[i]
    
        print(sRes)
    
    fstrComp_1("aB*b?cC&")
    
user4331
  • 1
  • 2
0

I wanted to do it by partitioning the string. So aabbcc would become: ['aa', 'bb', 'cc']

This is how I did it:

def compression(string):

    # Creating a partitioned list
    alist = list(string)
    master = []
    n = len(alist)

    for i in range(n):
        if alist[i] == alist[i-1]:
            master[-1] += alist[i]
        else:
            master += alist[i]


    # Adding the partitions together in a new string
    newString = "" 
    for i in master:
        newString += i[0] + str(len(i))

    # If the newString is longer than the old string, return old string (you've not 
    # compressed it in length)
    if len(newString) > n:
        return string
    return newString



string = 'aabbcc'
print(compression(string))
Xavier Chia
  • 203
  • 3
  • 5
0

string = 'aabccccd' output = '2a3b4c4d'

new_string = " "
count = 1
for i in range(len(string)-1):
    if string[i] == string[i+1]:
        count = count + 1
    else:         
        new_string =  new_string + str(count) + string[i]
        count = 1 
new_string = new_string + str(count) + string[i+1]    
print(new_string)
AutomationTest
  • 101
  • 1
  • 2
  • 13
0

For a coding interview, where it was about the algorithm, and not about my knowledge of Python, its internal representation of data structures, or the time complexity of operations such as string concatenation:

def compress(message: str) -> str:
    output = ""
    length = 0
    previous: str = None
    for char in message:
        if previous is None or char == previous:
            length += 1
        else:
            output += previous
            if length > 1:
                output += str(length)
            length = 1
        previous = char
    if previous is not None:
        output += previous
        if length > 1:
            output += str(length)
    return output

For code I'd actually use in production, not reinventing any wheels, being more testable, using iterators until the last step for space efficiency, and using join() instead of string concatenation for time efficiency:

from itertools import groupby
from typing import Iterator


def compressed_groups(message: str) -> Iterator[str]:
    for char, group in groupby(message):
        length = sum(1 for _ in group)
        yield char + (str(length) if length > 1 else "")


def compress(message: str) -> str:
    return "".join(compressed_groups(message))

Taking things a step further, for even more testability:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


def segments(message: str) -> Iterator[Segment]:
    for char, group in groupby(message):
        yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return "".join(str(s) for s in segments(message))

Going all-out and providing a Value Object CompressedString:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


class CompressedString(str):

    @classmethod
    def compress(cls, message: str) -> "CompressedString":
        return cls("".join(str(s) for s in cls._segments(message)))

    @staticmethod
    def _segments(message: str) -> Iterator[Segment]:
        for char, group in groupby(message):
            yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return CompressedString.compress(message)
tobych
  • 2,941
  • 29
  • 18
0
def compress(val):
    print(len(val))
    end=0
    count=1
    result=""
    for i in range(0,len(val)-1):
        #print(val[i],val[i+1])
        if val[i]==val[i+1]:
            count=count+1
            #print(count,val[i])
        elif val[i]!=val[i+1]:
            #print(end,i)
            result=result+val[end]+str(count)
            end=i+1
            count=1
    result=result+val[-1]+str(count)
    return result
res=compress("I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.")
print(len(res))
user222213
  • 111
  • 1
  • 2
  • 12
0

Use python's standard library re.

def compress(string):
    import re
    p=r'(\w+?)\1+' # non greedy, group1 1
    sub_str=string
    for m in re.finditer(p,string):
        num=m[0].count(m[1])
        sub_str=re.sub(m[0],f'{m[1]}{num}',sub_str)
    return sub_str
string='aaaaaaaabbbbbbbbbcccccccckkkkkkkkkkkppp'
string2='ababcdcd'
string3='abcdabcd' 
string4='ababcdabcdefabcdcd' 

print(compress(string))
print(compress(string2))
print(compress(string3))
print(compress(string4))

Resut:

a8b9c8k11p3                                                                     
ab2cd2                                                                          
abcd2
ab2cdabcdefabcd2 
harryghgim
  • 1,340
  • 4
  • 12
  • 33
0

This is the modification of Patrick Yu's code. It code fails for the below test cases.

SAMPLE INPUT:
c
aaaaaaaaaabcdefgh

EXPECTED OUTPUT:
c1
a10b1c1d1e1f1g1h1

OUPUT OF Patrick's Code:
c
a10bcdefgh

Below is the modified code:


def Compress(S):
    Ans = S[0]
    count = 1
    for i in range(len(S)-1):
        if S[i] == S[i+1]:
            count += 1
        else:
            if count >= 1:
                Ans += str(count)
            Ans += S[i+1]
            count = 1
    if count>=1:
        Ans += str(count)
    return Ans

Just the condition must be changed from greater(">") to greater than equal to(">=") when comparing the count with 1.

0

Using generators:

input = "aaaddddffwwqqaattttttteeeeeee"

from itertools import groupby
print(''.join(([char+str(len(list(group))) for char, group in groupby(input)])))
Tomerikoo
  • 18,379
  • 16
  • 47
  • 61
0
def compress(string):

    # taking out unique characters from the string

    unique_chars = []
    for c in string:
        if not c in unique_chars:
            unique_chars.append(c)

    # Now count the characters

    res = ""

    for i in range(len(unique_chars)):
        count = string.count(unique_chars[i])
        res += unique_chars[i]+str(count)

    return res


string = 'aabccccd'
compress(string)
Mazhar
  • 1
0
from collections import Counter

def char_count(input_str):
    my_dict = Counter(input_str)
    print(my_dict)
    output_str = ""
    for i in input_str:
        if i not in output_str:
            output_str += i
            output_str += str(my_dict[i])
    return output_str

result = char_count("zddaaaffccc")
print(result)
Tomerikoo
  • 18,379
  • 16
  • 47
  • 61
-1
str_input = 'aabbccabca'
output = 'a2b2c2a1b1c1a1'
temp = str_input[0]
new_str = ''
tmp_dict = {}
for i in list(str_input):
    if temp == i:
        if i in tmp_dict.keys():
            tmp_dict[i]=tmp_dict[i]+1
        else:
            tmp_dict.update({i:1})
    else:
        for key in tmp_dict:
            new_str+=key+str(tmp_dict[key])
        tmp_dict = {}
        tmp_dict.update({i:1})
        temp = i
for key in tmp_dict:
    new_str+=key+str(tmp_dict[key])
print(new_str)
  • This question already has dozens of answers. Dumping code without any explanation isn't very helpful at this point. How does this improve upon what is already here? Why should we use this instead of one of the other answers? – ChrisGPT was on strike Apr 16 '23 at 19:36