0

my task is to remove duplicate letters in a string

remove_repeat(msg):

example

  • remove_repeat("bookkeeper")'bokeper'
  • remove_repeat("aaabcaaddddeff")'abcadef'
  • remove_repeat("a")'a'

The code I have so far is

def remove_repeat(msg):

    removed=[]
    for i in msg:
        if i not in removed:
            removed.append(i)
    return removed

I don't know how to continue with this code. I am not allowed to use 'join' or 'set'. A hint given was using += but I don't know how to incorporate that here

Vincent Savard
  • 34,979
  • 10
  • 68
  • 73
Jessica
  • 139
  • 2
  • 9
  • Do you know what `+=` does with strings? – user2357112 Mar 07 '16 at 17:32
  • Crack open the interpreter and give it a try. `removed = '';removed += 'b';print(removed)`. – tdelaney Mar 07 '16 at 17:34
  • 1
    I really wish professors wouldn't give assignments like this... I feel like there have to be lots of more creative problems that can teach students control flow and logic without teaching them to do a problem the most inefficient way possible... – mgilson Mar 07 '16 at 17:36
  • @mgilson: It's most likely an intro to programming class, the point _is_ to reinvent the wheel. – Vincent Savard Mar 07 '16 at 17:37
  • @mgilson: Amusingly, in CPython, this isn't actually that inefficient. They special case repeated string concatenation, so it doesn't suffer from Schlemiel the Painter's Algorithm too badly. But doing the same thing in other interpreters ([including Cython](http://stackoverflow.com/q/35787022/364696)) will have serious problems. – ShadowRanger Mar 07 '16 at 17:39
  • @ShadowRanger -- You're still suffering from the `O(N)` lookup in seen characters at each iteration of the loop, which goes away if you can just use a `set` to store the seen characters (and really doesn't change the logic _at all_). . . – mgilson Mar 07 '16 at 17:41
  • @mgilson: Well, in the correct version of the code (which only removes consecutive repeats), you don't do a `O(n)` lookup. – ShadowRanger Mar 07 '16 at 17:46
  • Oh, I see -- It is only _consecutive_ repeats that should be removed. Yeah, that does make the problem more interesting. It's not clear in the wording of the problem that is what is desired -- Only in the examples. – mgilson Mar 07 '16 at 17:48

3 Answers3

3

If you're using a list to accumulate results, the only good way to turn it back into a str is with ''.join. But since this is a class assignment, you can be bad, and just do repeated str concatenation to avoid a list entirely:

def remove_repeat(msg):
    newmsg = ''
    for let in msg:
        if let not in newmsg:
            newmsg += let
    return newmsg

That follows your logic and the rules, but it still gets the logic wrong, since the goal is to remove consecutive repeats, not all duplicates of the same latter. To fix, you'd track only the last letter:

def remove_repeat(msg):
    newmsg = ''
    for let in msg:
        # Only append if last letter of the string to date differs from this letter
        if let != newmsg[-1:]:  # Use slice to avoid special casing first letter
            newmsg += let
    return newmsg

let != newmsg[-1:] can also be done as not newmsg.endswith(let), not sure if you're allowed to use str methods, so I stuck with the slice test.

Just for the record, if I were implementing this outside a class, and inputs might be large (and I had some strong need to optimize it, who knows why) I'd do:

 from operator import itemgetter
 from itertools import groupby

 def remove_repeat(msg):
     return ''.join(map(itemgetter(0), groupby(msg)))

but that's probably being a little too clever. groupby is grouping consecutive repeated letters into groups, map(itemgetter(0), keeps on the group key (the single letter of the repeated group), and ''.join stitches it all back together.

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
  • BTW, I changed the name to `newmsg` only because `removed` made me think it was where you put the stuff you dropped, not the stuff you kept, which felt rather misleading. – ShadowRanger Mar 07 '16 at 17:37
  • While this does solve OP's immediate issue, it doesn't seem like it solves the assignment. It fails with the first and second examples. Seems like the assignment is to group consecutive identical letters. – Vincent Savard Mar 07 '16 at 17:39
  • I get the logic behind this and it makes sense but it fails about half my tests. – Jessica Mar 07 '16 at 17:40
  • Returns the wrong result. OP actually wants to substitute series of repeated characters with a single character. Your code produces `'bokepr'`. The desired output is `'bokeper'`. In addition, this should be doable in O(n). – timgeb Mar 07 '16 at 17:40
  • @VincentSavard: Ah. Should have checked provided examples. Will update. – ShadowRanger Mar 07 '16 at 17:41
  • @timgeb, @VincentSavard, @Jessica: Added corrected version (which is in fact `O(n)`). – ShadowRanger Mar 07 '16 at 17:45
0

Note that both previous solutions remove all following copies of each letter, not just immediately following ones.

Instead,

def remove_repeat(msg):
    previous = None
    result = ''
    for ch in msg:
        if ch != previous:
            result += ch
            previous = ch
    return result
Hugh Bothwell
  • 55,315
  • 8
  • 84
  • 99
0

First of all your second example remove_repeat("aaabcaaddddeff") → 'abcadef' will not appear with your code. The second group of a's will still fail because 'a' is already in remove. You need to test each letter against the immediate predecessor in remove.

Secondly, += with strings is the same an append. The following lines are equivalent.

mystring += mychar
mystring = mystring + mychar

Start by initializing remove to the first entry in msg (since it cannot be a duplicate). Now loop over the rest of msg, testing each new character against the immediate predecessor in remove. If it does not match then

result += newchar

def remove_repeat(msg):
    previous = None
    result = ''
    for newchar in msg:
        if newchar != previous:
            result += newchar
            previous = newchar
    return result
sabbahillel
  • 4,357
  • 1
  • 19
  • 36