1

My question is similar to, but different from, the following:

How do I remove a substring from the end of a string in Python?

Suppose we have:

input = "baabbbbb_xx_ba_xxx_abbbbbba"

We want to keep everything except the ba at the end and the ba at the beginning.

1) Direct strip() fails

strip treats its argument as a set of characters. That is, strip will remove the letters a and b appearing in any order. We want to remove the characters ba only if they appear in that exact order. Also, unlike strip, we want only zero or one copies removed from each end of the string. "x\n\n\n\n".strip() will remove many newlines, not just one.

input = "baabbbbb_xx_ba_xxx_abbbbbba"
output = input.strip("ba")
print(output)
# prints `_xx_ba_xxx_`
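
Two more quick checks of that set-like behaviour, as a small sketch (the variable names are only illustrative): the order of the characters in the argument makes no difference, and a bare strip() consumes every trailing newline rather than just one.

```python
text = "baabbbbb_xx_ba_xxx_abbbbbba"

# The argument is a set of characters, so "ba" and "ab" behave identically.
stripped_ba = text.strip("ba")
stripped_ab = text.strip("ab")
print(stripped_ba == stripped_ab)

# With no argument, all surrounding whitespace is removed, not one newline.
no_newlines = "x\n\n\n\n".strip()
print(repr(no_newlines))
```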

2) Direct replace() fails

input = "xx_ba_xxx"
output = input.replace("ba", "")
print(output)
# prints `xx__xxx`

Not cool; we only want to remove the sequence "ba" from the beginning and end of the string, not the middle.

3) Just nope

input = "baabbbbb_xx_ba_xxx_abbbbbba"
output = "ba".join(input.rsplit("ba", 1))
print(output)
# output==input

Final Note

The solution must be general: a function accepting any two input strings, one of which is the substring to remove and need not be "ba". The undesired leading and trailing strings might contain ".", "*" and other characters that have special meaning in regular expressions.

Toothpick Anemone

2 Answers

0

My solution uses basic hashing. Be aware that hash collisions are possible, so every hash match is confirmed with a direct comparison. Let me know if this helps with your problem.

import functools


def strip_ed(pattern, string):
    # Nothing to strip if the pattern is empty or longer than the string
    if not pattern or len(pattern) > len(string):
        return string

    base = 26
    n = len(pattern)

    def poly_hash(s):
        # Simple polynomial hash; collisions are possible
        return functools.reduce(lambda h, c: h * base + ord(c), s, 0)

    pattern_hash = poly_hash(pattern)
    # A hash match is only a hint, so confirm it with a direct comparison
    starts = poly_hash(string[:n]) == pattern_hash and string[:n] == pattern
    ends = poly_hash(string[-n:]) == pattern_hash and string[-n:] == pattern

    if starts and ends:
        # Pattern found at both ends
        return string[n:-n]
    elif starts:
        return string[n:]
    elif ends:
        return string[:-n]
    else:
        return string
armrasec
-1

This seems to work:

def ordered_strip(whole, part):
    center = whole
    if whole.endswith(part):
        center = center[:-len(part)]
    if whole.startswith(part):
        center = center[len(part):]
    return center
Toothpick Anemone
  • You got one of your slices backward (and also you have a problem if `len(part) == 0`). – user2357112 Nov 02 '19 at 03:03
  • Anyway, this is pretty much just doing what you would to remove a substring from one end of a string, but doing it twice, so I'm not sure what value you were hoping to add here over the question you already linked. – user2357112 Nov 02 '19 at 03:05
  • `if whole.startswith(part): center = center[:-len(part)]` part should be `center = center[len(part):]`. – Austin Nov 02 '19 at 03:14
  • You've fixed the slice thing, but you still return `''` for `ordered_strip('asdf', '')`. Also, because you use `whole` instead of `center` for both starts/endswith checks, you return `''` for `ordered_strip('ababa', 'aba')`. Depending on what you're using this for, the second `''` result might be what you want, but it's different from how most Python string manipulation routines handle overlapping matches. (The first `''` result definitely isn't right.) – user2357112 Nov 02 '19 at 03:48
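
Taking the comments above into account, a revised version might look like the sketch below. It assumes that an empty `part` should leave the string untouched, and that the suffix check should run on what remains after the prefix is removed, so overlapping matches are stripped only once. The name `ordered_strip_fixed` is only illustrative and does not appear in the original answer.

```python
def ordered_strip_fixed(whole, part):
    # An empty part has nothing to remove; without this guard,
    # center[:-0] would collapse the result to ''.
    if not part:
        return whole
    center = whole
    # Remove at most one copy from the front.
    if center.startswith(part):
        center = center[len(part):]
    # Check the remainder, not the original string, so that
    # overlapping matches are not stripped twice.
    if center.endswith(part):
        center = center[:-len(part)]
    return center


print(ordered_strip_fixed("baabbbbb_xx_ba_xxx_abbbbbba", "ba"))
print(ordered_strip_fixed("ababa", "aba"))
print(ordered_strip_fixed("asdf", ""))
```

Because it uses only plain string comparisons, characters such as "." and "*" need no escaping. On Python 3.9 or later, `whole.removeprefix(part).removesuffix(part)` should give the same results.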