
I have encountered some very odd behavior in the built-in string method lstrip.

I will explain with a few examples:

print 'BT_NAME_PREFIX=MUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=NUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=PUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=SUV'.lstrip('BT_NAME_PREFIX=') # SUV
print 'BT_NAME_PREFIX=mUV'.lstrip('BT_NAME_PREFIX=') # mUV

As you can see, the function trims one additional character sometimes.

I tried to model the problem, and noticed that it persisted if I:

  • Changed BT_NAME_PREFIX to BT_NAME_PREFIY
  • Changed BT_NAME_PREFIX to BT_NAME_PREFIZ
  • Changed BT_NAME_PREFIX to BT_NAME_PREF

Further attempts made it even weirder:

print 'BT_NAME=MUV'.lstrip('BT_NAME=') # UV
print 'BT_NAME=NUV'.lstrip('BT_NAME=') # UV
print 'BT_NAME=PUV'.lstrip('BT_NAME=') # PUV - different than before!!!
print 'BT_NAME=SUV'.lstrip('BT_NAME=') # SUV
print 'BT_NAME=mUV'.lstrip('BT_NAME=') # mUV

Could someone please explain what on earth is going on here?

I know I might as well just use array-slicing, but I would still like to understand this.

Thanks

barak manos

2 Answers


You're misunderstanding how lstrip works. It treats the characters you pass in as a bag and it strips characters that are in the bag until it finds a character that isn't in the bag.

Consider:

'abc'.lstrip('ba')  # 'c'
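
To make the "bag" behaviour concrete, here is a rough sketch of what lstrip does with its argument. The name lstrip_sketch is made up for illustration, and this is not how CPython actually implements it; it only mirrors the observable behaviour:

```python
def lstrip_sketch(s, chars):
    """Illustrative stand-in for s.lstrip(chars); hypothetical helper."""
    bag = set(chars)  # order and repetition of chars do not matter
    i = 0
    # Skip every leading character that is in the bag, then stop.
    while i < len(s) and s[i] in bag:
        i += 1
    return s[i:]

# 'M' is in the bag built from 'BT_NAME_PREFIX=', so it is stripped too.
print(lstrip_sketch('BT_NAME_PREFIX=MUV', 'BT_NAME_PREFIX='))  # UV
# 'P' is not in the bag built from 'BT_NAME=', so stripping stops there.
print(lstrip_sketch('BT_NAME=PUV', 'BT_NAME='))  # PUV
```

That is why the results in the question depend on whether the first character after the '=' happens to occur somewhere in the argument.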

It is not removing a substring from the start of the string. To do that, you need something like:

if s.startswith(prefix):
    s = s[len(prefix):]

e.g.:

>>> s = 'foobar'
>>> prefix = 'foo'
>>> if s.startswith(prefix):
...     s = s[len(prefix):]
... 
>>> s
'bar'
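
If you need this in more than one place, the startswith check can be wrapped in a small helper. The name strip_prefix is just an illustrative choice, not something from the standard library:

```python
def strip_prefix(s, prefix):
    """Return s without the leading prefix, or s unchanged if it doesn't start with it."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s

print(strip_prefix('BT_NAME_PREFIX=MUV', 'BT_NAME_PREFIX='))  # MUV
print(strip_prefix('BT_NAME=PUV', 'BT_NAME='))                # PUV
```

On Python 3.9 and later, str.removeprefix does the same job without a helper.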

Or, I suppose you could use a regular expression:

>>> s = 'foobar'
>>> import re
>>> re.sub('^foo', '', s)
'bar'
mgilson
  • Damn, I should have checked more carefully, wasn't paying too much attention (I have mostly just used it for white spaces). Thanks!!! – barak manos Mar 10 '16 at 17:26
  • @barakmanos -- I think I probably knew the answer to this one off the top of my head because it's bitten me a time or two as well. I actually really wish there was `stripprefix` and `stripsuffix` methods :-) – mgilson Mar 10 '16 at 17:28

The argument given to lstrip is treated as a set of characters to remove from the left of the string, one character at a time. The phrase those characters spell is not considered, only the individual characters themselves.

S.lstrip([chars]) -> string or unicode

Return a copy of the string S with leading whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is unicode, S will be converted to unicode before stripping

You could solve this in a flexible way using regular expressions (the re module):

>>> import re
>>> re.sub('^BT_NAME_PREFIX=', '', 'BT_NAME_PREFIX=MUV')
'MUV'
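
One caveat with the regex route: if the prefix may contain characters that are special in a regular expression (a dot, for instance), it is safer to pass it through re.escape first. A minimal sketch, with strip_prefix_re as a hypothetical helper name:

```python
import re

def strip_prefix_re(s, prefix):
    """Remove a literal prefix from s using a regex anchored at the start."""
    # re.escape makes characters such as '.' match literally.
    return re.sub('^' + re.escape(prefix), '', s)

print(strip_prefix_re('BT_NAME_PREFIX=MUV', 'BT_NAME_PREFIX='))  # MUV
print(strip_prefix_re('axb=MUV', 'a.b='))  # axb=MUV (the dot is literal, so no match)
```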
André Laszlo