Using strip() on string removes prefix that does not match the argument

Question

The string (expressed as UTF-8 bytes),

b'\xe8\xb0\x81\xe6\x98\xaf\xe8\xb0\x81\xe7\x9a\x84\xe5\x91\xa8\xe6\x9d\xb0\xe4\xbc\xa6'

does not begin with

b'\xe6\x98\xaf\xe8\xb0\x81'

However, using strip() on it below does remove this prefix. Does anyone know why this is happening?

Python 3.6.8 (default, Apr 19 2021, 17:20:37) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> "谁是谁的周杰伦".strip("是谁")
'的周杰伦'
>>> bytes('是谁', encoding='UTF-8')
b'\xe6\x98\xaf\xe8\xb0\x81'
>>> bytes('谁是谁的周杰伦', encoding='UTF-8')
b'\xe8\xb0\x81\xe6\x98\xaf\xe8\xb0\x81\xe7\x9a\x84\xe5\x91\xa8\xe6\x9d\xb0\xe4\xbc\xa6'

Do you mean https://docs.python.org/3/library/stdtypes.html#str.removeprefix ? — Giacomo Catenazzi, Aug 24 '23 at 15:01

Brian61354270 · Accepted Answer · 2023-08-24T15:13:49.217

The complex Unicode codepoints in your question make this a bit more confusing than needed. Consider this simpler example:

>>> "abcde".strip("ba")
'cde'

str.strip is working as intended. Its argument is treated as a set of characters, not as a substring to match. Any leading or trailing run consisting entirely of those characters, in any order, gets removed.

Quoting the docs:

The outermost leading and trailing chars argument values are stripped from the string. Characters are removed from the leading end until reaching a string character that is not contained in the set of characters in chars. A similar action takes place on the trailing end.
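To make the quoted behavior concrete, here is a rough pure-Python model of what strip does with a non-empty chars argument. The name strip_chars is made up for illustration; this is a sketch of the documented semantics, not how CPython implements it.

```python
def strip_chars(s, chars):
    """Rough model of s.strip(chars) for a non-empty chars string."""
    charset = set(chars)  # order and repetition within chars are irrelevant
    start, end = 0, len(s)
    # Advance past leading characters that belong to the set.
    while start < end and s[start] in charset:
        start += 1
    # Back up over trailing characters that belong to the set.
    while end > start and s[end - 1] in charset:
        end -= 1
    return s[start:end]


print(strip_chars("abcde", "ba"))        # cde
print(strip_chars("谁是谁的周杰伦", "是谁"))  # 的周杰伦
```

In the question's string, the first three characters (谁, 是, 谁) are all members of the set {是, 谁}, so all three are dropped even though the string does not start with "是谁".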

If you want to remove an exact prefix, use str.removeprefix (added in Python 3.9):

>>> "谁是谁的周杰伦".removeprefix("是谁")
'谁是谁的周杰伦'     # no match, bad order

>>> "谁是谁的周杰伦".removeprefix("谁是")
'谁的周杰伦'        # match
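On an interpreter older than 3.9, such as the 3.6.8 shown in the question, the same effect can be had with startswith and a slice. The helper name remove_prefix below is hypothetical; it is a minimal sketch, not a standard-library function.

```python
def remove_prefix(text, prefix):
    """Hypothetical stand-in for str.removeprefix on Python < 3.9."""
    # Only cut when the whole prefix matches, in order, at the start.
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


print(remove_prefix("谁是谁的周杰伦", "是谁"))  # 谁是谁的周杰伦 (no match)
print(remove_prefix("谁是谁的周杰伦", "谁是"))  # 谁的周杰伦
```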
