
In Python3 I use a UserString to expand the functionality of built-in strings. Usually UserStrings behave just like strs, but with re I run into an unexpected TypeError:

bpython version 0.17.1 on top of Python 3.6.9 /usr/bin/python3

import re
from collections import UserString

s = UserString('foo')
re_repetitions = re.compile(r"(/)\1{1,}", re.DOTALL)
re_repetitions.sub(r"\1", s)

Traceback (most recent call last):
  File "<input>", line 1, in <module>
    re_repetitions.sub(r"\1", s)
TypeError: expected string or bytes-like object

Why is that? A UserString usually "quacks" like a string, so why does re not treat it as one? Where else does a UserString not behave like a str?

xealits
  • `UserString` has a `.data` attribute, so you can write `re_repetitions.sub(r"\1", s.data)` – Andrej Kesely Jan 15 '20 at 16:50
  • I disagree with downvoting this question. I could not find the other one, and Stack Overflow did not suggest it among similar questions. I doubt that somebody else who needs to extend some behaviour of `str` and uses `UserString` will be able to find that question, because it deals with Pandas and nltk. Moreover, my use case is much more to the point than the other one. – xealits Jan 15 '20 at 19:27
  • Moreover! [The accepted answer](https://stackoverflow.com/a/43727749/1420489) to the other question starts with "As you stated in the comments, some of the values appeared to be floats, not strings...", which is a trivial situation that does not clearly correspond to the tricky case of `UserString`. **That is exactly why I asked this question: `UserString` "quacks" like a string, so why does `re` not treat it as one?** The other question and its answer will most likely confuse another person in my situation. – xealits Jan 15 '20 at 19:38
  • @wiktor-stribiżew I edited the question to stress the main point of the issue: I do not run `re` on a `float` as in the linked question; the error message `TypeError: expected string or bytes-like object` does not say explicitly that it requires a `str` object; and the documentation on [`UserString`](https://docs.python.org/3/library/collections.html#collections.UserString) states "UserString(seq) Class that simulates a string object", which is also confusing. I think my question presents the issue better and is more useful to a `UserString` user than the linked question. Would you consider reopening it? – xealits Jan 16 '20 at 12:37
  • I am not sure if the comment reference worked with a hyphen in it. Just to be sure, let me add another @Wiktor reference. – xealits Jan 16 '20 at 13:40

2 Answers


I would like to add some information about the UserString class itself.

Python's built-in types, like str, are not written in Python but implemented in C (assuming the standard Python implementation, CPython). This helps with speed and with Python/C interoperability, but it also means that built-ins can behave differently from ordinary user-defined classes. There are cases where this implementation detail matters.

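This is an example of such a situation. The regex engine behind re (the _sre module) is written in C, and it accepts only real str objects (including str subclasses) or bytes-like objects. UserString "simulates a string object", but it is not one: isinstance(UserString('foo'), str) is False.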
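A minimal sketch of that distinction, using only re and collections.UserString: the compiled pattern accepts a plain str (and a bytes pattern accepts a bytes object), but the UserString wrapper is rejected. The names `pattern` and `bytes_pattern` and the sample inputs are illustrative, and the exact wording of the TypeError message differs slightly between Python versions.

```python
import re
from collections import UserString

s = UserString("foo")
pattern = re.compile(r"(/)\1{1,}", re.DOTALL)   # str pattern: collapses repeated slashes
bytes_pattern = re.compile(rb"(/)\1{1,}")        # bytes pattern for bytes-like input

# A UserString wraps a str but is not itself an instance of str.
print(isinstance(s, str))        # False
print(isinstance(s.data, str))   # True

# The C engine accepts real str objects and bytes-like objects...
print(pattern.sub(r"\1", "a//b"))          # a/b
print(bytes_pattern.sub(rb"\1", b"a//b"))  # b'a/b'

# ...but rejects the UserString wrapper with a TypeError.
try:
    pattern.sub(r"\1", s)
except TypeError as exc:
    print("TypeError:", exc)
```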

In general, routines that check for a real str (often ones implemented in C) do not accept a UserString. For example, you cannot pass a UserString to importlib, to subprocess as a command name, or to os.path functions. There was an enhancement proposal about this in 2001, but a fix was never implemented because there is no easy way to do it.
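The same kind of failure can be reproduced outside re. This sketch assumes a standard CPython 3 installation; `rejects` is a hypothetical helper that only reports whether a call raises TypeError. os.path.join goes through os.fspath(), which accepts only str, bytes, or os.PathLike objects, and the importlib machinery checks that the module name is a real str. subprocess is left out because its behaviour with a UserString command depends on the platform.

```python
import importlib
import os.path
from collections import UserString


def rejects(func, *args):
    """Hypothetical helper: return True if func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# os.path functions call os.fspath(), which does not know about UserString.
print(rejects(os.path.join, UserString("tmp"), "file.txt"))  # True
print(rejects(os.path.join, "tmp", "file.txt"))              # False

# importlib requires the module name to be an actual str.
print(rejects(importlib.import_module, UserString("json")))  # True
print(rejects(importlib.import_module, "json"))              # False
```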

Therefore, there is no standard way to fix it, and you need a workaround: either pass .data as in chepner's answer, or, as in the linked question, convert explicitly with str(UserString('foo')).
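Both workarounds in a short sketch. `as_str` is a hypothetical helper, not part of the standard library, that unwraps a UserString once (for example at a module boundary) so the rest of the code does not have to sprinkle `.data` everywhere.

```python
import re
from collections import UserString

pattern = re.compile(r"(/)\1{1,}", re.DOTALL)
s = UserString("a//b")

# Workaround 1: pass the wrapped str stored in the .data attribute.
print(pattern.sub(r"\1", s.data))  # a/b

# Workaround 2: convert explicitly; str() on a UserString returns its data as a str.
print(pattern.sub(r"\1", str(s)))  # a/b


def as_str(value):
    """Hypothetical helper: unwrap a UserString, pass anything else through unchanged."""
    if isinstance(value, UserString):
        return value.data
    return value


print(pattern.sub(r"\1", as_str(s)))  # a/b
```

Note that re returns a plain str, not a UserString, so wrap the result again (e.g. `UserString(pattern.sub(...))`) if the caller expects the wrapper type.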

In addition, the error message

TypeError: expected string or bytes-like object

actually means that it needs a str object (or a bytes-like object such as bytes). In Python, "string" means str by default, which can be confusing when dealing with "string-like" objects.


Finally, in my real use case I needed a string-like class MyString that could be extended with new methods in the future. For the time being I just wrote MyString = str and assumed it would be easy to extend when the need came.

When the need came, I defined class MyString(UserString), and my tests complained that they needed a string. Now, in a somewhat awkward and non-Pythonic way, half of my code treats MyString as a string and the other half calls MyString('foo').data. To make things worse, this MyString class is part of the interface of one of the modules, so a user must know the implementation details behind the module's interface...

In my case it seems that I can code around this issue, but that requires rewriting much of the module. So the feature implemented with UserString is probably not worth the effort right now.
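One alternative that the answer above does not try, sketched under the assumption that MyString only needs extra methods and no change to how the text is stored: subclass str directly instead of UserString. Instances are then real str objects, so re and other C code accept them. The class and its `collapse_slashes` method are hypothetical.

```python
import re


class MyString(str):
    """Hypothetical str subclass: instances ARE str, so re accepts them."""

    def collapse_slashes(self):
        # Illustrative extra method; re-wraps the result to keep the MyString type.
        return MyString(re.sub(r"(/)\1{1,}", r"\1", self))


s = MyString("a//b///c")
print(isinstance(s, str))                   # True
print(re.sub(r"(/)\1{1,}", r"\1", s))       # a/b/c
print(type(s.collapse_slashes()).__name__)  # MyString
```

The trade-off: inherited str methods such as upper() return a plain str rather than a MyString, so methods that should preserve the type have to re-wrap their result, as `collapse_slashes` does.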

xealits

An instance of UserString isn't an instance of str, but it does contain a str:

re_repetitions.sub(r"\1", s.data)
xealits
chepner
  • For others who need to use `UserString`, note that it does not behave as a `str` in a number of other important situations: `import` does not handle it, and you cannot run a `subprocess` with a `UserString` command name. – xealits Jan 15 '20 at 19:29
  • heh, in 2001 there actually was an enhancement proposal about this situation: [UserString can not be used as string in calls to C routines](https://bugs.python.org/issue232493). The resolution would be "a major project", and it was postponed until Python 3000. – xealits Jan 16 '20 at 13:49