Sometimes you have to use a list comprehension to convert everything to strings, including items that are already strings.

b = [str(a) for a in l]

But do I have to do:

b = [a if type(a)==str else str(a) for a in l]

I was wondering if str on a string is optimized enough to not create another copy of the string.

I have tried:

>>> x="aaaaaa"
>>> str(x) is x
True

but that may be because Python caches some strings and reuses them. Is that behaviour guaranteed for any string value?

Aurora0001
Jean-François Fabre
  • [*For string objects, this is the string itself.*](https://docs.python.org/3/library/stdtypes.html#str) Looks like there should be no overhead. – Wiktor Stribiżew Feb 14 '17 at 10:06
  • In Python 2 the two snippets are not equivalent - `str(u"żółć")` raises `UnicodeEncodeError`. – Łukasz Rogalski Feb 14 '17 at 10:07
  • Note that checking the type using `type(a)==str` is not good practice at all. You should use the `isinstance()` built-in function instead. – Mazdak Feb 14 '17 at 10:20
  • @Kasramvd: unless you must test for the exact type, and not allow subclasses. In which case you should use `type(a) is str`, as it is faster. `str` is a singleton, you won't have two different types `str` lying around that this test both needs to produce `True` for. – Martijn Pieters Feb 14 '17 at 10:50
  • @MartijnPieters Yes Exactly. – Mazdak Feb 14 '17 at 10:57
  • @WiktorStribiżew Well, there is some overhead because the function is still called. There is no memory overhead. – Bakuriu Feb 14 '17 at 18:47

1 Answer

Testing if an object is already a string is slower than just always converting to a string.

That's because the `str()` call also makes the exact same test (is the object already a string?). With your own check you are a) doing double the work, and b) using a test that is slower to boot.
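
As a quick check that `str()` hands back the same object when given an exact `str`, here is a minimal sketch. It assumes CPython 3, and the sample values are made up for illustration; the last two are built at runtime so that caching of string literals cannot explain the result.

```python
# Sample values are illustrative; the last two are built at runtime,
# so they are not cached string literals.
samples = [
    "",
    "aaaaaa",
    "".join(["spam ", "eggs "] * 1000),
    str(12345) + " suffix",
]

# On CPython 3, str() returns its argument unchanged when the argument
# is exactly a str, so the identity test holds for every sample.
same_object = [str(s) is s for s in samples]
print(same_object)
```

If every entry is `True`, no copy was made for any of the samples, whatever their length or origin.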

Note: for Python 2, using `str()` on `unicode` objects includes an implicit encode to ASCII, and this can fail. You may still have to special-case such objects. In Python 3, there is no need to worry about that edge case.

As there is some discussion around this:

  • `isinstance(s, str)` has a different meaning when `s` can be an instance of a subclass of `str`. As instances of subclasses are treated exactly like any other type of object by `str()` (either `__str__` or `__repr__` is called on the object), this difference matters here.
  • You should use `type(s) is str` for exact type checks. Types are singletons, so take advantage of this; `is` is faster:

    >>> import timeit
    >>> timeit.timeit("type(s) is str", "s = ''")
    0.10074466899823165
    >>> timeit.timeit("type(s) == str", "s = ''")
    0.1110201120027341
    
  • Using `s if type(s) is str else str(s)` is significantly slower for the non-string case:

    >>> import timeit
    >>> timeit.timeit("str(s)", "s = None")
    0.1823573520014179
    >>> timeit.timeit("s if type(s) is str else str(s)", "s = None")
    0.29589492800005246
    >>> timeit.timeit("str(s)", "s = ''")
    0.11716728399915155
    >>> timeit.timeit("s if type(s) is str else str(s)", "s = ''")
    0.12032335300318664
    

    (The timings for the s = '' cases are very close and keep swapping places).
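
The difference between `isinstance(s, str)` and `type(s) is str` described in the first bullet can be sketched as follows. `Tagged` is a hypothetical subclass invented for this illustration, and the identity results are what CPython 3 does rather than something every implementation is known to promise.

```python
class Tagged(str):
    """Hypothetical str subclass used only for this illustration."""


plain = "abc"
tagged = Tagged("abc")

# Exact str: str() returns the very same object.
plain_is_same = str(plain) is plain

# Subclass instance: str() goes through __str__ and, on CPython 3,
# produces a new object whose type is exactly str.
converted = str(tagged)
tagged_is_same = converted is tagged

# isinstance() accepts the subclass; the exact type check does not.
print(isinstance(tagged, str), type(tagged) is str)
print(plain_is_same, tagged_is_same, type(converted) is str)
```

So an `isinstance()` guard would pass `Tagged` instances through unchanged, while `str()` and the `type(s) is str` guard both turn them into plain `str` objects.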

All timings in this post were taken on Python 3.6.0 on a MacBook Pro 15" (Mid 2015), OS X 10.12.3.
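
To repeat the comparison on your own machine with a mix of strings and non-strings, something like the following sketch may help. The sample list and the two function names are made up for illustration, and no timings are quoted because they vary with hardware and Python version and may not match the figures above.

```python
import timeit

# Made-up sample data: a mix of strings and non-strings.
items = ["spam", 42, None, "eggs", 3.14, ("a", 1)] * 1000


def always_convert(values):
    """Call str() on every item, whether or not it is already a str."""
    return [str(v) for v in values]


def check_first(values):
    """Only call str() on items that are not exactly of type str."""
    return [v if type(v) is str else str(v) for v in values]


# Both approaches must produce the same list.
results_match = always_convert(items) == check_first(items)

# Time 100 passes over the list for each approach.
t_always = timeit.timeit(lambda: always_convert(items), number=100)
t_check = timeit.timeit(lambda: check_first(items), number=100)
print(f"always str(): {t_always:.4f}s  check first: {t_check:.4f}s")
```

Changing the ratio of strings to non-strings in `items` shows how much the outcome depends on the data.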

Martijn Pieters
  • Can you please give a benchmark on short and long strings, especially when checking the type using the `isinstance()` function? – Mazdak Feb 14 '17 at 10:18
  • Regarding (unlikely) Python 3 edge-cases: http://stackoverflow.com/a/38982099/6260170 – Chris_Rands Feb 14 '17 at 10:26
  • @Chris_Rands: sure, you can break any dunder-method and thus anything that calls the method. `__str__` and `__repr__` are no exceptions there. :-) The implicit encode for `unicode` is part of the default implementation and thus needs calling out; as far as I am aware the Python 3 stdlib has no such edge cases. – Martijn Pieters Feb 14 '17 at 10:31
  • @Kasramvd I think the execution time is not related to the length of the string ;) – 0Tech Feb 14 '17 at 10:43
  • @Kasramvd: at the very least use `type(s) is str`. `isinstance()` is one call, `type(s) == str` or `type(s) is str` is a call and an operator test, so you are asking the interpreter loop to do *more work*. `type(s)` is faster than `isinstance(s, str)`, it is the `== str` or `is str` that adds the extra time here, with `is str` being on-par, roughly. No, the string length has no bearing on either test. – Martijn Pieters Feb 14 '17 at 10:44
  • Oops! sorry it wasn't `type(s) == str ` it was `str(s)`. It returns `92.2 ns per loop` for `str(s)` and `83.9 ns per loop` for `isinstance(s, str)`. – Mazdak Feb 14 '17 at 10:53
  • @Kasramvd: if you are doing that then at least time `str(s)` vs `s if type(s) is str else str(s)`. That's the point of this question after all. You are comparing apples to pears here. Use a mix of strings and non-strings. For only strings, the timings are basically identical (sometimes one wins, sometimes the other, and the timings are always very, very close, around 125ns). For **non-strings** `str()` wins it hands-down, every time. – Martijn Pieters Feb 14 '17 at 11:11
  • Yeah it's only `1ns` slower :-). But it is still worth mentioning that `isinstance()` or even `type(s) is str` are slightly faster than `str(s)`. – Mazdak Feb 14 '17 at 11:15
  • @Kasramvd: but they are not. Not for a *mix* of types. And for `type(s) is str` the timings are more or less on par. – Martijn Pieters Feb 14 '17 at 11:16