Why were literal formatted strings (f-strings) so slow in Python 3.6 alpha? (now fixed in 3.6 stable)

Question

I've downloaded a Python 3.6 alpha build from the Python Github repository, and one of my favourite new features is literal string formatting. It can be used like so:

>>> x = 2
>>> f"x is {x}"
"x is 2"

This appears to do the same thing as using the format function on a str instance. However, one thing that I've noticed is that this literal string formatting is actually very slow compared to just calling format. Here's what timeit says about each method:

>>> x = 2
>>> timeit.timeit(lambda: f"X is {x}")
0.8658502227130764
>>> timeit.timeit(lambda: "X is {}".format(x))
0.5500578542015617

If I use a string as timeit's argument, my results are still showing the pattern:

>>> timeit.timeit('x = 2; f"X is {x}"')
0.5786435347381484
>>> timeit.timeit('x = 2; "X is {}".format(x)')
0.4145195760771685

As you can see, using format takes almost half the time. I would expect the literal method to be faster because less syntax is involved. What is going on behind the scenes which causes the literal method to be so much slower?

f-strings are dynamic, so the string has to be generated on every loop; whereas the format string is a literal that's created before the code is run, when it's getting converted to bytecode. — PM 2Ring, May 21 '16 at 16:33
@AlexHall Maybe this has to do with the fact that `x` is assigned to a local variable when passed to the `format` method, but has to be found in the `globals` by the `f"..."` syntax. — user2390182, May 21 '16 at 16:36
@AlexHall: this is not a bug. There is simply a different implementation under the hood, as the format string must be parsed at compile time, whereas `str.format()` parses the slots at *runtime*. — Martijn Pieters, May 21 '16 at 16:40
@PM2Ring: all expressions are compiled at compile time and evaluated at runtime. — Martijn Pieters, May 21 '16 at 16:50
@MartijnPieters if the string is compiled at runtime that should mean less calculation. At the very least if `.format` is faster then these strings should simply be compiled into calls to `.format`. — Alex Hall, May 21 '16 at 16:59
Notice that your `.format` is the simplest one possible. It doesn't even use kwargs, or even indexed access. As soon as you use `'{x}'.format(x=x)` the difference is gone. — Antti Haapala -- Слава Україні, May 21 '16 at 17:22
@AnttiHaapala Even using that method, literal formatted strings are still slower, but by less. — Aaron Christiansen, May 21 '16 at 17:26

Martijn Pieters · Accepted Answer · 2017-01-05T08:13:34.923

47

Note: This answer was written for the Python 3.6 alpha releases. A new opcode added to 3.6.0b1 improved f-string performance significantly.

The f"..." syntax is effectively converted to a str.join() operation on the literal string parts around the {...} expressions, and the results of the expressions themselves passed through the object.__format__() method (passing any :.. format specification in). You can see this when disassembling:

>>> import dis
>>> dis.dis(compile('f"X is {x}"', '', 'exec'))
  1           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 LOAD_CONST               1 ('X is ')
              9 LOAD_NAME                1 (x)
             12 FORMAT_VALUE             0
             15 BUILD_LIST               2
             18 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             21 POP_TOP
             22 LOAD_CONST               2 (None)
             25 RETURN_VALUE
>>> dis.dis(compile('"X is {}".format(x)', '', 'exec'))
  1           0 LOAD_CONST               0 ('X is {}')
              3 LOAD_ATTR                0 (format)
              6 LOAD_NAME                1 (x)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 POP_TOP
             13 LOAD_CONST               1 (None)
             16 RETURN_VALUE

Note the BUILD_LIST and LOAD_ATTR .. (join) op-codes in that result. The new FORMAT_VALUE takes the top of the stack plus a format value (parsed out at compile time) to combine these in a object.__format__() call.

So your example, f"X is {x}", is translated to:

''.join(["X is ", x.__format__('')])

Note that this requires Python to create a list object, and call the str.join() method.

The str.format() call is also a method call, and after parsing there is still a call to x.__format__('') involved, but crucially, there is no list creation involved here. It is this difference that makes the str.format() method faster.

Note that Python 3.6 has only been released as an alpha build; this implementation can still easily change. See PEP 494 – Python 3.6 Release Schedule for the time table, as well as Python issue #27078 (opened in response to this question) for a discussion on how to further improve the performance of formatted string literals.

edited Jan 05 '17 at 08:13

answered May 21 '16 at 16:34

Martijn Pieters

1,048,767
296
4,058
3,343

Really nice explanation, thanks! I had no idea that there was a `__format__` magic method either. – Aaron Christiansen May 21 '16 at 16:35
Why is it expanded to `''.join([...])` and not string concatenation? – Alex Hall May 21 '16 at 16:54
3

@AlexHall: because string concatenation has O(N^2) performance characteristics. A + B + C has to create a string for A + B first, then copy the result together with C to a new string. – Martijn Pieters May 21 '16 at 16:55
2

@AlexHall: string joining on the other hand only has to calculate the final string size, copy all of A, B and C into that. That's an O(N) operation. – Martijn Pieters May 21 '16 at 16:57
It doesn't *have* to be like that. Java, for example, expands simple string concatenation into calls to `StringBuilder`. The point is there should be a way to statically compile string concatenation into something that doesn't require creating a list. – Alex Hall May 21 '16 at 16:58
1

@AlexHall: A string builder still needs to allocate memory in chunks to make room for the additional data, then produce the final string object. There are tradeoffs for both approaches. CPython does have an internal optimisation when you use `stringvar += otherstring` or `stringvar = stringvar + otherstring` but that's an implementation detail that would require reworking to support this case too, since there is no actual `stringvar` here. – Martijn Pieters May 21 '16 at 17:07
OK, put it this way. What is `.format` doing that doesn't involve a list, and why not just do that? – Alex Hall May 21 '16 at 17:12
@AlexHall: that may well be an option; create a string constant with the expressions replaced by positional arguments, look up the `.format()` method on that, and pass in the expression results with a `CALL_FUNCTION` opcode. This does require the string to be parsed out each time you do this. I don't know if a deliberate choice was made here based on large-format-string metrics or such. – Martijn Pieters May 21 '16 at 17:14
Maybe worth noting in this answer that the issue has since been resolved. – ideasman42 Jan 05 '17 at 04:29
@ideasman42: well, not in the alpha and beta 1 ;-) The final release has improved the performance significantly, yes. – Martijn Pieters Jan 05 '17 at 08:12

Antti Haapala -- Слава Україні · Answer 2 · 2020-02-25T08:50:48.420

Before 3.6 beta 1, the format string f'x is {x}' was compiled to the equivalent of ''.join(['x is ', x.__format__('')]). The resulting bytecode was inefficient for several reasons:

it built a sequence of string fragments...
... and this sequence was a list, not a tuple! (it is slightly faster to construct tuples than lists).
it pushed an empty string onto the stack
it looked up the join method on the empty string
it invoked __format__ on even bare Unicode objects, for which the __format__('') would always return self, or integer objects, for which __format__('') as the argument returned str(self).
__format__ method isn't slotted.

However, for a more complex and longer string, the literal formatted strings would still have been faster than the corresponding '...'.format(...) call, because for the latter the string is interpreted every time the string is formatted.

This very question was the prime motivator for issue 27078 asking for a new Python bytecode opcode for string fragments into a string (the opcode gets one operand - the number of fragments on the stack; the fragments are pushed onto the stack in the order of appearance i.e. the last part is the topmost item). Serhiy Storchaka implemented this new opcode and merged it into CPython so that it has been available in Python 3.6 ever since beta 1 version (and thus in Python 3.6.0 final).

As the result the literal formatted strings will be much faster than string.format. They are also often much faster than the old-style formatting in Python 3.6, if you're just interpolating str or int objects:

>>> timeit.timeit("x = 2; 'X is {}'.format(x)")
0.32464265200542286
>>> timeit.timeit("x = 2; 'X is %s' % x")
0.2260766440012958
>>> timeit.timeit("x = 2; f'X is {x}'")
0.14437875000294298

f'X is {x}' now compiles to

>>> dis.dis("f'X is {x}'")
  1           0 LOAD_CONST               0 ('X is ')
              2 LOAD_NAME                0 (x)
              4 FORMAT_VALUE             0
              6 BUILD_STRING             2
              8 RETURN_VALUE

The new BUILD_STRING, along with an optimization in FORMAT_VALUE code completely eliminates first 5 of the 6 sources of inefficiency. The __format__ method still isn't slotted, so it requires a dictionary lookup on the class and thus calling it is necessarily slower than calling __str__, but a call can now be completely avoided in the common cases of formatting int or str instances (not subclasses!) without formatting specifiers.

Why were literal formatted strings (f-strings) so slow in Python 3.6 alpha? (now fixed in 3.6 stable)

2 Answers2

Linked