Python string formatting: % vs concatenation

Question

I'm developing an application in which I perform some requests to get an object id. After each one of them, I call a method (get_actor_info()) passing this id as an argument (see code below).

ACTOR_CACHE_KEY_PREFIX = 'actor_'

def get_actor_info(actor_id):
    cache_key = ACTOR_CACHE_KEY_PREFIX + str(actor_id)

As can be noticed, I'm casting actor_id to string and concatenating it with a prefix. However, I know I could do it in multiple other ways (.format() or '%s%d', for instance) and that results in my question: would '%s%d' be better than string concatenation in terms of readability, code convention and efficiency?

Thanks

You can check by yourself: https://docs.python.org/2/library/timeit.html — bruno desthuilliers, Jan 05 '16 at 19:09
Thanks @brunodesthuilliers! However this answers only the time efficiency part of the question. — Amaury Medeiros, Jan 05 '16 at 19:10
Yes sorry... wrt/ readability and conventions, the answer is plain simple: use `.format()`. — bruno desthuilliers, Jan 05 '16 at 19:22

score 10 · Accepted Answer · answered Jan 06 '16 at 06:01

This could easily become an opinion-based thread, but I find formatting to be more readable in most cases, and more maintainable. It's easier to visualize what the final string will look like, without doing "mental concatenation". Which of these is more readable, for example?

errorString = "Exception occurred ({}) while executing '{}': {}".format(
    e.__class__.__name__, task.name, str(e)
)

Or:

errorString = ("Exception occurred (" + e.__class__.__name__
    + ") while executing '" + task.name + "': " + str(e))

As for whether to use % or .format(), I can answer more objectively: use .format(). The % operator is the "old style" and, per the Python documentation, it may eventually be removed:

Since str.format() is quite new, a lot of Python code still uses the % operator. However, because this old style of formatting will eventually be removed from the language, str.format() should generally be used.

Later versions of the documentation have stopped mentioning this, but nonetheless, .format() is the way of the future; use it!

Concatenation is often faster for a simple case like this one, but that should not be a concern. Make your code readable and maintainable as a first-line goal, and then optimize the parts you need to optimize later. Premature optimization is the root of all evil ;)
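If you would rather measure than take anyone's word for it, here is a minimal timeit sketch. The function names (by_concat, by_percent, by_format, time_all) are made up for illustration, and the absolute numbers depend on your interpreter version and machine, so treat the output as indicative only.

```python
import timeit

ACTOR_CACHE_KEY_PREFIX = 'actor_'


def by_concat(actor_id):
    # Explicit str() conversion, then + concatenation.
    return ACTOR_CACHE_KEY_PREFIX + str(actor_id)


def by_percent(actor_id):
    # Old-style % formatting; %d requires a number.
    return '%s%d' % (ACTOR_CACHE_KEY_PREFIX, actor_id)


def by_format(actor_id):
    # str.format() with positional fields.
    return '{}{}'.format(ACTOR_CACHE_KEY_PREFIX, actor_id)


def time_all(number=100000):
    # Map each function name to the total seconds taken by `number` calls.
    return {
        func.__name__: timeit.timeit(lambda: func(42), number=number)
        for func in (by_concat, by_percent, by_format)
    }


if __name__ == '__main__':
    # Print the fastest variant first.
    for name, seconds in sorted(time_all().items(), key=lambda item: item[1]):
        print('{:<12} {:.4f} s'.format(name, seconds))
```

All three functions build the same key, so the only thing that differs is how long each one takes on your setup.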

score 5 · Answer 2 · edited Jan 06 '16 at 05:34

Concatenation is better when it comes to performance. In your example, both concatenation and substitution are readable, but when it comes to more complex templates, substitution wins the simplicity and readability race.

For example, if you have data that you want to show in HTML, concatenation will give you a headache, while substitution will be simple and readable.
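A rough sketch of the HTML case, assuming Python 3 (for html.escape). The template constants, the render_table helper and the sample data are hypothetical, chosen only to show the shape of the approach:

```python
from html import escape

# The markup is readable at a glance because it lives in one place.
ROW_TEMPLATE = '<tr><td>{name}</td><td>{role}</td></tr>'
TABLE_TEMPLATE = '<table>\n{rows}\n</table>'


def render_table(actors):
    # Escape each value so that the data cannot inject markup.
    rows = [
        ROW_TEMPLATE.format(name=escape(actor['name']), role=escape(actor['role']))
        for actor in actors
    ]
    return TABLE_TEMPLATE.format(rows='\n'.join(rows))


if __name__ == '__main__':
    print(render_table([
        {'name': 'Ann <Lee>', 'role': 'Lead'},
        {'name': 'Bob', 'role': 'Extra'},
    ]))
```

Writing the same thing with + would scatter the tags across many string fragments, which is where the headache comes from.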

edited Jan 06 '16 at 05:34

Amaury Medeiros

answered Jan 05 '16 at 19:16

Assem

Concatenation is *worse* when it comes to performance. See https://waymoot.org/home/python_string/ and http://leadsift.com/python-string-concatenation/ – jalanb Sep 28 '16 at 13:17

score 3 · Answer 3 · answered May 31 '16 at 02:54

Python 3.6 will introduce yet another option:

ACTOR_CACHE_KEY_PREFIX = 'actor_'

def get_actor_info(actor_id):
    cache_key = f'{ACTOR_CACHE_KEY_PREFIX}{actor_id}'

Performance should be comparable to '{}{}'.format(ACTOR_CACHE_KEY_PREFIX, actor_id), but the f-string is arguably more readable.

answered May 31 '16 at 02:54

chepner

Not comparable, it seems `f-strings` are [way faster](https://stackoverflow.com/a/43213810) – Sнаđошƒаӽ Jun 27 '21 at 13:38

score 1 · Answer 4 · answered Dec 28 '18 at 18:38

I guess that if all the terms being concatenated are constants, Python might optimize the + concatenation for performance. At the very least, a module-level constant like the one below is built only once, at import time. For example:

DB_PREFIX = 'prod_'
INDEX_PREFIX = 'index_'

CRM_IDX_PREFIX = DB_PREFIX + INDEX_PREFIX + 'crm_'

But in most cases, the format function and the formatting operators are used to combine strings with variable content. E.g.:

crm_index_name = "{}_{}".format(CRM_IDX_PREFIX, index_id)

In practical terms, if you use the + operator to concatenate like this:

crm_index_name = CRM_IDX_PREFIX + '_' + str(index_id)

you are hard-coding the format into the code itself. If you use a format string with named references instead, the code is more readable. E.g.:

crm_index_name = "{db_prefix}_{idx_prefix}_{mod_prefix}_{id}".format(
   db_prefix=CRM_IDX_PREFIX,
   idx_prefix=INDEX_PREFIX,
   mod_prefix='crm',
   id=index_id,
)

That way you also have the advantage of being able to define the format as a constant. E.g.:

IDX_FORMAT = "{db_prefix}_{idx_prefix}_{mod_prefix}_{id}"

crm_index_name = IDX_FORMAT.format(
   db_prefix=CRM_IDX_PREFIX,
   idx_prefix=INDEX_PREFIX,
   mod_prefix='crm',
   id=index_id,
)

This also makes things clearer if you need to change the format in the future. For example, to change the order of the parts and one of the separators, you only need to change the format string to:

IDX_FORMAT = "{db_prefix}_{mod_prefix}_{idx_prefix}-{id}"
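To make that concrete, here is a small sketch in which only the template changes between two layouts. The names IDX_FORMAT_V1, IDX_FORMAT_V2 and build_index_name are hypothetical, and the prefix values are simplified (no trailing underscores) so that the results are easy to read:

```python
# Simplified, assumed values for illustration.
DB_PREFIX = 'prod'
INDEX_PREFIX = 'index'

# Two candidate layouts; only the template differs.
IDX_FORMAT_V1 = '{db_prefix}_{idx_prefix}_{mod_prefix}_{id}'
IDX_FORMAT_V2 = '{db_prefix}_{mod_prefix}_{idx_prefix}-{id}'


def build_index_name(index_id, fmt=IDX_FORMAT_V1):
    # The call passes the same named values whatever the layout is.
    return fmt.format(
        db_prefix=DB_PREFIX,
        idx_prefix=INDEX_PREFIX,
        mod_prefix='crm',
        id=index_id,
    )


if __name__ == '__main__':
    print(build_index_name(7))                 # prod_index_crm_7
    print(build_index_name(7, IDX_FORMAT_V2))  # prod_crm_index-7
```

With + concatenation, the same change would mean rewriting the expression at every place where the name is built.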

As a plus, for debugging you can collect all those values in a dictionary and pass it to the format function as keyword arguments:

idx_name_parts = {
   'db_prefix': CRM_IDX_PREFIX,
   'idx_prefix': INDEX_PREFIX,
   'mod_prefix': 'crm',
   'id': index_id,
}
crm_index_name = IDX_FORMAT.format(**idx_name_parts)

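Taking advantage of the globals() function, we can also write the following (it only works if every name used in the format string, including index_id, is a module-level global):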

IDX_FORMAT = "{CRM_IDX_PREFIX}_{mod_prefix}_{INDEX_PREFIX}-{index_id}"

crm_index_name = IDX_FORMAT.format(mod_prefix='crm', **globals())

That is similar to Python 3's formatted string literal (f-string):

crm_index_name = f"{CRM_IDX_PREFIX}_crm_{INDEX_PREFIX}-{index_id}"

I also see internationalization as another context where format strings are more useful than the + operator. Take the following code:

message = "The account " + str(account_number) + " doesn't exist"

If you use a translation feature like the gettext module with the + operator, it would be:

message = _("The account ") + str(account_number) + _(" doesn't exist")

Each fragment is then translated out of context, so it is better to translate the whole format string:

message = _("The account {account_number} doesn't exist").format(account_number=account_number)

so that the complete message makes more sense in the Spanish translation file:

#: main.py:523
msgid "The account {account_number} doesn't exist"
msgstr "La cuenta {account_number} no existe."

That is especially helpful when translating into natural languages whose grammar requires a different word order, such as German.
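The word-order point can be sketched without a real gettext setup. In the example below, CATALOG, translate and deletion_notice are hypothetical stand-ins (a plain dictionary plays the role of the compiled translation files), and the message is an invented one in which the German translation swaps the order of the two values:

```python
# Hypothetical in-memory catalog standing in for compiled gettext files.
CATALOG = {
    'de': {
        'The file {filename} was deleted by {user}.':
            '{user} hat die Datei {filename} gelöscht.',
    },
}


def translate(message, language):
    # Fall back to the untranslated message when no entry exists.
    return CATALOG.get(language, {}).get(message, message)


def deletion_notice(filename, user, language='en'):
    template = translate('The file {filename} was deleted by {user}.', language)
    # Named fields let each translation place the values where its grammar needs them.
    return template.format(filename=filename, user=user)


if __name__ == '__main__':
    print(deletion_notice('a.txt', 'Ana'))
    print(deletion_notice('a.txt', 'Ana', language='de'))
```

A message assembled from fragments with + could not be reordered like this, because the position of each value is fixed in the code rather than in the translatable string.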
