0

I'd like to know the best way to enumerate the placeholders in a format string. There is already a post asking how-can-i-find-all-placeholders-for-str-format-in-a-python-string-using-a-regex, but I'm not sure its answers give me exactly what I'm looking for. Consider this little test:

import string

tests = [
    ['this is my placeholder 1 {} and this is the 2 {}', 2],
    ['another placeholder here {} and here \"{}\"', 2]
]
for s in tests:
    num_placeholders = len([
        name for text, name, spec, conv in string.Formatter().parse(s[0])])
    if num_placeholders != s[1]:
        print("FAIL: {0} has {1} placeholders!!! expected result {2}".format(
            s[0], num_placeholders, s[1]))

string.Formatter does not seem to give me the answer I expect:

FAIL: another placeholder here {} and here "{}" has 3 placeholders!!! expected result 2
BPL

2 Answers

3

Because you are ignoring the other elements of the tuples that parse(s) yields:

>>> import string
>>> 
>>> tests = [
...     "{} spam eggs {}",
...     "{0} spam eggs {1}",
...     "{0:0.2f} spam eggs {1:0.2f}",
...     "{{1}} spam eggs {{2}}"
... ]
>>> for s in tests:
...     print(list(string.Formatter().parse(s)))
... 
[('', '', '', None), (' spam eggs ', '', '', None)]
[('', '0', '', None), (' spam eggs ', '1', '', None)]
[('', '0', '0.2f', None), (' spam eggs ', '1', '0.2f', None)]
[('{', None, None, None), ('1}', None, None, None), (' spam eggs {', None, None, None), ('2}', None, None, None)]

Edit: I see what you mean now. Yes, the output of parse() is neither intuitive nor obvious. The number of tuples it yields is not the number of placeholders but the number of literal text segments, counting an empty segment at the start but not an empty segment at the end. Each tuple also carries the format information of the placeholder that follows its literal text. For example:

>>> list(string.Formatter().parse('{}'))
[('', '', '', None)]

This is the base case: a single tuple whose literal text is the empty string. Strictly there are two empty literals, one on each side of the placeholder, but the parser does not emit the trailing one.

>>> list(string.Formatter().parse('a {}'))
[('a ', '', '', None)]

This is the same situation as before: a single literal string "a " followed by a placeholder. Since nothing follows the format bracket, no further tuple is emitted.

>>> list(string.Formatter().parse('{} b'))
[('', '', '', None), (' b', None, None, None)]

This is the interesting case: since the format bracket is at the start, the first literal string is empty, and it is followed by the literal string " b".

>>> list(string.Formatter().parse('a {1} b {2} c'))
[('a ', '1', '', None), (' b ', '2', '', None), (' c', None, None, None)]

This is the most complete example. We have three literal string pieces: ['a ', ' b ', ' c']. The confusing part is that the format information for each {} placeholder is stored in the same tuple as the literal string that precedes it.
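
To make that pairing concrete, here is a small sketch that labels each field of the tuples. The names literal_text, field_name, format_spec and conversion follow the documentation of string.Formatter.parse; the helper describe() is hypothetical and exists only for illustration.

```python
import string

def describe(fmt):
    """Return one readable line per tuple yielded by parse() (illustrative helper)."""
    lines = []
    for literal_text, field_name, format_spec, conversion in string.Formatter().parse(fmt):
        if field_name is None:
            # No placeholder follows this literal text.
            lines.append("literal %r, no placeholder after it" % literal_text)
        else:
            # A placeholder follows; its details live in the same tuple.
            lines.append("literal %r, then placeholder named %r" % (literal_text, field_name))
    return lines

for line in describe('a {1} b {2} c'):
    print(line)
```

Only the last tuple reports no placeholder, which is why the tuple count is one higher than the placeholder count for this string.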

Edit2:

>>> [x[0] for x in string.Formatter().parse('another placeholder here {} and here \"{}\"')]
['another placeholder here ', ' and here "', '"']

We follow the same logic here. The quotes are just raw literal text; we can replace them with something else:

>>> [x[0] for x in string.Formatter().parse('another placeholder here {} and here qqq{}www')]
['another placeholder here ', ' and here qqq', 'www']

If you only consider the first element of each returned tuple (the literal text), you only get the literal strings. Between each consecutive pair of them lies a format placeholder.

You need to read the result of parse() from the point of view of formatting the string: the formatter emits each literal text and then, only if the field name is not None, the formatted value. For example:

>>> [x for x in string.Formatter().parse('a{}')]
[('a', '', '', None)]
>>> [x for x in string.Formatter().parse('a')]
[('a', None, None, None)]
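
As a rough illustration of that point of view, the sketch below walks the tuples the way a formatter might. The function render() is a hypothetical, simplified stand-in for str.format, not how the standard library implements it: it assumes positional arguments only and ignores the conversion field and nested format specs.

```python
import string

def render(fmt, *args):
    """Simplified stand-in for str.format (positional fields only, an assumption)."""
    out = []
    auto_index = 0
    for literal_text, field_name, format_spec, conversion in string.Formatter().parse(fmt):
        out.append(literal_text)          # the literal text always comes first
        if field_name is None:
            continue                      # no placeholder follows this literal
        if field_name == '':
            value = args[auto_index]      # automatic numbering for '{}'
            auto_index += 1
        else:
            value = args[int(field_name)] # explicit index such as '{0}'
        out.append(format(value, format_spec))
    return ''.join(out)

print(render('a{}', 1))
print(render('a'))
```

Each tuple contributes its literal text, and a value is appended only when field_name is not None, which is the test used for counting placeholders.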

With this logic you can count the number of placeholders in a format string like this:

>>> def count_placeholders(fmt):
...     count = 0
...     L = string.Formatter().parse(fmt)
...     for x in L:
...         if x[1] is not None:
...             count += 1
...     return count
... 
>>> count_placeholders('')
0
>>> count_placeholders('{}')
1
>>> count_placeholders('{}{}')
2
>>> count_placeholders('a {}{}')
2
>>> count_placeholders('a {} b {}')
2
>>> count_placeholders('a {} b {} c')
2
vz0
  • As you can see, I've edited my answer a few times and the meaning has changed. I didn't want to create a new question though. Sorry for the inconvenience – BPL Sep 02 '16 at 11:36
  • Nice explanation, tyvm! In any case, I still don't understand this edge case `'another placeholder here {} and here \"{}\"'` – BPL Sep 02 '16 at 11:53
  • What about now? – vz0 Sep 02 '16 at 12:02
  • Perfect! Thank you very much, mate ;-) – BPL Sep 02 '16 at 12:05
  • @vz0 Why don't you use `len(L)` and then subtract 1 if `L[-1][2] is None`? – akhan Dec 29 '16 at 08:43
  • Actually there can be more than one tuple in the `parse()` output with [_format_spec_](https://docs.python.org/2/library/string.html#string.Formatter.parse) (index 2) set to `None`. This happens with strings like `{}{}{{}}` where braces are escaped. So we can use a list comprehension to count placeholders correctly: `len([x for x in string.Formatter().parse(fmt) if x[2] is not None])` – akhan Dec 29 '16 at 08:59
1
import string

def count_placeholders(fmt):
    return sum(1 for x in string.Formatter().parse(fmt) if x[1] is not None)
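
The one-liner works because a tuple's field name (index 1) is None exactly when no placeholder follows its literal text. If the placeholders themselves are wanted rather than just their count, the same filter can collect the names. This is a minimal sketch; placeholder_names() is a hypothetical helper name, not part of the string module.

```python
import string

def placeholder_names(fmt):
    """List the field name of every placeholder in fmt ('' for a bare '{}')."""
    return [
        field_name
        for _, field_name, _, _ in string.Formatter().parse(fmt)
        if field_name is not None  # None means: literal text only, no placeholder
    ]

print(placeholder_names('another placeholder here {} and here "{}"'))
print(placeholder_names('a {1} b {2} c'))
print(placeholder_names('{}{}{{}}'))
```

Escaped braces such as {{}} produce tuples whose field name is None, so they are skipped. Placeholders nested inside a format spec are returned as part of the spec string, so this sketch does not list them.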
Aeon
  • Although this code might solve the problem, a good answer should always explain how this code helps and what it does. – BDL Dec 11 '18 at 09:31