How to get the names of the named variables from a Python string

Question

Is there a graceful way to get the names of the named %s-like variables in a string object? Like this:

string = '%(a)s and %(b)s are friends.'
names = get_names(string)  # ['a', 'b']

Known alternative ways:

Parse names using regular expression, e.g.:

import re
names = re.findall(r'%\((\w+)\)[sdf]', string)  # ['a', 'b']

Use .format()-compatible formatting and Formatter().parse(string).

How to get the variable names from the string for the format() method
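
For reference, the second alternative could look roughly like the sketch below. It assumes the template has already been rewritten with `{a}`-style fields, and `get_format_names` is a hypothetical helper name, not something from the standard library.

```python
from string import Formatter

def get_format_names(template):
    """Return the field names of a str.format()-style template, in order."""
    # Formatter().parse() yields (literal_text, field_name, format_spec, conversion)
    # tuples; field_name is None for chunks that contain only literal text.
    return [field for _, field, _, _ in Formatter().parse(template)
            if field is not None]

names = get_format_names('{a} and {b} are friends.')  # ['a', 'b']
```

This only helps when the format string can be changed to the `{}` syntax, which is exactly the limitation the question is about.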

But what about a string with %s-like variables?

PS: python 2.7

The method you're describing seems to work well. It returns ['a','b']. So what is missing now? — Adi Levin, Jan 19 '16 at 13:03
@AdiLevin The way no.1 requires additional import. The way no.2 requires another string format. I am just curious is there a way to get the same result using only `string` object inner methods and properties or, maybe, some string module functions. — hackprime, Jan 19 '16 at 13:12
What is preventing you from using `format()` for formatting? This seems like one of those cases where it is simply more powerful. — Joost, Jan 19 '16 at 13:15
If you're asking, "Does Python, in the course of performing percent-style formatting, ever produce an intermediary data structure that one could inspect and extract the named parameters from?", it does not. The [formatting code](https://github.com/python-git/python/blob/master/Objects/stringobject.c#L4625) is all C, so there's no native method you could invoke; and it basically operates directly on the final string object, so there's no intermediary object to look at. — Kevin, Jan 19 '16 at 13:34

J. Beattie · Answer 1 · 2018-08-13T15:26:25.273

In order to answer this question, you need to define "graceful". Several factors might be worth considering:

Is the code short, easy to remember, easy to write, and self explanatory?
Does it reuse the underlying logic (i.e. follow the DRY principle)?
Does it implement exactly the same parsing logic?

Unfortunately, the "%" formatting for strings is implemented in the C routine "PyString_Format" in stringobject.c. This routine does not provide an API or hooks that allow access to a parsed form of the format string. It simply builds up the result as it is parsing the format string. Thus any solution will need to duplicate the parsing logic from the C routine. This means DRY is not followed and exposes any solution to breaking if a change is made to the formatting specification.

The parsing algorithm in PyString_Format includes a fair bit of complexity, including handling nested parentheses in key names, so it cannot be fully implemented with a regular expression or with string "split()". Short of copying the C code from PyString_Format and converting it to Python code, I do not see any remotely easy way of correctly extracting the names of the mapping keys under all circumstances.

So my conclusion is that there is no "graceful" way to obtain the names of the mapping keys for a Python 2.7 "%" format string.

The following code uses a regular expression to provide a partial solution that covers most common usage:

import re
class StringFormattingParser(object):
    __matcher = re.compile(r'(?<!%)%\(([^)]+)\)[-# +0-9.hlL]*[diouxXeEfFgGcrs]')
    @classmethod
    def getKeyNames(klass, formatString):
        return klass.__matcher.findall(formatString)

# Demonstration of use with some sample format strings
for value in [
    '%(a)s and %(b)s are friends.',
    '%%(nomatch)i',
    '%%',
    'Another %(matched)+4.5f%d%% example',
    '(%(should_match(but does not))s',
    ]:
    print StringFormattingParser.getKeyNames(value)

# Note the following prints out "really does match"!
print '%(should_match(but does not))s' % {'should_match(but does not)': 'really does match'}
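
To illustrate what handling nested parentheses involves, here is a rough Python sketch of a left-to-right scan that balances parentheses inside a key. It is not a port of PyString_Format and does almost no validation; `get_mapping_keys` is a hypothetical name, and the set of characters skipped between the key and the conversion character is an assumption about what is commonly used.

```python
def get_mapping_keys(fmt):
    """Return the mapping keys of a %-style format string, in order of appearance."""
    keys = []
    i, n = 0, len(fmt)
    while i < n:
        if fmt[i] != '%':
            i += 1
            continue
        i += 1  # step past '%'
        if i < n and fmt[i] == '(':
            # Balance parentheses so that keys such as 'f(x)' are captured whole.
            depth, start = 1, i + 1
            i += 1
            while i < n and depth:
                if fmt[i] == '(':
                    depth += 1
                elif fmt[i] == ')':
                    depth -= 1
                i += 1
            if depth:
                raise ValueError('incomplete format key')
            keys.append(fmt[start:i - 1])
        # Skip flags, width, precision and length modifiers (assumed set).
        while i < n and fmt[i] in '-+ #0123456789.hlL':
            i += 1
        i += 1  # consume the conversion character (this also covers '%%')
    return keys

keys = get_mapping_keys('%(should_match(but does not))s')
# ['should_match(but does not)']
```

Because it duplicates the parsing rules by hand, this sketch has the same DRY problem described above: it can drift out of sync with the real formatter.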

P.S. DRY = Don't Repeat Yourself (https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)

score 0 · Answer 2 · answered Jan 19 '16 at 13:15

You could also do this:

[y[0] for y in [x.split(')') for x in s.split('%(')] if len(y)>1]

Adi Levin

Just like the regex in the question this fails on `'%%(a)s'`. – BlackJack Jan 19 '16 at 15:41
What's the exact requirement then? Besides %(a)s, what are the other kinds of expressions we need to be able to parse? %%(a)s? Anything else? – Adi Levin Jan 19 '16 at 15:46

score 0 · Answer 3 · answered Jan 23 '16 at 02:53

Don't know if this qualifies as graceful in your book, but here's a short function that parses out the names. No error checking, so it will fail for malformed format strings.

def get_names(s):
    i = s.find('%')
    while 0 <= i < len(s) - 3:
        if s[i+1] == '(':
            yield(s[i+2:s.find(')', i)])
        i = s.find('%', i+2)

string = 'abd %(one) %%(two) 99 %%%(three)'
list(get_names(string))  # => ['one', 'three']
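
The same left-to-right treatment of `%%` can also be sketched as a regular-expression alternation: the pattern matches either an escaped `%%` or a `%(name)` key, so an escaped pair is consumed before its second `%` can be mistaken for the start of a key. `get_names_re` and `_TOKEN` are hypothetical names, and the sketch assumes keys contain no parentheses and that no flags sit between `%` and `(`.

```python
import re

# Either an escaped '%%' (group 1 is None) or a '%(name)' key (name in group 1).
_TOKEN = re.compile(r'%(?:%|\(([^)]*)\))')

def get_names_re(s):
    """Return the %(name) keys in s, skipping anything escaped with '%%'."""
    return [m.group(1) for m in _TOKEN.finditer(s) if m.group(1) is not None]

result = get_names_re('abd %(one) %%(two) 99 %%%(three)')  # ['one', 'three']
```

Unlike a lookbehind such as `(?<!%)`, this also picks up the key in `%%%(three)`, where an escaped percent sign is followed directly by a real key.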

Ilia w495 Nikitin · Answer 4 · 2016-02-18T04:33:39.127

Also, you can reduce this %-task to the Formatter solution.

>>> import re
>>> from string import Formatter
>>> 
>>> string = '%(a)s and %(b)s are friends.'
>>> 
>>> string = re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', string)
>>> 
>>> tuple(fn[1] for fn in Formatter().parse(string) if fn[1] is not None)
('a', 'b')
>>>

In this case you can use both variants of formatting, I suppose.

The exact regular expression depends on which cases you want to handle.

>>> re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', '%(a)s and %(b)s are %(c)s friends.')
'{a} and {b} are {c} friends.'
>>> re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', '%(a)s and %(b)s are %%(c)s friends.')
'{a} and {b} are %%(c)s friends.'
>>> re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', '%(a)s and %(b)s are %%%(c)s friends.')
'{a} and {b} are %%%(c)s friends.'
