70

I've been using the following Python code to format an integer part ID as a part number string:

pn = 'PN-{:0>9}'.format(id)

I would like to know if there is a way to use that same format string ('PN-{:0>9}') in reverse to extract the integer ID from the formatted part number. If that can't be done, is there a way to use a single format string (or regex?) to create and parse?

Josh
  • 8
    Don't use the name `id`. It is also the name of a built-in. – jamylak May 19 '12 at 07:07
  • 11
    While the basic rule of avoiding builtin names is a good one, in practice, the `id` builtin is rarely used, and overriding it within the scope of a method is unlikely to raise any issues. This rule is more applicable when people name variables that override types, like `list`, `dict`, `set`, and `str`. – PaulMcG May 19 '12 at 11:17

4 Answers

94

The parse module "is the opposite of format()".

Example usage:

>>> import parse
>>> format_string = 'PN-{:0>9}'
>>> id = 123
>>> pn = format_string.format(id)
>>> pn
'PN-000000123'
>>> parsed = parse.parse(format_string, pn)
>>> parsed
<Result ('123',) {}>
>>> parsed[0]
'123'
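
Note that the field in 'PN-{:0>9}' has no type, so the parsed value comes back as a string and still needs int() to become the ID again. For comparison, here is a minimal standard-library sketch of the same round trip. The names PN_FORMAT, PN_PATTERN, format_pn and parse_pn are made up for illustration, and it assumes IDs are non-negative integers:

```python
import re

# Format string from the question, plus a hand-written regex that mirrors it.
PN_FORMAT = 'PN-{:0>9}'
# At least 9 digits: format() pads short IDs but never truncates long ones.
PN_PATTERN = re.compile(r'PN-(\d{9,})')


def format_pn(part_id):
    """Turn an integer ID into a part number string such as 'PN-000000123'."""
    return PN_FORMAT.format(part_id)


def parse_pn(pn):
    """Turn a part number string back into the integer ID."""
    match = PN_PATTERN.fullmatch(pn)
    if match is None:
        raise ValueError('not a part number: {!r}'.format(pn))
    # int() discards the zero padding.
    return int(match.group(1))


pn = format_pn(123)
part_id = parse_pn(pn)
```

This keeps the format string and the regex side by side rather than deriving one from the other, so the two have to be kept in sync by hand.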
wpercy
Brian Dorsey
  • Not sure what's going wrong, but what I get is `parsed[0]="000000123"` in python=3.6 and parse=1.12.1. – pitfall Dec 12 '19 at 20:09
6

Here's a solution in case you don't want to use the parse module. It converts format strings into regular expressions with named groups. It makes a few assumptions (described in the docstring) that were okay in my case, but may not be okay in yours.

import re


def match_format_string(format_str, s):
    """Match s against the given format string, return dict of matches.

    We assume all of the arguments in format string are named keyword arguments (i.e. no {} or
    {:0.2f}). We also assume that all chars are allowed in each keyword argument, so separators
    need to be present which aren't present in the keyword arguments (i.e. '{one}{two}' won't work
    reliably as a format string but '{one}-{two}' will if the hyphen isn't used in {one} or {two}).

    We raise if the format string does not match s.

    Example:
    fs = '{test}-{flight}-{go}'
    s = fs.format(test='first', flight='second', go='third')
    match_format_string(fs, s) -> {'test': 'first', 'flight': 'second', 'go': 'third'}
    """

    # First split on any keyword arguments, note that the names of keyword arguments will be in the
    # 1st, 3rd, ... positions in this list
    tokens = re.split(r'\{(.*?)\}', format_str)
    keywords = tokens[1::2]

    # Now replace keyword arguments with named groups matching them. We also escape between keyword
    # arguments so we support meta-characters there. Re-join tokens to form our regexp pattern
    tokens[1::2] = map(u'(?P<{}>.*)'.format, keywords)
    tokens[0::2] = map(re.escape, tokens[0::2])
    pattern = ''.join(tokens)

    # Use our pattern to match the given string, raise if it doesn't match
    matches = re.match(pattern, s)
    if not matches:
        raise Exception("Format string did not match")

    # Return a dict with all of our keywords and their values
    return {x: matches.group(x) for x in keywords}
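
One caveat for the part number case: the function above uses everything between the braces as the group name, so a field that carries a format spec, such as '{part_id:0>9}', would not produce a valid regex. A possible variation, sketched here under the same "named fields only" assumption, lets string.Formatter split the format string and simply ignores the spec. The names format_to_regex and part_id are hypothetical:

```python
import re
from string import Formatter


def format_to_regex(format_str):
    """Build a compiled regex from a format string with named fields.

    Format specs such as ':0>9' are ignored, so captured values are strings
    that keep any padding. Each field name may appear only once.
    """
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(format_str):
        # Literal text between fields is matched verbatim.
        parts.append(re.escape(literal))
        if field is not None:
            if not field.isidentifier():
                raise ValueError('only simple named fields are supported: %r' % field)
            # Non-greedy, so a following separator ends the field.
            parts.append('(?P<{}>.+?)'.format(field))
    return re.compile(''.join(parts))


fs = 'PN-{part_id:0>9}'
pn = fs.format(part_id=123)
match = format_to_regex(fs).fullmatch(pn)
part_id = int(match.group('part_id'))
```

The same format string is used for both directions here, at the cost of converting the captured string back to an integer yourself.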
nonagon
1

How about:

id = int(pn.split('-')[1])

This splits the part number at the dash, takes the second component and converts it to integer.

P.S. I've kept id as the variable name so that the connection to your question is clear. It is a good idea to rename that variable so that it doesn't shadow the built-in function.

NPE
  • +1 The question is kinda vague with no example but this works. – jamylak May 19 '12 at 07:07
  • 6
    The question isn't vague at all, and this answer completely misses the point. The OP is looking for a single string that can be used in formatting to output his data in a particular form, and wants to use that same string as a regex to parse that same text back into an integer. – PaulMcG May 19 '12 at 11:19
  • I found that i can get the ID number from the part number string with `id = int(re.search('(?<=PN-)\w+', pn).group(0))`, but can i use that regular expression to create a string from the ID? (Go from 1234 to PN-000001234 then back to 1234) – Josh May 19 '12 at 13:43
0

Use lucidity

import lucidity

template = lucidity.Template('model', '/jobs/{job}/assets/{asset_name}/model/{lod}/{asset_name}_{lod}_v{version}.{filetype}')

path = '/jobs/monty/assets/circus/model/high/circus_high_v001.abc'
data = template.parse(path)
print(data)

# Output 
#   {'job': 'monty', 
#    'asset_name': 'circus',
#    'lod': 'high', 
#    'version': '001', 
#    'filetype': 'abc'}