
I have a pattern which looks like:

abc*_def(##)

and I want to check whether it matches some strings. For example, it matches:

abc1_def23
abc10_def99

but does not match:

abc9_def9

So the * stands for a number with one or more digits, and each # stands for a single digit. I want the value in the parentheses as the result.

What would be the easiest and simplest solution for this problem? Replace the * and # with regex expressions and then check whether the string matches? Like this:

    pattern = pattern.replace('*', '[0-9]+')
    pattern = pattern.replace('#', '[0-9]')
    pattern = '^' + pattern + '$'

Or program it myself?
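A minimal sketch of the replace-based idea, assuming Python 3 and that the pattern contains no regex metacharacters other than the parentheses, which then act as a capturing group. The helper names `to_regex` and `extract` are hypothetical, and `+` is used for "one or more digits":

```python
import re

def to_regex(pattern):
    # '*' means one or more digits, '#' means exactly one digit
    # (the convention described in the question).
    regex = pattern.replace('*', '[0-9]+').replace('#', '[0-9]')
    return re.compile(regex)

def extract(pattern, text):
    # fullmatch requires the whole string to match, like wrapping in ^ and $.
    match = to_regex(pattern).fullmatch(text)
    # The parentheses in the pattern become capturing group 1.
    return match.group(1) if match else None

for candidate in ['abc1_def23', 'abc10_def99', 'abc9_def9']:
    print(candidate, extract('abc*_def(##)', candidate))
```

With this sketch, `extract` returns the two digits for the first two strings and `None` for `abc9_def9`.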

Sir2B

3 Answers


Based on your requirements, I would go for a regex, for the simple reason that it's already available and tested, so it's the easiest option, as you asked.

The only "complicated" thing in your requirements is avoiding, after def, the same digits you have after abc. This can be done with a negative lookahead that contains a backreference. The regex you can use is:

    \babc(\d+)_def((?!\1)\d{1,2})\b
  • \b matches a word boundary; if you enclose your regex between two \b you restrict your search to whole words, i.e. text delimited by spaces, punctuation, etc.
  • abc matches the string abc
  • \d+ matches one or more digits; if there is an upper limit to the number of digits you want, it has to be \d{1,MAX} where MAX is your maximum number of digits; \d stands for a digit and + indicates 1 or more repetitions
  • (\d+) is a group: the parentheses define \d+ as something you want to "remember" inside your regex; it's somewhat similar to defining a variable; in this case, (\d+) is your first group since you defined no other groups before it (i.e. to its left)
  • _def matches the string _def
  • (?!\1) is the part where you say "I don't want the first group to be repeated after _def". \1 represents the first group, while (?!whatever) is a check that succeeds if what follows the current position is NOT (the negation is given by !) whatever you want to negate.
  • \d{1,2} matches one or two digits; together with the surrounding parentheses it forms the second group, which holds the digits after _def
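As a rough illustration of the pieces above, here is a minimal Python 3 sketch that applies this regex and returns the second group. The function name `find_suffix` is hypothetical, not part of any library:

```python
import re

# The regex from this answer; (?!\1) is a negative lookahead on group 1.
PATTERN = re.compile(r'\babc(\d+)_def((?!\1)\d{1,2})\b')

def find_suffix(text):
    """Return the digits after _def, or None when the pattern does not match."""
    match = PATTERN.search(text)
    # Group 1 is the number after abc, group 2 the digits after _def.
    return match.group(2) if match else None

for candidate in ['abc1_def23', 'abc10_def99', 'abc9_def9']:
    print(candidate, find_suffix(candidate))
```

Note that because of \d{1,2}, this regex also accepts a single digit after _def (for instance abc1_def5), as long as it does not repeat the first group.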

Live demo here.

Francesco B.

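I had the hardest time getting this to work. The trick was the $, which anchors the match at the end of the string so that anything other than exactly two digits after def is rejected.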

    import re

    yourlist = ['abc1_def23', 'abc10_def99', 'abc9_def9', 'abc955_def9', 'abc_def9', 'abc9_def9288', 'abc49_def9234']

    for item in yourlist:
        # $ anchors the end of the string, so exactly two digits must follow def
        if re.search(r'abc[0-9]+_def[0-9][0-9]$', item):
            print(item, 'is a match')
Michael Swartz

You could match your pattern like:

    abc\d+_def(\d{2})

  • abc Match literally
  • \d+ Match 1 or more digits
  • _ Match underscore
  • def Match literally
  • ( Capturing group (Your 2 digits will be in this group)
    • \d{2} Match 2 digits
  • ) Close capturing group

You could then, for example, use re.search to check for a match and .group(1) to get the digits captured by the parentheses.
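A minimal Python 3 sketch of that search-then-group(1) step; the function name `two_digits_after_def` is hypothetical:

```python
import re

def two_digits_after_def(text):
    # search scans the string for the first position where the pattern matches.
    match = re.search(r'abc\d+_def(\d{2})', text)
    # group(1) holds the two digits captured by the parentheses.
    return match.group(1) if match else None

for candidate in ['abc1_def23', 'abc10_def99', 'abc9_def9', 'abc9_def9288']:
    print(candidate, two_digits_after_def(candidate))
```

Without anchors or word boundaries, a longer string such as abc9_def9288 still matches on its first two digits after def, which may or may not be what you want.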

Demo Python

You could also add word boundaries:

    \babc\d+_def(\d{2})\b
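To see the effect of the word boundaries, here is a small Python 3 sketch; the name `bounded_match` is hypothetical, and the sample strings are only illustrations:

```python
import re

# Same pattern, wrapped in \b so it must start and end at a word boundary.
BOUNDED = re.compile(r'\babc\d+_def(\d{2})\b')

def bounded_match(text):
    match = BOUNDED.search(text)
    return match.group(1) if match else None

for candidate in ['abc1_def23', 'see abc10_def99.', 'abc9_def9288', 'xabc1_def23']:
    print(repr(candidate), bounded_match(candidate))
```

With the boundaries in place, strings with extra digits after the two captured ones, or extra word characters before abc, no longer match, while the pattern can still be found inside a longer sentence.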

The fourth bird