4

Suppose I have the following regular expression in Python and I would like to use a variable instead of [1-12]. For example, my variable is currentMonth = 9

How can I plug currentMonth into the regular expression?

r"(?P<speaker>[A-Za-z\s.]+): (?P<month>[1-12])"

2 Answers

5

Use string formatting to insert currentMonth into the regex pattern:

r"(?P<speaker>[A-Za-z\s.]+): (?P<month>{m:d})".format(m=currentMonth)
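As a rough sketch of how the formatted pattern might be used end to end (the sample string "Dr. Smith: 9" is a made-up input, not something from the question):

```python
import re

currentMonth = 9  # the variable from the question

# {m:d} only accepts integers, so a stray string cannot alter the pattern.
pattern = r"(?P<speaker>[A-Za-z\s.]+): (?P<month>{m:d})".format(m=currentMonth)

# Hypothetical sample line, assumed for illustration only.
sample = "Dr. Smith: 9"
match = re.search(pattern, sample)
if match:
    print(match.group("speaker"), match.group("month"))

# Passing a string instead of an int is rejected by the :d format code.
try:
    "{m:d}".format(m="9|.*")
    rejected = False
except ValueError:
    rejected = True
```

Note that nothing anchors the end of the month group, so the pattern for 9 would also match the first digit of "Dr. Smith: 91"; appending `\b` or `$` is one way to tighten it if that matters.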

By the way, (?P<month>[1-12]) probably does not do what you expect. [1-12] is a character class made of the range 1-1 plus the character 2, so it matches a single character, 1 or 2. If you wanted to match one through twelve, you'd need (?P<month>12|11|10|[1-9]).
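A quick way to see the difference is to run both patterns against a few inputs. This is only a sketch; the names `broken`, `months`, and `results` are made up for the example:

```python
import re

broken = re.compile(r"[1-12]")          # character class: the character '1' or '2'
months = re.compile(r"12|11|10|[1-9]")  # alternation suggested in this answer

# For each input, record whether each pattern matches the whole string.
results = {
    s: (bool(broken.fullmatch(s)), bool(months.fullmatch(s)))
    for s in ["1", "2", "3", "12"]
}

for s, (by_class, by_alternation) in results.items():
    print(s, by_class, by_alternation)
```

Only the alternation accepts "3" and "12" as whole-string matches; the character class accepts just "1" and "2".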

unutbu
  • I'd use `{m:d}` to leave no chance for strings that might break the regex in weird ways to be passed. – Rosh Oxymoron Sep 08 '11 at 21:37
  • @Rosh Oxymoron: Hm, nice idea. – unutbu Sep 08 '11 at 21:38
  • @unutbu ``(?P<month>[2-9]|1[012]?)`` better imho – eyquem Sep 08 '11 at 21:45
  • @eyquem: Just curious -- why do you consider it better? It's less readable and also slower for even moderate-sized strings. – unutbu Sep 08 '11 at 21:54
  • @unutbu Because, afaiu and in short, with ``(?P<month>12|11|10|[1-9])`` the regex engine will have to perform 4 tests to identify a digit 1 to 9 and 1 to 3 tests to identify one of the numbers 12-11-10, while with ``(?P<month>[2-9]|1[012]?)`` it will perform only one test to identify a digit 2 to 9 and 3 tests to identify a number 1 or 10 or 11 or 12. But I know that regex engines are optimized, and I'm maybe wrong – eyquem Sep 08 '11 at 22:05
  • @eyquem: My timeit test was set up incorrectly. I think you are right: `(?P<month>[2-9]|1[012]?)` is slightly faster. I'm going to go with readability and stick with my answer, but if you'd like to post yours I'll upvote it. – unutbu Sep 08 '11 at 22:11
  • @unutbu Well, in fact I realize that ``(?P<month>[1-9]|1[012]])`` is certainly better. In 9/12 of the cases (digits 1 to 9), only one test will be necessary to identify the digit. But for numbers 10-11-12, three tests will always be necessary. Then you are (maybe) right: in fact, it is maybe the pattern ``(?P<month>[1-9]|10|11|12)`` that is the best. However, regex engines are optimized, and my intuition remains that ``(?P<month>[1-9]|1[012]])`` is probably the real best. But that's only my unauthoritative opinion – eyquem Sep 08 '11 at 22:14
  • @unutbu I didn't time it. I would be surprised if the difference exceeded 3 or 4 %, but that's pure opinion. Personally, I like the idea of giving the regex engine minimal work by crafting the pattern meticulously: writing the pattern is a one-off, while the regex may be run many times. (Excuse my poor English) – eyquem Sep 08 '11 at 22:27
  • @unutbu Thank you for your proposition, but I'm not obsessed by reputation points. I would prefer if you could give me your opinion one day on the pattern I posted in the answer of the following thread (http://stackoverflow.com/questions/5917082/regular-expression-to-match-numbers-with-or-without-commas-and-decimals-in-text/5929469#5929469) – eyquem Sep 08 '11 at 22:28
  • @eyquem: You expressed yourself quite well, and I appreciate your viewpoint. However, I prefer readability over speed until I realize something is a bottleneck. Admittedly, `[2-9]|1[012]?` is not too complicated, and is correct afaics, but I am wary of regex because it is so easy to make a mistake. For example, `[1-9]|1[012]]` and `[1-9]|10|11|12` are wrong because they only match `1` even if the string is `10`. – unutbu Sep 08 '11 at 22:35
  • @unutbu Ooooh ! how absent-minded of me ! You're the best. I run to the little triangle pointing towards up , and I go to bed. – eyquem Sep 08 '11 at 23:52
0

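I don't know what you're searching through, so I can't test this, but try something like the following (re.search is assumed here; any re function that takes a pattern and a string would do):

re.search(r"(?P<speaker>[A-Za-z\s.]+): (?P<month>%d)" % currentMonth, foo)

where foo is the string you're using the expression on.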