Using Python regex matches in eval()

Question

I would like to utilize user input to match and rearrange strings. In Perl a simple example would look like the following:

use strict;

my $str = 'abc123def456ghi';
my $user_input1 = '(\d+).+?(\d+)';
my $user_input2 = '$2.$1';
if ($str =~ /$user_input1/) {
  my $newstr = eval($user_input2);
  print $newstr;
}
else {print "No match..."};

The same approach also works in principle in Python:

import re

mystr = 'abc123def456ghi'
user_input1 = r'(\d+).+?(\d+)'
user_input2 = 'm.group(2) + m.group(1)'
m = re.search(user_input1, mystr)
if m:
    newstr = eval(user_input2)
    print(newstr)
else:
    print("No match...")

Result: 456123

However, the expressions 'm.group(1)' and 'm.group(2)' are not very user-friendly if you have to type them several times in an entry field.

Therefore, I am wondering whether Python has similarly compact expressions like '$1' and '$2' in Perl. I can't get it to work in Python with '\1' and '\2'. Any ideas?

Edit:

Sorry, let me try to explain, following some of the comments below: I was trying eval() since it seems to work with m.group(1) etc., but for some reason r'\1' etc. is not accepted by eval():

import re
mystr = 'abc123def456ghi'
user_input1 = r'(\d+).+?(\d+)'
user_input2 = r'\2\1'
newstr = eval(user_input2)
print (newstr)

results in

SyntaxError: unexpected character after line continuation character
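The error comes from eval() itself: it compiles its argument as Python source code, and outside a string literal a backslash is only allowed as a line-continuation character. Quoting the input does not help either, because inside an ordinary string literal `\2` and `\1` are octal escapes for control characters, not back-references. A small demonstration (the variable names are illustrative):

```python
# eval() treats its argument as Python source, not as a regex template.
error_message = None
try:
    eval(r'\2\1')
except SyntaxError as exc:
    # Outside a string literal, a backslash may only continue a line.
    error_message = exc.msg
print("SyntaxError:", error_message)

# Quoted, the text becomes a normal string literal whose \2 and \1 are
# octal escapes for the control characters chr(2) and chr(1).
quoted = eval(r"'\2\1'")
print(repr(quoted))  # '\x02\x01'
```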

About the suggestion to use re.sub()

It should not be a simple string substitution but a rearrangement of, and addition to, the matches. If I modify the original regex to

user_input1 = r'(.+?)(\d+).+?(\d+)(.+)'

I can use user_input2 = r'\3\2'. However, if I want to add '999' in between the matches, for example (of course, neither r'\3999\2' nor r'\3+"999"+\2' does it), then I am probably back to using eval() and m.group(1) etc., although I wanted to keep the user input as short and simple as possible. Maybe I can use some of the suggested substitutions.

You can always write e.g. `g = m.group` and then do `g(1)`, `g(2)` for groups. You can then even do `{g(1)}` etc. within an f-string. But there's really no reason to. You'll type it once, but you'll read it hundreds of times, so readability is far more important than saving a few keystrokes. — kindall, Mar 21 '18 at 19:41
You shouldn't be using `eval`. You should be using [this](https://pastebin.com/5VMBDwPP). — ikegami, Mar 21 '18 at 19:41
What does "I can't get it work" mean? Show us the code where you call `re.sub` or `m.expand` or whatever and we can probably tell you what you got wrong, but without any idea of what you tried or what happened, all anyone can say is that you must have done something wrong somewhere. — abarnert, Mar 21 '18 at 19:42
I'm gonna go out on a limb and say this is a duplicate of https://stackoverflow.com/questions/20765265/python-re-sub-back-reference-not-back-referencing — Aran-Fey, Mar 21 '18 at 19:46
What you're trying to do here could be more easily done by just taking format strings. If the user wants to add his first string to his second string, he can just pass, e.g., `{1}{0}`, and you can do `user_input2.format(*m.groups())` or something. It's still just as horribly dangerous (or flexible, if you prefer to think of it that way), but a lot simpler. — abarnert, Mar 21 '18 at 19:47
@kindall, ok, thanks, that's a suggestion to shorten it a bit. However, as I wrote the expressions are user input from some entry fields. So I will **type** it hundreds of times! — theozh, Mar 21 '18 at 19:47
Removed the Perl tag. You're asking how to do something in Python. The fact that you (unfortunately) described the problem using Perl instead of English is irrelevant. — ikegami, Mar 21 '18 at 19:47
@Aran-Fey Maybe. Or maybe the OP thought he could just write a string `r'\1'` and it would magically backref the last-evaluated regex or something. He is coming from perl, and trying to build up a string of perl-esque user input to pass to eval, so I wouldn't be too surprised. — abarnert, Mar 21 '18 at 19:48
If you do want to use $1 etc., it is pretty trivial to write a function that substitutes `\$[0-9]+` with an item from a group... — kindall, Mar 21 '18 at 19:53
@theozh You responded to one comment, so you're obviously here. Why aren't you responding to any of the more important comments that explain why nobody can answer your question without more information? — abarnert, Mar 21 '18 at 20:14
@abarnert, I feel battered by the negative comments and votes. Yes, as you wrote, I thought I had overlooked some construct for "magically" using r'\1' or something similar. It's not about a simple substitution but about a rearrangement of and/or addition to some matches. I have tried to explain above. — theozh, Mar 22 '18 at 05:00
@theozh Don't take it personally. People are complaining about your question, not about you. Make the question clear and answerable, by addressing all the comments, and you'll get some upvotes—and, more importantly, an answer. (The downvotes won't all go away—some of the people who voted early will never come back and see the better question. But upvotes count for more than downvotes, and besides, you're learning how to write good questions, and you get points by asking multiple good questions or answers, not one.) — abarnert, Mar 22 '18 at 05:04
@user2357112 and @ikegami, I am using `eval()` because I currently do not see another solution without `eval()`. If you know a way without it, please let me know. — theozh, Mar 23 '18 at 04:11
@Aran-Fey, why do you think it is a duplicate question? Here, it is not simply about backreferencing: `r'\1'` etc. does not seem to be accepted by `eval()`. If you know an answer simply using `re.sub()`, for example, please let me know. — theozh, Mar 23 '18 at 04:22
@theozh That was before you edited your question. It was just a guess. Turns out I was wrong. — Aran-Fey, Mar 23 '18 at 08:19
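For comparison, the format-string idea from the comments could look like the sketch below. Note that `m.groups()` has to be called (it is a method), and that str.format numbers its fields from 0, so `{0}` is the first capture group. The name `user_template` is illustrative. A user-supplied format string can still access attributes of its arguments (for example `{0.__class__}`), so this is much less dangerous than eval() but not entirely inert.

```python
import re

mystr = 'abc123def456ghi'
user_input1 = r'(\d+).+?(\d+)'
# Hypothetical user template: {0} is capture group 1, {1} is capture group 2.
user_template = '{1}999{0}'

result = None
m = re.search(user_input1, mystr)
if m:
    result = user_template.format(*m.groups())
    print(result)  # 456999123
else:
    print("No match...")
```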

Aran-Fey · Accepted Answer · 2018-03-24T20:23:56.830


You don't need eval. In fact, you want to avoid eval like the plague.

You can achieve the same output with match.expand:

import re

mystr = 'abc123def456ghi'
user_input1 = r'(\d+).+?(\d+)'
user_input2 = r'\2\1'
match = re.search(user_input1, mystr)
result = match.expand(user_input2)
# result: 456123
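Keep in mind that re.search() returns None when nothing matches, so with arbitrary user input the match should be checked before calling expand(). A minimal sketch of a guard, using a hypothetical helper named `rearrange`:

```python
import re

def rearrange(pattern, template, text):
    """Return template expanded with the first match of pattern in text,
    or None if pattern does not match (hypothetical helper)."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.expand(template)

print(rearrange(r'(\d+).+?(\d+)', r'\2\1', 'abc123def456ghi'))  # 456123
print(rearrange(r'(\d+).+?(\d+)', r'\2\1', 'no digits here'))   # None
```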

The example about inserting 999 between the matches is easily solved by using the \g<group_number> syntax:

mystr = 'abc123def456ghi'
user_input1 = r'(.+?)(\d+).+?(\d+)(.+)'
user_input2 = r'\g<3>999\2'
match = re.search(user_input1, mystr)
result = match.expand(user_input2)
# result: 456999123
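Because expand() only uses the captured groups and never the text around the match, the `(.+?)` and `(.+)` wrappers are not required here; they only matter when re.sub() replaces the match inside the original string. The `\g<N>` form is needed whenever a group reference is directly followed by a digit: in `\2999` the template parser reads `\29` as a reference to group 29, which does not exist. A sketch comparing the variants:

```python
import re

mystr = 'abc123def456ghi'
match = re.search(r'(\d+).+?(\d+)', mystr)

# Two groups are enough for expand(); \g<2> keeps 999 out of the group number.
with_expand = match.expand(r'\g<2>999\1')
print(with_expand)  # 456999123

# re.sub() keeps the unmatched text before and after the match.
with_sub = re.sub(r'(\d+).+?(\d+)', r'\g<2>999\1', mystr)
print(with_sub)  # abc456999123ghi

# Without \g<...>, "\2999" is read as a reference to the nonexistent group 29.
bad_template_failed = False
try:
    match.expand(r'\2999\1')
except re.error as exc:
    bad_template_failed = True
    print("re.error:", exc)
```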

As you can see, if all you need to do is move captured text around and insert new text, the re module has you covered. Only use eval if you really have to execute code.
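If users would rather keep typing Perl-style `$1`, `$2` (as suggested in the comments), their input can be translated into a Python template first and then passed to expand(), still without eval(). A minimal sketch: `perl_to_python_template` is a hypothetical helper, and it assumes `${N}` is used whenever a group number is directly followed by a digit, as in Perl. It does not escape backslashes the user types, so those are still interpreted by expand().

```python
import re

def perl_to_python_template(user_template):
    """Translate $N and ${N} into Python's \\g<N> syntax (hypothetical helper)."""
    def to_group_ref(ref):
        number = ref.group(1) or ref.group(2)
        return r'\g<' + number + '>'
    # A replacement function's return value is inserted literally, so no
    # extra escaping of the backslash is needed here.
    return re.sub(r'\$\{(\d+)\}|\$(\d+)', to_group_ref, user_template)

template = perl_to_python_template('${2}999$1')
print(template)  # \g<2>999\g<1>

match = re.search(r'(\d+).+?(\d+)', 'abc123def456ghi')
result = match.expand(template)
print(result)  # 456999123
```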

edited Mar 24 '18 at 20:23

answered Mar 23 '18 at 08:25

Aran-Fey


OK, I should avoid `eval()` if possible. However, to keep user_input1 as simple as possible, I don't want to force the user to always type `(.+?)` and `(.+)` at the beginning and end of `user_input1`, respectively. But using `re.sub()` will require this, since it performs a substitution rather than the rearrangement of matches that I would like to have. I still don't see a way of achieving this with `re.sub()`. If somebody can show me a way without `eval()`... fine! – theozh Mar 24 '18 at 06:55
Have you tested your first example at all? The result is `abc456123ghi`, and that's not the result I want. So, with `re.sub()` you need `(.+?)` at the beginning and `(.+)` at the end of `user_input1`. Please explain to me why you think you don't need them. With `eval()` I can freely reassemble just the matches `\1`, `\2`, ... as I like. Please let me know how I could do this with `re.sub()`. – theozh Mar 24 '18 at 20:01
@theozh Sorry about the misunderstanding. Answer updated. – Aran-Fey Mar 24 '18 at 20:24
In the original question I wrote what I expect: `Result: 456123`. I assume that you know what regular expressions are. First, I match the expression `r'(\d+)(.+?)(\d+)'`, in words: several digits, some characters in between, and then several digits again. Second, if there is a match in `mystr`, I want to reuse and assemble just the second group of digits and the first group of digits, nothing else. If you test your suggested code, it is a counterexample, because the result is `abc456123ghi`. Sorry for taking your time. – theozh Mar 24 '18 at 20:32
Aaah, `expand()` does the trick! @abarnert mentioned it above. I couldn't find it in my Python tutorial, and since you insisted that it could be done with `re.sub()`, I thought I was missing something that is obvious to others. OK, now I can finally skip evil `eval()` and everybody is happy. Thank you for the updated answer and your patience. – theozh Mar 24 '18 at 20:56
@theozh No problem. I somehow missed the difference between your output and my output when I first tried my `re.sub` solution and then I didn't bother re-testing it and just insisted that it's correct. Sorry about the whole mess. – Aran-Fey Mar 24 '18 at 21:07

theozh · Answer 2 · 2018-03-22T19:02:45.893


"Inspired" by the comments above, I think the following is an answer to my question (assuming that you only have \1 to \9). At least to me, this solution was neither intuitive nor obvious (as Python likes to be). More elegant constructs are welcome.

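import re

mystr = 'abc123def456ghi'
user_input1 = r'(\d+).+?(\d+)'
user_input2 = r'\2+"999"+\1'
# Turn each single-digit back-reference \N into the Python expression m.group(N)
user_input2 = re.sub(r'\\(\d)', r'm.group(\1)', user_input2)
m = re.search(user_input1, mystr)
if m:
    # Caution: eval() executes arbitrary Python code contained in the user input
    newstr = eval(user_input2)
    print(newstr)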
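The same `\2+"999"+\1` input can also be assembled without eval() by parsing it directly. The sketch below assumes a hypothetical mini-language in which `+` separates tokens and each token is either a back-reference `\N` or a quoted literal; `assemble` is an illustrative name, and literals that themselves contain `+` are not supported by this simple split.

```python
import re

def assemble(expression, match):
    """Concatenate group references (\\N) and quoted literals joined by +
    (hypothetical mini-language, not a standard format)."""
    parts = []
    for token in expression.split('+'):
        token = token.strip()
        ref = re.fullmatch(r'\\(\d+)', token)
        if ref:
            # Back-reference: look up the captured group by its number.
            parts.append(match.group(int(ref.group(1))))
        elif len(token) >= 2 and token[0] == token[-1] and token[0] in '"\'':
            # Quoted literal: strip the surrounding quotes.
            parts.append(token[1:-1])
        else:
            raise ValueError('unsupported token: %r' % token)
    return ''.join(parts)

m = re.search(r'(\d+).+?(\d+)', 'abc123def456ghi')
newstr = assemble(r'\2+"999"+\1', m)
print(newstr)  # 456999123
```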

edited Mar 22 '18 at 19:02

answered Mar 22 '18 at 06:17

theozh


Although I have been told not to use `eval()`, so far this is the first solution that satisfies my requirements. – theozh Mar 24 '18 at 08:07
@Aran-Fey again, have you tested your code with `re.sub()`? With my Python 3.6 the result is `abc456999123ghi`. What is your result? – theozh Mar 24 '18 at 20:07

Oh, I see the problem now. The `re.sub` in your code tricked me into thinking that you want to substitute text, when in reality you want to _extract_ text. My bad. I'll update my answer. – Aran-Fey Mar 24 '18 at 20:16
