
Is it possible to write a single Python regular expression that can be applied to a multi-line string and change all occurrences of "foo" to "bar", but only on lines beginning with "#"?

I was able to get this working in Perl, using Perl's \G regular expression sigil, which matches the end of the previous match. However, Python doesn't appear to support this.

Here's the Perl solution, in case it helps:

my $x =<<EOF;
# foo
foo
# foo foo
EOF

$x =~ s{
        (            # begin capture
          (?:\G|^\#) # end of last match, or start of a line plus hash
          .*?        # followed by anything, non-greedily
        )            # end capture
        foo
      }
      {$1bar}xmg;

print $x;

The proper output, of course, is:

# bar
foo
# bar bar

Can this be done in Python?


Edit: Yes, I know that it's possible to split the string into individual lines and test each line and then decide whether to apply the transformation, but please take my word that doing so would be non-trivial in this case. I really do need to do it with a single regular expression.

Chad Birch
mike
  • I don't see how this is much different than this question which you asked a few hours ago: http://stackoverflow.com/questions/529830/do-python-regexes-support-something-like-perls-g – Sean Bright Feb 09 '09 at 23:46
    The earlier responses didn't really answer the question in any way that applied to the underlying problem. I blamed myself for asking the question wrong, and am trying again with a hopefully more clear and applicable version of the question. – mike Feb 09 '09 at 23:50
  • In my experience, questions that involve the phrase "take my word that I need to…" seldom go well. Describe the goal, not the step. http://www.catb.org/~esr/faqs/smart-questions.html – Chuck Feb 10 '09 at 00:13
  • There's a difference: he's asking about the features of the regex library in python versus an implementation of an algorithm. Perhaps this question has been updated since. – Robert P Feb 10 '09 at 01:05

3 Answers

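lines = mystring.split('\n')
for i, line in enumerate(lines):
    # Only rewrite lines that start with '#'
    if line.startswith('#'):
        lines[i] = line.replace('foo', 'bar')
# Reassemble the text from the updated lines
mystring = '\n'.join(lines)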

No need for a regex.

Harley Holcombe
  • Yes, but as I specifically said in the last line of the question, I'd like to do this without having to split the string and sift through it line by line. – mike Feb 09 '09 at 23:33
  • Why not split the string? I see Mat's provided a regex solution, but I find this one much easier to read. – John Fouhy Feb 09 '09 at 23:44
  • There's an existing function that takes a series of regexes and applies them to an input string, and it's politically infeasible to change this function since quite a lot depends upon it. – mike Feb 09 '09 at 23:48
  • Sorry, missed that last line. I'm genuinely curious why splitting is not an option though, I think both methods load the entire string into memory – Harley Holcombe Feb 10 '09 at 00:11
  • Unfortunately, using regexes for solutions like this in Python is not ... well ... Pythonic. Text replacement using regexes is not as well supported in Python as it is in Perl, since Python is much more generic in focus. The for loop may be your best bet for a simple, concise implementation. – Robert P Feb 10 '09 at 01:07

It looked pretty easy to do with a regular expression:

>>> import re
>>> text = """line 1
... line 2
... Barney Rubble Cutherbert Dribble and foo
... line 4
... # Flobalob, bing, bong, foo and brian
... line 6"""
>>> regexp = re.compile(r'^(#.+)foo', re.MULTILINE)
>>> print(re.sub(regexp, r'\g<1>bar', text))
line 1
line 2
Barney Rubble Cutherbert Dribble and foo
line 4
# Flobalob, bing, bong, bar and brian
line 6

But trying it on your example text doesn't work so well, because the greedy .+ means only the last foo on each line gets replaced:

>>> text = """# foo
... foo
... # foo foo"""
>>> regexp = re.compile(r'^(#.+)foo', re.MULTILINE)
>>> print(re.sub(regexp, r'\g<1>bar', text))
# bar
foo
# foo bar

So, try this:

>>> regexp = re.compile('(^#|\g.+)foo', re.MULTILINE)
>>> print re.sub(regexp, '\g<1>bar', text)
# foo
foo
# foo foo

That seemed to work, but I can't find \g in the documentation!

Moral: don't try to code after a couple of beers.
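
For what it's worth, here is a minimal sketch of one way to stay within a single re.sub call without \G: let the regex match each whole "#" line and pass a callable as the replacement. The names COMMENT_LINE and replace_in_comment_lines are made up for illustration, and this assumes the calling code can accept a function as the replacement rather than only a template string, which may not hold for the existing function mentioned in the comments.

```python
import re

# Hypothetical names; this is an illustration, not code from the question.
# With re.MULTILINE, ^ and $ anchor at line boundaries, so this matches
# each complete line that begins with '#'.
COMMENT_LINE = re.compile(r'^#.*$', re.MULTILINE)

def replace_in_comment_lines(text, old='foo', new='bar'):
    """Replace every `old` with `new`, but only on lines starting with '#'."""
    # The regex selects whole comment lines; the callable rewrites each one.
    return COMMENT_LINE.sub(lambda m: m.group(0).replace(old, new), text)

sample = "# foo\nfoo\n# foo foo"
result = replace_in_comment_lines(sample)
print(result)
```

I believe the third-party regex module (not the standard re module) also supports a \G anchor, but I haven't verified that it behaves exactly like Perl's here.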

Mat
  • Wait, Python has a \g sigil that works like Perl's \G? I didn't notice that in the docs. – mike Feb 09 '09 at 23:44
  • Yeah, I just realised that when I saw your example text. Darn! – Mat Feb 09 '09 at 23:46
  • That last one doesn't seem to work at all -- it's all foos and no bars! :) Anyway, I think I'm going to give up on this feature. It's probably not possible. – mike Feb 10 '09 at 00:27

\g works in Python just like Perl, and is in the docs.

"In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE."

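A short sketch of what the quoted passage covers, as far as I can tell: \g<...> is a group reference used in the replacement string passed to re.sub, which is a different thing from Perl's \G anchor used inside a pattern. The variable names below are only for illustration.

```python
import re

# \g<number> in a replacement template refers to a captured group.
swapped = re.sub(r'(\w+) (\w+)', r'\g<2> \g<1>', 'hello world')

# \g<1>0 means "group 1, then a literal 0"; \10 would mean group 10.
padded = re.sub(r'(a)', r'\g<1>0', 'a')

# \g<name> refers to a group defined with (?P<name>...).
named = re.sub(r'(?P<word>foo)', r'[\g<word>]', 'foo')

print(swapped, padded, named)
```
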
Algorias