regular expression - incremental replacement

Question

Is there any way to do incremental integer replacement using only regex?

Here is the problem: I have a text file containing 1 000 000 lines, all starting with %.

I would like to replace each % with an incrementing integer using regex.

input:

% line one

% line two

% line three

...

output:

1 line one

2 line two

3 line three

...

Why do people always think, a good regexp can solve any problem? To me they're ugly as hell and I hope, I never ever have to maintain one... just look at this one: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/3180176#3180176 - it's black art. — Andreas Dolk, Jul 06 '10 at 11:18
No - regex wont do "replace all", therefore not even incremental replacements — Imre L, Jul 06 '10 at 11:23
@Imre: um— you're absolute. “regex won't do "replace all"”? That's wrong. Also, with a helper function, it does incremental replacements. Check my answer, for example. — tzot, Aug 05 '10 at 11:43

score 5 · Accepted Answer · answered Jul 06 '10 at 11:20

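n = 1  # next number to assign
# Stream line by line so the whole file never has to fit in memory.
with open('sourcefile.txt') as src, open('destination.txt', 'w') as dst:
    for line in src:
        if line.startswith('%'):
            # Swap the leading '%' for the current counter value.
            line = str(n) + line[1:]
            n += 1
        dst.write(line)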

nosklo

John La Rooy · Answer 2 · 2010-07-06T11:40:49.027

4

Here's a way to do it in Python:

import re
from itertools import count
s="""
% line one
% line two
% line three"""

def f():
    n=count(1)
    def inner(m):
        return str(next(n))
    return inner

new_s = re.sub("%",f(),s)

Alternatively, you could use a lambda function there, like so:

new_s = re.sub("%",lambda m,n=count(1):str(next(n)),s)

But it's easier and better to skip the regexp altogether:

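from __future__ import print_function   # For Python<3
import fileinput

# inplace=True redirects print() output back into file.txt
f = fileinput.FileInput("file.txt", inplace=True)
for i, line in enumerate(f, 1):   # start at 1 so the first line becomes "1 ..."
    print("{0}{1}".format(i, line[1:]), end="")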

Since all the lines start with "%", there is no need to even look at that first character.

edited Jul 06 '10 at 11:40

answered Jul 06 '10 at 11:19

@Andreas_D: Huh, he used regex. – nosklo Jul 06 '10 at 11:23
Ok, I added a (better) alternative using fileinput :) – John La Rooy Jul 06 '10 at 11:42

score 4 · Answer 3 · answered Jul 06 '10 at 11:52

Although this problem would best be solved by reading the file line by line and checking the first character with simple string functions, here is how you would do incremental replacement on a string in Java:

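// MULTILINE makes ^ match at the start of every line, not just the start of the input
Pattern p = Pattern.compile("^%", Pattern.MULTILINE);
Matcher m = p.matcher(text);
StringBuffer sb = new StringBuffer();
int i = 0;  // yields 0-based numbers; start at 1 for 1-based line numbers
while (m.find()) {
    m.appendReplacement(sb, String.valueOf(i++));
}
m.appendTail(sb);

return sb.toString();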

Jörn Horstmann

you probably want ++i, not i++. Line numbers are usually 1-based. – Sean Patrick Floyd Jul 06 '10 at 12:29
...or initialize `i` to one instead of zero. – Alan Moore Jul 06 '10 at 19:44
This was the answer I needed, the one WITH the regular expression. – dlamblin Nov 07 '11 at 19:09

Mark Baker · Answer 4 · 2010-07-06T11:27:30.653

0

Depending on your choice of language (you've listed a few), PHP's preg_replace_callback() might be an appropriate function to use:

$text = "% First Line\n% Second Line\n% Third Line";

function cb_numbers($matches)
{
    static $c = 1;

    return $c++;
}
$text = preg_replace_callback(
            "/(%)/",
            "cb_numbers",
            $text);

echo $text;

edited Jul 06 '10 at 11:27

answered Jul 06 '10 at 11:13

score 0 · Answer 5 · answered Jul 06 '10 at 11:16

In Python, re.sub accepts a function as the replacement argument; see http://docs.python.org/library/re.html#re.sub
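
A minimal sketch of what passing a function to re.sub could look like for this question. The helper name number_percent_lines and the sample text are made up for illustration; re.MULTILINE is used so that ^ matches at the start of every line rather than only at the start of the string.

```python
import re
from itertools import count


def number_percent_lines(text, start=1):
    """Replace each '%' at the start of a line with an increasing integer."""
    counter = count(start)
    # re.sub calls the function once per match and uses its return value
    # as the replacement, so the counter advances once per matched line.
    return re.sub(r"^%", lambda m: str(next(counter)), text, flags=re.MULTILINE)


sample = "% line one\n% line two\n% line three\n"
print(number_percent_lines(sample), end="")
```

Because the pattern is anchored, a % that appears in the middle of a line is left alone.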

Xavier Combelle

Mike · Answer 6 · 2010-07-06T12:45:21.033

And a PHP version for good measure:

$input = @fopen('input.txt', 'r');
$output = @fopen('output.txt', 'w');

if ($input && $output) {
    $i = 0;
    // fgets() returns false at EOF, so test its result instead of feof()
    while (($line = fgets($input)) !== false) {
        fputs($output, ($line[0] === '%') ?
            substr_replace($line, (string) ++$i, 0, 1) :
            $line
        );
    }
    fclose($input);
    fclose($output);
}

And just because you can, a Perl one-liner (yes, with a regex):

perl -i.bak -pe 'BEGIN{$i=1} (s/^%/$i/) && $i++' input.txt

score 0 · Answer 7 · answered Jul 06 '10 at 12:43

Here's a C# (3.0+) version:

string s = "% line one\n% line two\n% line three";
int n = 1;
s = Regex.Replace(s, @"(?m)^%", m => { return n++.ToString(); });
Console.WriteLine(s);

output:

1 line one
2 line two
3 line three

Of course it requires the whole text to be loaded into memory. If I were doing this for real, I'd probably go with a line-by-line approach.
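
The line-by-line approach mentioned above might look like this in Python, still using a regex for the replacement itself. This is only a sketch: number_lines and the in-memory streams are illustrative assumptions, and real code would pass file objects from open() instead.

```python
import io
import re
from itertools import count

LEADING_PERCENT = re.compile(r"^%")


def number_lines(src, dst, start=1):
    """Copy src to dst, numbering each line that begins with '%'."""
    counter = count(start)
    for line in src:
        # The callback only runs when the pattern matches, so lines
        # without a leading '%' do not consume a number.
        dst.write(LEADING_PERCENT.sub(lambda m: str(next(counter)), line))


# Hypothetical in-memory input standing in for the real file.
src = io.StringIO("% line one\nplain text\n% line two\n")
dst = io.StringIO()
number_lines(src, dst)
print(dst.getvalue(), end="")
```

Only one line is held in memory at a time, which matters for the 1 000 000-line file in the question.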

score 0 · Answer 8 · answered Aug 05 '10 at 11:41

import re, itertools
counter = itertools.count(1)
replacer = lambda match: "%d" % next(counter)  # next() works on Python 2.6+ and 3
text = re.sub("(?m)^%", replacer, text)

counter is… a counter :). replacer is a function returning the counter values as strings. The "(?m)^%" regex matches every % at the start of a line (note the multi-line flag).
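
To see what the multi-line flag changes, here is a small comparison. The sample string is an assumption, and a fixed "#" replacement is used to keep the focus on the anchor.

```python
import re

text = "% one\n% two\n"

# Without (?m), ^ matches only at the very start of the string.
without_flag = re.sub(r"^%", "#", text)

# With (?m), ^ also matches right after every newline.
with_flag = re.sub(r"(?m)^%", "#", text)

print(repr(without_flag))  # '# one\n% two\n'
print(repr(with_flag))     # '# one\n# two\n'
```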
