1

Is there any way to do incremental integer replacement using only a regex?

Here is the problem: I have a text file containing 1 000 000 lines, all starting with %.

I would like to replace each % with an incrementing integer using a regex.

input:

% line one

% line two

% line three

...

output:

1 line one

2 line two

3 line three

...
Andreas Dolk
Joey
  • 8
    Why do you want to do it with a regex only? – Mike Jul 06 '10 at 11:11
  • Why do people always think, a good regexp can solve any problem? To me they're ugly as hell and I hope, I never ever have to maintain one... just look at this one: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/3180176#3180176 - it's black art. – Andreas Dolk Jul 06 '10 at 11:18
  • No - regex wont do "replace all", therefore not even incremental replacements – Imre L Jul 06 '10 at 11:23
  • @Imre: um— you're absolute. “regex won't do "replace all"”? That's wrong. Also, with a helper function, it does incremental replacements. Check my answer, for example. – tzot Aug 05 '10 at 11:43

8 Answers

5
n = 1
with open('sourcefile.txt') as input:
    with open('destination.txt', 'w') as output:
        for line in input:
            if line.startswith('%'):
                line = str(n) + line[1:]
                n += 1
            output.write(line)
nosklo
4

Here's a way to do it in Python

import re
from itertools import count
s="""
% line one
% line two
% line three"""

def f():
    n=count(1)
    def inner(m):
        return str(next(n))
    return inner

new_s = re.sub("%",f(),s)

Alternatively, you could use a lambda function there, like so:

new_s = re.sub("%",lambda m,n=count(1):str(next(n)),s)

But it's easier and better to skip the regexp altogether:

from __future__ import print_function   # For Python<3
import fileinput

f = fileinput.FileInput("file.txt", inplace=True)
for i, line in enumerate(f, 1):  # start at 1 to match the expected output
    print("{0}{1}".format(i, line[1:]), end="")

Since all the lines start with "%", there is no need to even look at that first character.

John La Rooy
4

Although this problem would best be solved by reading the file line by line and checking the first character with simple string functions, here is how you would do incremental replacement on a string in Java:

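// MULTILINE makes ^ match at the start of every line, not just the start of the input
Pattern p = Pattern.compile("^%", Pattern.MULTILINE);
Matcher m = p.matcher(text);
StringBuffer sb = new StringBuffer();
int i = 1; // numbering starts at 1, as in the expected output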
while (m.find()) {
    m.appendReplacement(sb, String.valueOf(i++));
}
m.appendTail(sb);

return sb.toString();
Jörn Horstmann
0

Depending on your choice of language (you've listed a few), PHP's preg_replace_callback() might be an appropriate function to use:

$text = "% First Line\n% Second Line\n% Third Line";

function cb_numbers($matches)
{
    static $c = 1;

    return $c++;
}
$text = preg_replace_callback(
            "/(%)/",
            "cb_numbers",
            $text);

echo $text;
Mark Baker
0

In Python, re.sub accepts a function as the replacement argument; see http://docs.python.org/library/re.html#re.sub

Xavier Combelle
0

And a PHP version for good measure:

$input = @fopen('input.txt', 'r');
$output = @fopen("output.txt", "w");

if ($input && $output) {
    $i = 0;
    // fgets() returns false at end of file, so test its result directly
    while (($line = fgets($input)) !== false) {
        fputs($output, ($line[0] === '%') ?
            substr_replace($line, ++$i, 0, 1) :
            $line
        );
    }
    fclose($input);
    fclose($output);
}

And just because you can, a Perl one-liner (yes, with a regex):

perl -i.bak -pe 'BEGIN{$i=1} (s/^%/$i/) && $i++' input.txt
Mike
0

Here's a C# (3.0+) version:

string s = "% line one\n% line two\n% line three";
int n = 1;
s = Regex.Replace(s, @"(?m)^%", m => { return n++.ToString(); });
Console.WriteLine(s);

output:

1 line one
2 line two
3 line three

Of course it requires the whole text to be loaded into memory. If I were doing this for real, I'd probably go with a line-by-line approach.
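
A line-by-line approach along those lines could be sketched in Python as follows. This is only an illustration: the generator name `number_lines` and its `marker` and `start` parameters are hypothetical, and it assumes each line keeps its trailing newline, as lines read from a file object do.

```python
def number_lines(lines, marker="%", start=1):
    """Yield each line, replacing a leading marker with a running integer."""
    n = start
    for line in lines:
        if line.startswith(marker):
            # Drop the marker and put the current number in its place.
            line = str(n) + line[len(marker):]
            n += 1
        yield line

sample = ["% line one\n", "% line two\n", "not numbered\n", "% line three\n"]
print("".join(number_lines(sample)), end="")
```

Because it is a generator, only one line is held in memory at a time; with real files, the open input file can be passed in directly and the result handed to the output file's `writelines()`.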

Alan Moore
0
import re, itertools
counter= itertools.count(1)
replacer= lambda match: "%d" % next(counter)  # next() works on Python 2.6+ and 3
text= re.sub("(?m)^%", replacer, text)

counter is… a counter :). replacer is a function returning the counter values as strings. The "(?m)^%" regex matches every % at the start of a line (note the multi-line flag).
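
To see why the multi-line flag matters, here is a small sketch comparing the same substitution with and without it. The helper name `numbered` and the sample `text` are made up for illustration.

```python
import re
from itertools import count

text = "% line one\n% line two\n% line three"

def numbered(pattern):
    """Replace every match of pattern in text with 1, 2, 3, ..."""
    counter = count(1)
    return re.sub(pattern, lambda match: str(next(counter)), text)

# Without (?m), ^ only matches at the very start of the string.
without_flag = numbered(r"^%")
# With (?m), ^ also matches right after each newline.
with_flag = numbered(r"(?m)^%")

print(without_flag)
print(with_flag)
```

Without the flag only the first % is replaced; with it, all three lines are numbered.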

tzot