93

Is it possible to increment numbers using regex substitution? Not using evaluated/function-based substitution, of course.

This question was inspired by another one, where the asker wanted to increment numbers in a text editor. There are probably more text editors that support regex substitution than ones that support full-on scripting, so a regex might be convenient to float around, if one exists.

Also, often I've learned neat things from clever solutions to practically useless problems, so I'm curious.

Assume we're only talking about non-negative decimal integers, i.e. \d+.

  • Is it possible in a single substitution? Or, a finite number of substitutions?

  • If not, is it at least possible given an upper bound, e.g. numbers up to 9999?

Of course it's doable given a while-loop (substituting while matched), but we're going for a loopless solution here.

Andrew Cheong
  • The PCRE library's C API doesn't actually have any concept of "substitution"; rather, it just allows you to obtain detailed information about matches, and you can do anything with that information that you want. And it's not clear to me what sort of string substitution could increment even a *one*-digit number; it would have to have some way of converting `1` to `2` and `2` to `3`, for example, but in Perl the only way to do that is either to use `s/.../.../e`, or else to use interpolation in the replacement-string: `s/\d+/@{[$&+1]}/`. – ruakh Oct 17 '12 at 19:08
  • u have to do it with `evaluator/function based substitution`..Putting everything into regex would make it more complex and yes that would be a stupid thing to.. – Anirudha Oct 17 '12 at 19:14
  • @ruakh - Hm, given the crazy things I've seen people do with regex, I thought converting ``1`` to ``2``, ``2`` to ``3``, etc. would be trivial, but perhaps not! Maybe we should start there. – Andrew Cheong Oct 17 '12 at 19:20
  • 1
    If I understood what you are asking this is a good question and there is a simple problem: think about 99: you have just one possible substitution token (eg replace something with 2 <- token). Where are you going to get the 1 to do the replacing? the one I pose is just a matter of available characters. *I think this would be easier with binary numbers*. – Gabber Oct 17 '12 at 19:22
  • @Gabber - Great comment! Just as you posted it, I began to realize that's where I was getting stuck. So currently, I'm trying to solve the problem _assuming that ``0123456789`` is available at the end of the "document" (grabbing the digit I need with a lookahead)_. Once I solve the problem this way, then I can see if there's some clever way to make "unavailable" numbers appear by magic... – Andrew Cheong Oct 17 '12 at 19:26
  • @Gabber - Hm, I would think, though, that what is possible for binary numbers, is possible (if much, much more complex) for decimal numbers. How would one replace ``0`` with ``1``? – Andrew Cheong Oct 17 '12 at 19:36
  • About binary numbers I'm thinking of a possible workaround which involves multiple substitutions with the same pattern and a placeholder running from left to right. I'm making it work with multiple regexes but I think it can be merged to one. If I succeed I'll let you know – Gabber Oct 17 '12 at 20:12
  • @acheong87 haha, you came up with the same idea of appending all digits at the end, while I wrote my answer ^^... anyway, have a look... this would probably be your solution then ;) – Martin Ender Oct 17 '12 at 20:17
  • Ok, with binary numbers only it's easy, just replace 0?(1)+$ with 1 followed by a number of zeroes corresponding to the number of groups (containing 1) found by the regex.... – Gabber Oct 17 '12 at 20:19
  • How do you figure out the number of groups? – Martin Ender Oct 17 '12 at 20:21
  • Theoretically this question belongs to [Programmers.StackExchange](http://programmers.stackexchange.com/faq), in theory there is [FAQs](http://stackoverflow.com/faq) about that but in practice nobody reads them. – Sampo Sarrala - codidact.org Oct 17 '12 at 20:21
  • @m.buettner in c# (for example) you can iterate through the number of groups, using pure regexes I don't know any possible method .. yet. – Gabber Oct 17 '12 at 20:26
  • @Gabber PCRE will always fill a group with its last match (and `(1)+` is only one repeated group, not multiple ones) – Martin Ender Oct 17 '12 at 20:27
  • Yup, pcre, forgot.... of course not keeping information in a way or another will result in the loss of that information. Being grouping stateless as you state (couldn't resist) you must keep the state of your computation in another way (eg as I said in my comment and as you said in your answer, with a marker). But, of course, no proof => no guarantee – Gabber Oct 17 '12 at 20:44
  • This is a great question. It didn't even occur to me that it might be possible with RegEx when I asked the question you linked to. – BenjaminRH Oct 17 '12 at 22:06

7 Answers

54

This question's topic amused me for one particular implementation I did earlier. My solution happens to be two substitutions so I'll post it.

My implementation environment is Solaris; full example:

echo "0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909" |
perl -pe 's/\b([0-9]+)\b/0$1~01234567890/g' |
perl -pe 's/\b0(?!9*~)|([0-9])(?=9*~[0-9]*?\1([0-9]))|~[0-9]*/$2/g'

1 2 3 4 8 9 10 11 20 100 110 200 910 1000 1100 1910

Pulling it apart for explanation:

s/\b([0-9]+)\b/0$1~01234567890/g

Replace each number (#) with 0#~01234567890. The leading 0 is there in case a carry is needed (e.g. "rounding" 9 up to 10). The 01234567890 block is the lookup table used for incrementing. For the input "9 10", the intermediate text is:

09~01234567890 010~01234567890

The individual pieces of the next regex can be described separately; they are joined with alternation (|) to reduce the substitution count:

s/\b0(?!9*~)/$2/g

Select the "0" digit in front of all numbers that do not need rounding and discard it.

s/([0-9])(?=9*~[0-9]*?\1([0-9]))/$2/g

(?=...) is a positive lookahead and \1 is a backreference to capture group 1. So this matches every digit that is followed only by 9s up to the '~' mark, then goes to the lookup table, finds that same digit there, and captures the digit following it. The matched digit is replaced with that next digit from the lookup table. Thus "09~" becomes "19~" then "10~" as the regex engine parses the number.

s/~[0-9]*/$2/g

This regex deletes the ~ lookup table.
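
For readers without Perl at hand, the same two substitutions can be sketched in Python. This is an illustrative port rather than part of the original answer: the function name increment_all is made up, and the sketch assumes Python 3.5 or later, where a group that did not take part in the match is substituted as an empty string (which is what Perl's $2 does here).

```python
import re

# Hypothetical helper name; the two patterns are the ones from the answer,
# with Perl's $1/$2 written in Python's replacement syntax.
def increment_all(text):
    # Pass 1: wrap each number as 0<number>~01234567890.
    prepared = re.sub(r"\b([0-9]+)\b", r"0\g<1>~01234567890", text)
    # Pass 2: drop unneeded leading 0s, bump digits via the lookup table,
    # and delete the table. An unmatched group 2 expands to "" (Python 3.5+).
    return re.sub(
        r"\b0(?!9*~)|([0-9])(?=9*~[0-9]*?\1([0-9]))|~[0-9]*",
        r"\2",
        prepared,
    )

print(increment_all("0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909"))
```

Run on the sample input, this should print the same line as the Perl pipeline above.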

DKATyler
  • Very nice. I think you are technically the winner here. I'll mark your answer as accepted. After I try to do it in one pass ;-) – Andrew Cheong Jul 23 '15 at 23:48
  • Impossible to do in one pass for @BradKiers' reason. Congrats! Unique way of carrying over ("rounding") the 9's, and nice compaction by removing the lookup table in one pass. Great that it doesn't even use lookbehinds, and is therefore Javascript compatible. – Andrew Cheong Jul 24 '15 at 03:43
  • 1
    @AndrewCheong Had an idea, I think 1 substitution is possible provided the lookup table can be left as garbage in the solution *and* the regex engine supports backreferences within a lookbehind. Unfortunately, only the .NET regex engine supposedly supports this and I don't have a .NET compiler. – DKATyler Oct 14 '15 at 18:29
  • Well, maybe not, it would also rely on the engine doing an in-place substitution and re-match. Testing on perl, the result stream is different than the source stream. An example: echo "0" | perl -pe 's/((?=0)|(0))(?<=(..))?/_~$3/g' output: _~_~. First match is the 0 length (?=0) with group 3 failing, output stream has _~. Second match is the "0" output stream _~_~. If the lookbehind is reduced to ".", the output is _~_~0 showing the problem. – DKATyler Oct 14 '15 at 19:43
  • @mbomb007 You can't just throw away pieces of the regex and expect it to work right. "\b" definition: The match must occur on a word boundary. Stick the \b in again and it'll work. I'll caution though, regex for incrementing numbers is probably a bad idea. – DKATyler Feb 04 '16 at 23:24
  • A cautionary note, I mistakenly left out the trailing zero in the lookup table. I thought there was only one instance of each digit in the lookup, but leaving off that final zero will cause issues for numbers ending in 9. I ran this and @martinEnder's answer and I like this one best. No looping and no lookbehinds. – Kevin Scharnhorst Dec 05 '19 at 21:04
  • I cannot achieve increment by 2, is that possible? How? – nanmaniac Mar 14 '22 at 13:48
  • 1
    @nanmaniac If you have a language with numbers, you should use that. If you *really* need to increment by 2 in regex you can do it. I'd add a second lookup table: !024680135791 making the original expression s/\b([0-9]+)\b/0$1!024680135791~01234567890/g then handle the carryover step separately: perl -pe 's/\b0(?!9*[8-9]!)|([0-9])(?=9*[8-9]![0-9]*?\1([0-9]))/$2/g' | perl-pe '([0-9])(?=~[0-9]*![0-9]*?\1([0-9]))|![0-9]~[0-9]*/$2/g' <--Completely untested, use at own risk. – DKATyler Mar 15 '22 at 18:57
  • Or, and so much simpler than trying to read the version in the prior comment. Just run the posted answer's regexes twice. – DKATyler Mar 15 '22 at 18:58
47

Wow, turns out it is possible (albeit ugly)!

In case you do not have the time or cannot be bothered to read through the whole explanation, here is the code that does it:

$str = '0 1 2 3 4 5 6 7 8 9 10 11 12 13 19 20 29 99 100 139';
$str = preg_replace("/\d+/", "$0~", $str);
$str = preg_replace("/$/", "#123456789~0", $str);
do
{
$str = preg_replace(
    "/(?|0~(.*#.*(1))|1~(.*#.*(2))|2~(.*#.*(3))|3~(.*#.*(4))|4~(.*#.*(5))|5~(.*#.*(6))|6~(.*#.*(7))|7~(.*#.*(8))|8~(.*#.*(9))|9~(.*#.*(~0))|~(.*#.*(1)))/s",
    "$2$1",
    $str, -1, $count);
} while($count);
$str = preg_replace("/#123456789~0$/", "", $str);
echo $str;

Now let's get started.

So first of all, as the others mentioned, it is not possible in a single replacement, even if you loop it (because how would you insert the successor of a single digit?). But if you prepare the string first, there is a single replacement that can be looped. Here is my demo implementation using PHP.

I used this test string:

$str = '0 1 2 3 4 5 6 7 8 9 10 11 12 13 19 20 29 99 100 139';

First of all, let's mark all numbers we want to increment by appending a marker character (I use ~, but you should probably use some crazy Unicode character or ASCII character sequence that definitely will not occur in your target string).

$str = preg_replace("/\d+/", "$0~", $str);

Since we will be replacing one digit per number at a time (from right to left), we will just add that marking character after every full number.

Now here comes the main hack. We add a little 'lookup' to the end of our string (also delimited with a unique character that does not occur in your string; for simplicity I used #).

$str = preg_replace("/$/", "#123456789~0", $str);

We will use this to replace digits by their corresponding successors.

Now comes the loop:

do
{
$str = preg_replace(
    "/(?|0~(.*#.*(1))|1~(.*#.*(2))|2~(.*#.*(3))|3~(.*#.*(4))|4~(.*#.*(5))|5~(.*#.*(6))|6~(.*#.*(7))|7~(.*#.*(8))|8~(.*#.*(9))|9~(.*#.*(~0))|(?<!\d)~(.*#.*(1)))/s",
    "$2$1",
    $str, -1, $count);
} while($count);

Okay, what is going on? The matching pattern has one alternative for every possible digit. This maps digits to successors. Take the first alternative for example:

0~(.*#.*(1))

This will match any 0 followed by our increment marker ~, then it matches everything up to our cheat-delimiter and the corresponding successor (that is why we put every digit there). If you glance at the replacement, this will get replaced by $2$1 (which will then be 1 and then everything we matched after the ~ to put it back in place). Note that we drop the ~ in the process. Incrementing a digit from 0 to 1 is enough. The number was successfully incremented, there is no carry-over.

The next 8 alternatives are exactly the same for the digits 1 to 8. Then we take care of two special cases.

9~(.*#.*(~0))

When we replace the 9, we do not drop the increment marker, but place it to the left of the resulting 0 instead. This (combined with the surrounding loop) is enough to implement carry-over propagation. Now there is one special case left. For all numbers consisting solely of 9s we will end up with the ~ in front of the number. That is what the last alternative is for:

(?<!\d)~(.*#.*(1))

If we encounter a ~ that is not preceded by a digit (therefore the negative lookbehind), it must have been carried all the way through a number, and thus we simply replace it with a 1. I think we do not even need the negative lookbehind (because this is the last alternative that is checked), but it feels safer this way.

A short note on the (?|...) around the whole pattern. This makes sure that we always find the two matches of an alternative in the same references $1 and $2 (instead of ever larger numbers down the string).

Lastly, we add the DOTALL modifier (s), to make this work with strings that contain line breaks (otherwise, only numbers in the last line will be incremented).

That makes for a fairly simple replacement string. We simply first write $2 (in which we captured the successor, and possibly the carry-over marker), and then we put everything else we matched back in place with $1.

That's it! We just need to remove our hack from the end of the string, and we're done:

$str = preg_replace("/#123456789~0$/", "", $str);
echo $str;
> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 20 21 30 100 101 140

So we can do this entirely in regular expressions. And the only loop we have always uses the same regex. I believe this is as close as we can get without using preg_replace_callback().

Of course, this will do horrible things if we have numbers with decimal points in our string. But that could probably be taken care of by the very first preparation-replacement.

Update: I just realised that this approach immediately extends to arbitrary increments (not just +1). Simply change the first replacement. The number of ~ you append equals the increment you apply to all numbers. So

$str = preg_replace("/\d+/", "$0~~~", $str);

would increment every integer in the string by 3.
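
For comparison, here is a rough Python sketch of the same looped approach. It is an adaptation, not a literal translation: Python's re module has no branch reset group (?|...), so each alternative gets its own pair of groups and the replacement simply concatenates all of them, relying on unmatched groups expanding to empty strings (Python 3.5+). The names increment_looped and step are made up for illustration.

```python
import re

LOOKUP = "#123456789~0"  # same appendix as in the answer

# No branch reset in Python, so every alternative has its own two groups:
# an outer "rest" group and an inner "successor" group.
ALTERNATIVES = [rf"{d}~(.*#.*({d + 1}))" for d in range(9)]
ALTERNATIVES.append(r"9~(.*#.*(~0))")       # carry: keep the marker
ALTERNATIVES.append(r"(?<!\d)~(.*#.*(1))")  # number consisted only of 9s
PATTERN = re.compile("|".join(ALTERNATIVES), re.DOTALL)

# Emulates "$2$1": all successor groups first, then all rest groups.
# Only one pair matches at a time; the others expand to "".
PAIRS = len(ALTERNATIVES)
REPLACEMENT = ("".join(rf"\g<{2 * i + 2}>" for i in range(PAIRS))
               + "".join(rf"\g<{2 * i + 1}>" for i in range(PAIRS)))

def increment_looped(text, step=1):
    # Mark every number with one "~" per unit of increment.
    text = re.sub(r"\d+", r"\g<0>" + "~" * step, text)
    # Append the lookup; \Z matches only at the very end of the string.
    text = re.sub(r"\Z", LOOKUP, text)
    # Repeat the same substitution until nothing is replaced any more.
    replaced = 1
    while replaced:
        text, replaced = PATTERN.subn(REPLACEMENT, text)
    # Remove the lookup again.
    return re.sub(r"#123456789~0\Z", "", text)

print(increment_looped("0 1 2 3 4 5 6 7 8 9 10 11 12 13 19 20 29 99 100 139"))
print(increment_looped("8 99", step=3))
```

As in the PHP version, the input must not already contain ~ or #, and each pass of the loop handles a single digit, so long inputs take many passes.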

Martin Ender
  • Ahah, put TL;DR at the beginning!! and bravo! this is a really good solution! +1 for the explanation (not tested yet) – Gabber Oct 17 '12 at 20:23
  • 1
    Very nice! Quick answer, too. You can make the pattern prettier by using lookaheads: instead of `(.*#.*(1))`, you can use `(?=.*?(1))` - it still captures, and saves you from replacing with `$2$1` (also - you don't really care if the `1` comes from your lookup or not). I'm pretty sure you can use a similar approach to add two numbers (`"1234 + 5678"`), but that pattern will be ugly (100 lookups? at least 45, I think) - it can be a nice exercise to write a script to generate that pattern. – Kobi Oct 17 '12 at 21:17
  • Good point! However, the `#` in the lookahead was not used to take successors only from behind the `#`. In case someone uses that hack-appendix in a different order (e.g. `~0123456879`), the `~` would be replaced within that bit. That is why I left the `#` in every alternative, so that it works regardless of the order of the appendix. And if you leave that in, the lookahead doesn't make it that much prettier, I think. – Martin Ender Oct 17 '12 at 21:28
  • Got it - good point. You could remove `#[^#]*$`. Then again, my suggestion is 10 characters - same as yours. Minor point, anyway `:)` – Kobi Oct 17 '12 at 21:58
  • Yup, I didn't think about that. I'll leave it explicit though. It's probably complicated enough anyway, to not obscure the answer further with unnecessary optimization. :) – Martin Ender Oct 17 '12 at 22:04
  • +1 - Fast solution to a puzzling problem, great explanation, thanks! – Andrew Cheong Oct 18 '12 at 02:25
  • It's even easier in Perl `$str =~ s/(\d+)/$1+1/e`. – Brad Gilbert Oct 20 '12 at 18:24
  • 4
    @BradGilbert the whole point of the question was not to use callback implementations, but only regex matching and String replacement. Otherwise it's a one-liner in PHP, as well... – Martin Ender Oct 20 '12 at 19:45
12

I managed to get it working in 3 substitutions (no loops).

tl;dr

s/$/ ~0123456789/

s/(?=\d)(?:([0-8])(?=.*\1(\d)\d*$)|(?=.*(1)))(?:(9+)(?=.*(~))|)(?!\d)/$2$3$4$5/g

s/9(?=9*~)(?=.*(0))|~| ~0123456789$/$1/g

Explanation

Let ~ be a special character not expected to appear anywhere in the text.

  1. If a character is nowhere to be found in the text, then there's no way to make it appear magically. So first we insert the characters we care about at the very end.

    s/$/ ~0123456789/
    

    For example,

    0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909
    

    becomes:

    0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909 ~0123456789
    
  2. Next, for each number, we (1) increment the last non-9 (or prepend a 1 if all are 9s), and (2) "mark" each trailing group of 9s.

    s/(?=\d)(?:([0-8])(?=.*\1(\d)\d*$)|(?=.*(1)))(?:(9+)(?=.*(~))|)(?!\d)/$2$3$4$5/g
    

    For example, our example becomes:

    1 2 3 4 8 9 19~ 11 29~ 199~ 119~ 299~ 919~ 1999~ 1199~ 1919~ ~0123456789
    
  3. Finally, we (1) replace each "marked" group of 9s with 0s, (2) remove the ~s, and (3) remove the character set at the end.

    s/9(?=9*~)(?=.*(0))|~| ~0123456789$/$1/g
    

    For example, our example becomes:

    1 2 3 4 8 9 10 11 20 100 110 200 910 1000 1100 1910
    

PHP Example

$str = '0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909';
echo $str . '<br/>';
$str = preg_replace('/$/', ' ~0123456789', $str);
echo $str . '<br/>';
$str = preg_replace('/(?=\d)(?:([0-8])(?=.*\1(\d)\d*$)|(?=.*(1)))(?:(9+)(?=.*(~))|)(?!\d)/', '$2$3$4$5', $str);
echo $str . '<br/>';
$str = preg_replace('/9(?=9*~)(?=.*(0))|~| ~0123456789$/', '$1', $str);
echo $str . '<br/>';

Output:

0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909
0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909 ~0123456789
1 2 3 4 8 9 19~ 11 29~ 199~ 119~ 299~ 919~ 1999~ 1199~ 1919~ ~0123456789
1 2 3 4 8 9 10 11 20 100 110 200 910 1000 1100 1910
WalterV
Andrew Cheong
  • @m.buettner - I would like to mark my answer as the accepted one, as I feel it is the most compact. However, you posted the first solution many hours before I did (and neither of our solutions were single substitutions, this having been found impossible), so I'd like to credit you appropriately. Would you (or anyone) be opposed to or have any ethical qualms with me opening and awarding a bounty to you, as soon as I'm able to (in 2 days)? (Do you accept this arrangement, or would you rather I accept your answer? I'm happy with any choice.) – Andrew Cheong Oct 18 '12 at 02:22
  • Cool, that's a nice solution, too! I don't mind the bounty, but I have a few questions about your solution. Firstly, what's the `(?=\d)` for? Secondly, why does the first alternative in step 2 not match the `1` in `10`? Thirdly, I just tried to implement this with PHP (which uses PCRE, too), and I could not get it to work (only strings of `9` were incremented correctly). And lastly, you too need to add the `DOTALL` modifier, otherwise it will only work in the last line of the string. – Martin Ender Oct 18 '12 at 08:08
  • @m.buettner - Hm, I was able to implement it in PHP. Here, I added an example to my answer. Let me know if I'm missing something! Good point about the ``DOTALL`` modifier for multi-line input though. I think usually text editors run the substitution on a per-line basis (or imply ``DOTALL``) anyway, but those are special cases. – Andrew Cheong Oct 18 '12 at 14:09
  • @m.buettner - To answer your questions, the ``(?=\d)`` was one of several ways to prevent the matching of an empty string (you can see how, without it, the expression would always fall back to the second half of both alternations ``|(?=.*(1))`` and ``|)``), and I felt this was the simplest. (Other methods involved lookbehinds, which aren't supported by JavaScript.) And, the first alternative in step 2 doesn't match the ``1`` in ``10`` because I assert ``(?!\d)`` at the end; I am only interested in the _last_ ``[0-8]`` or the _trailing_ string of ``9``s. – Andrew Cheong Oct 18 '12 at 14:13
  • Ah never mind, I accidentally wrapped the regex in `"` instead of `'`. – Martin Ender Oct 18 '12 at 14:52
  • 1
    @m.buettner - Ah, that would do it. Many a time I've twisted my expressions into pretzels before realizing I just didn't escape a backslash. – Andrew Cheong Oct 18 '12 at 15:07
6

Is it possible in a single substitution?

No.

If not, is it at least possible in a single substitution given an upper bound, e.g. numbers up to 9999?

No.

You can't even replace the numbers between 0 and 8 with their respective successor. Once you have matched, and grouped this number:

/([0-8])/

you need to replace it. However, regex doesn't operate on numbers, but on strings. So you can replace the "number" (or better: digit) with twice this digit, but the regex engine does not know it is duplicating a string that holds a numerical value.

Even if you did something (silly) like this:

/(0)|(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)/

so that the regex engine "knows" that if group 1 is matched, the digit '0' is matched, it still cannot do a replacement. You can't instruct the regex engine to replace group 1 with the digit '1', group 2 with the digit '2', etc. Sure, some tools like PHP will let you define a couple of different patterns with corresponding replacement strings, but I get the impression that is not what you were thinking about.
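
To make the point concrete, here is a small sketch in Python (chosen only for illustration; other engines with template-based replacement behave similarly as far as I know). A replacement template can copy captured text or emit fixed literals, so it can double a digit, but it has no way to compute the digit's successor.

```python
import re

text = "3 apples"

# A template can repeat what was captured...
doubled = re.sub(r"([0-8])", r"\1\1", text)
print(doubled)  # 33 apples

# ...but "+1" in a template is just two literal characters, not arithmetic.
not_incremented = re.sub(r"([0-8])", r"\1+1", text)
print(not_incremented)  # 3+1 apples
```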

Bart Kiers
  • Ah, I see. Thanks for the explanation. I agree: as long as in regex (or at least, PCRE), one cannot make unencountered entities appear magically, there is no hope. I'm still thinking of a way ;) but meanwhile, +1. – Andrew Cheong Oct 17 '12 at 19:34
  • good point..regex operates on strings not other types..Using it for diff types is not supported and would be stupid thing 2 do – Anirudha Oct 17 '12 at 19:35
  • @acheong87, I am curious to see what you're thinking of yourself. Although I'm pretty familiar with regex, and pretty sure there is no neat solution in this case, I am sometimes baffled by what people who are more familiar with regex than I am come up with. – Bart Kiers Oct 17 '12 at 19:36
  • I gave up trying to make characters appear magically, but I did come across [a more compact solution](http://stackoverflow.com/questions/12941362/is-it-possible-to-increment-numbers-using-regex-substitution/12946132#12946132). Anyway, many thanks for the initial observation, which so far stands unchallenged! – Andrew Cheong Oct 18 '12 at 02:24
  • It's possible in Perl `$str =~ s/(\d+)/$1+1/e` – Brad Gilbert Oct 20 '12 at 18:26
  • 1
    @BradGilbert, you must have missed the OP's remark *"Not using evaluated/function-based substitution, of course"* in the very first part of his question. – Bart Kiers Oct 20 '12 at 21:23
  • @BradGilbert - Someone managed to do this in 2 substitutions (see newly accepted answer). Thought you'd like to know :-) Still, 1 substitution is definitely impossible, as your answer states. – Andrew Cheong Jul 24 '15 at 22:02
2

It is not possible by regular expression search and substitution alone.

You have to use something else to help achieve that. You have to use the programming language at hand to increment the number.

Edit:

The regular expression definition in the Single Unix Specification doesn't mention support for evaluating arithmetic expressions or any capability for performing arithmetic operations.

Nonetheless, I know that some flavors (e.g. TextPad, an editor for Windows) allow you to use \i as a substitution term, which is an incremental counter of how many times the search string has been found. However, it doesn't evaluate or parse the found strings into numbers, nor does it allow adding a number to them.

Tulains Córdova
  • 3
    This is a valid answer, but it being the negative answer to a positive question, one cannot accept it without proof (whereas a negative/positive answer to a negative/positive question is proven by a simpler means: an example). So, I would very much like to know a proof (or even an outline of one) as to why it is definitively impossible. – Andrew Cheong Oct 17 '12 at 19:30
  • @acheong87 I improved the answer a little bit for you. – Tulains Córdova Oct 17 '12 at 20:20
2

I have found a solution in two steps (Javascript) but it relies on indefinite lookaheads, which some regex engines reject:

const incrementAll = s =>
        s.replaceAll(/(.+)/gm, "$1\n101234567890")
         .replaceAll(/(?:([0-8]|(?<=\d)9)(?=9*[^\d])(?=.*\n\d*\1(\d)\d*$))|(?<!\d)9(?=9*[^\d])(?=(?:.|\n)*(10))|\n101234567890$/gm, "$2$3"); 

The key idea is to append an ordered list of digits to the end of the string in the first step and, in the second, to locate the relevant digit and capture the digit to its right in that list via a lookahead. There are two other branches in the second step: one for dealing with initial nines, and the other for removing the digit sequence.

Edit: I just tested it in Safari and it throws an error, but it definitely works in Firefox.

0

I needed to increment the indices of output files by one, coming from a pipeline I can't modify. After some searching I landed on this page. While the answers are instructive, they don't really give a readable solution to the problem. Yes, it is possible to do it with only regex; no, the result is not very comprehensible.

Here I would like to give a readable solution using Python, so that others don't need to reinvent the wheel. I can imagine many of you may have ended up with a similar solution.

The idea is to partition the file name into three groups, and format your match string so that the index to increment is the middle group. Then it is possible to increment only the middle group, after which we piece the three groups together again.

import re
import sys
import argparse
from os import listdir
from os.path import isfile, join



def main():
    parser = argparse.ArgumentParser(description='index shift of input')
    parser.add_argument('-r', '--regex', type=str,
            help='regex match string for the index to be shift')
    parser.add_argument('-i', '--indir', type=str,
            help='input directory')
    parser.add_argument('-o', '--outdir', type=str,
            help='output directory')

    args = parser.parse_args()
    # parse input regex string
    regex_str = args.regex
    regex = re.compile(regex_str)
    # target directories
    indir = args.indir
    outdir = args.outdir

    try:
        for input_fname in listdir(indir):
            input_fpath = join(indir, input_fname)
            if not isfile(input_fpath): # not a file
                continue

            matched = regex.match(input_fname)
            if matched is None: # not our target file
                continue
            # middle group is the index and we increment it
            index = int(matched.group(2)) + 1
            # reconstruct output
            output_fname = '{prev}{index}{after}'.format(**{
                'prev'  : matched.group(1),
                'index' : str(index),
                'after' : matched.group(3)
            })
            output_fpath = join(outdir, output_fname)

            # write the command required to stdout
            print('mv {i} {o}'.format(i=input_fpath, o=output_fpath))
    except BrokenPipeError:
        pass



if __name__ == '__main__': main()

I have this script saved as index_shift.py. As a usage example, my files are named like k0_run0.csv, from bootstrap runs of machine learning models with parameter k. The parameter k starts from zero, and the desired index map starts at one. First we prepare input and output directories to avoid overwriting files:

$ ls -1 test_in/ | head -n 5
k0_run0.csv
k0_run10.csv
k0_run11.csv
k0_run12.csv
k0_run13.csv
$ ls -1 test_out/

To see how the script works, just print its output:

$ python3 -u index_shift.py -r '(^k)(\d+?)(_run.+)' -i test_in -o test_out | head -n5
mv test_in/k6_run26.csv test_out/k7_run26.csv
mv test_in/k25_run11.csv test_out/k26_run11.csv
mv test_in/k7_run14.csv test_out/k8_run14.csv
mv test_in/k4_run25.csv test_out/k5_run25.csv
mv test_in/k1_run28.csv test_out/k2_run28.csv

It generates bash mv commands to rename the files. Now we pipe the lines directly into bash.

$ python3 -u index_shift.py -r '(^k)(\d+?)(_run.+)' -i test_in -o test_out | bash

Checking the output, we have successfully shifted the index by one.

$ ls test_out/k0_run0.csv
ls: cannot access 'test_out/k0_run0.csv': No such file or directory
$ ls test_out/k1_run0.csv
test_out/k1_run0.csv

You can also use cp instead of mv. My files are kinda big, so I wanted to avoid duplicating them. You could also make the shift amount an input argument. I didn't bother, because shifting by one covers most of my use cases.
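
A minimal sketch of what a configurable shift could look like, assuming the same three-group convention as the script above. The helper name shift_index and its shift parameter are hypothetical and not part of index_shift.py.

```python
import re

# Hypothetical helper: same three-group convention as index_shift.py,
# but the shift amount is a parameter instead of a hard-coded 1.
def shift_index(fname, pattern, shift=1):
    matched = re.match(pattern, fname)
    if matched is None:  # not a target file
        return None
    # The middle group is the index; add the requested shift to it.
    index = int(matched.group(2)) + shift
    return "{}{}{}".format(matched.group(1), index, matched.group(3))

print(shift_index("k0_run0.csv", r"(^k)(\d+?)(_run.+)"))       # k1_run0.csv
print(shift_index("k25_run11.csv", r"(^k)(\d+?)(_run.+)", 2))  # k27_run11.csv
```

In the script, the shift could then come from an extra argparse option (for instance a --shift flag, which is also an assumption here) and be passed through to this helper.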

Pik-Mai Hui