
I'm not positive this can be done with just regex statements, but I'm trying to prepend the first captured group onto every subsequent number and keep everything else the same. Specifically, I have a user-inputted string:

987ABC11-15; 77; 877; 66-68

Everything after that "ABC" is subject to change: it can be blank, or it could be a number followed by any combination of numbers, semicolons, spaces and dashes.

I want to capture that 987ABC and prepend it to the other numbers, so that it becomes:

987ABC11-987ABC15; 987ABC77; 987ABC877; 987ABC66-987ABC68 

Currently I'm trying this match pattern:

/^([0-9]+[A-Za-z]+)([0-9]+)*([^0-9]+)*([0-9]+)*/g

and substitution:

$1$2$3$1$4

but that only prepends the first capture group to the first instance of the last capture group instead of to all of them, i.e. it becomes:

987ABC11-987ABC15; 77; 877; 66-68
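A minimal PHP reproduction of this attempt shows why only one replacement happens. This assumes PHP's preg functions (the flavor the comments below settle on); the g flag is dropped because PHP rejects it as an unknown modifier and preg_replace() already replaces every match. The ^ anchor is what limits the pattern to a single match at the start of the string:

```php
<?php
$string = '987ABC11-15; 77; 877; 66-68';

// The original attempt without the "g" flag. Because of the ^ anchor, the
// pattern can only match once, at position 0, so only "15" gets the prefix.
$result = preg_replace(
    '/^([0-9]+[A-Za-z]+)([0-9]+)*([^0-9]+)*([0-9]+)*/',
    '$1$2$3$1$4',
    $string,
    -1,
    $count // number of replacements performed
);

echo $result, "\n"; // 987ABC11-987ABC15; 77; 877; 66-68
echo $count, "\n";  // 1
```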

Any ideas?


Update: I've been trying this:

/([0-9]+[A-Za-z]+)(.*)([^0-9A-Za-z]+)([0-9]+)([^0-9A-Za-z])/$1$2$3$1$4$5/

and running it multiple times, I get:

987ABC11-15; 77; 877; 987ABC66-68
987ABC11-15; 77; 987ABC877; 987ABC66-68
987ABC11-15; 987ABC77; 987ABC877; 987ABC66-68
987ABC11-987ABC15; 987ABC77; 987ABC877; 987ABC66-68

which covers everything but the 68 at the end. Any idea how to modify this to handle the 68 as well?

Bobert1234
  • This can be done, but it would need to be coded in. It would require 2 separate regular expressions, because you have 2 tasks: 1) getting the prefix from the string (e.g. `987ABC`) and 2) splitting up the remainder of the string into the parts that need replacing (it looks like you want to insert #1 after every space and after every hyphen). – Sunny Patel Jul 18 '18 at 21:26
  • That's not a regex solution. Perhaps a recursive regex (`(?R)`) would work here? I'm not familiar enough with it to say. – Bobert1234 Jul 18 '18 at 21:28
  • What programming language / regex flavor are you using? This can be achieved by one regex _in some languages_, in others, however, you'd have to split. Check [this answer](https://stackoverflow.com/a/3537914/4934172) for example. – 41686d6564 stands w. Palestine Jul 18 '18 at 21:32
  • Is it possible with one regex in PHP? – Bobert1234 Jul 18 '18 at 21:53
  • I don't think this is possible with one regex; I think @SunnyPatel is right about using some code. You need arbitrary-length conditional replacement output, which as far as I know doesn't exist in any regex implementation, but it could easily be achieved with a few lines of code. – Will Barnwell Jul 18 '18 at 23:34
  • As a sidenote: using `(...+)*` is not only [needlessly detrimental](https://www.regular-expressions.info/catastrophic.html), but could also really [mess up your capture groups](https://www.regular-expressions.info/captureall.html) – Will Barnwell Jul 18 '18 at 23:44
  • [Here's an idea](https://tio.run/##PY5PC4JAFMTvfooXSLuLmnlIjS3CunT0XJnYtqbkn2VVKKrPbqtBl8cMvxneiEz0/WoT7kNN05tWwhrQ0veC7c5xLGdBwfMo@MNxXcv1Ef3HhOS3WHJRJIzHLCmKS8LuGNlnfLoax8A6RAZ5K2kjE9KuYm1eV1gvCXQNBzzVRfog8NIAIE8BT3gp2qfiRyciBAaqfoyWAi9URfK2k9WPzAYyj6j2MWGYQ9QszrJ6NLTvvw) using a callback. – bobble bubble Sep 26 '22 at 10:58
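Sunny Patel's two-regex approach can be sketched in PHP roughly as follows. This assumes the separators are only hyphens, semicolons and spaces (as in the sample input); $prefix, $rest and $result are illustrative names, not part of any existing rule engine:

```php
<?php
$string = '987ABC11-15; 77; 877; 66-68';
$result = $string;

// Regex 1: capture the prefix (leading digits + letters) and the remainder.
if (preg_match('~^(\d+[A-Z]+)(.*)$~i', $string, $m)) {
    $prefix = $m[1];
    $rest   = $m[2];

    // Regex 2: insert the prefix at every zero-width position that follows a
    // separator (hyphen, semicolon or space) and precedes a digit.
    $result = $prefix . preg_replace('~(?<=[-; ])(?=\d)~', $prefix, $rest);
}

echo $result; // 987ABC11-987ABC15; 987ABC77; 987ABC877; 987ABC66-987ABC68
```

The first number in the remainder (11) is not touched because the lookbehind has nothing to look at at the start of $rest.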

2 Answers

Update: here is a pattern-only solution; it needs to be run in a loop until no more replacements are made:

~(\d+[A-Z]+)(\d+[-; ]+)(\d+\b)~i/$1$2$1$3

or

~(\d+[A-Z]+)\d+[-; ]+\K(?=\d+\b)~i/$1

The word boundary metacharacter \b prevents an already-prefixed number (e.g. the 987 in 987ABC15) from being matched as a value.
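A sketch of the loop these patterns need, in PHP. replaceUntilStable() is a hypothetical helper; it relies on the optional $count argument of preg_replace() to detect a pass that made no replacements:

```php
<?php
// Hypothetical helper: re-run preg_replace() until a pass replaces nothing.
function replaceUntilStable(string $pattern, string $replacement, string $subject): string
{
    do {
        $subject = preg_replace($pattern, $replacement, $subject, -1, $count);
    } while ($count > 0);
    return $subject;
}

$string = '987ABC11-15; 77; 877; 66-68';

// Both patterns from above, with their replacement strings.
$viaGroups = replaceUntilStable('~(\d+[A-Z]+)(\d+[-; ]+)(\d+\b)~i', '$1$2$1$3', $string);
$viaKeep   = replaceUntilStable('~(\d+[A-Z]+)\d+[-; ]+\K(?=\d+\b)~i', '$1', $string);

echo $viaGroups, "\n";
echo $viaKeep, "\n";
```

For the sample input each pass adds exactly one prefix, so it takes five passes that replace something plus a final pass that finds nothing.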


Original response:

Like bobble bubble, I would use preg_replace_callback() and carry the captured prefix value in a variable passed by reference into the closure (use(&$prefix)), so that it can be reused for subsequent substring replacements.

Code:

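$string = '987ABC11-15; 77; 877; 66-68';

// Group 1 (the prefix) can only match at the very start of the string (^).
// \K discards everything matched so far, so each match is the zero-width
// position right after "digits + separator(s)", i.e. just before the next value.
$pattern = '~(^\d+[A-Z]+)?\d+\D+\K~';
echo preg_replace_callback($pattern, function ($m) use (&$prefix) {
    // $m[1] only exists on the first match, where the prefix is captured
    if (isset($m[1])) $prefix = $m[1];
    return $prefix; // insert the remembered prefix at the zero-width position
}, $string);
// output: 987ABC11-987ABC15; 987ABC77; 987ABC877; 987ABC66-987ABC68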

I will be upvoting bobble's answer because it is solid, but I want to explain why I am posting such a similar procedure:

  • By making the leading substring (prefix) capture group optional and skipping the first "value", which is already prefixed, my solution performs just 5 replacements (as logically intended) compared to bobble's 7.

  • The \K (which restarts the full-string match) ensures that I never "lose" any characters, so I never have to re-insert any in the replacement.

  • Because the prefix capture group is optional, the [1] key in $m only presents itself on the first replacement. For all other replacements, there is no [1] key generated.

  • The first replacement call delivers both the prefix (in the capture group) and the zero-width position before the 15. My solution doesn't make a separate trip to deliver the prefix value, nor does it overwrite the existing prefix before the 11 value.
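The replacement count and the presence of key [1] can be checked with the optional $count argument of preg_replace_callback(). In this sketch $hadGroup1 is a hypothetical log added only for the check:

```php
<?php
$string = '987ABC11-15; 77; 877; 66-68';
$pattern = '~(^\d+[A-Z]+)?\d+\D+\K~';

$hadGroup1 = []; // one entry per callback call: did $m contain key 1?

$result = preg_replace_callback($pattern, function ($m) use (&$prefix, &$hadGroup1) {
    $hadGroup1[] = array_key_exists(1, $m);
    if (isset($m[1])) {
        $prefix = $m[1];
    }
    return $prefix;
}, $string, -1, $count);

echo $result, "\n";
echo $count, "\n"; // 5
```

Because group 1 is the last group and it does not participate after the first match, PHP omits the trailing unmatched group from $m instead of reporting it as an empty string.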

p.s. I did have a version that used the null coalescing operator in the callback, but I didn't see any advantage (it overwrote $prefix on every call instead of calling isset()), so I scrapped it. I also tried hard to make use of a static variable declaration, but kept running into warnings, so that got scrapped too.
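For comparison, a null-coalescing version might look like the following. This is a reconstruction from the description above, not the author's scrapped code:

```php
<?php
$string = '987ABC11-15; 77; 877; 66-68';
$pattern = '~(^\d+[A-Z]+)?\d+\D+\K~';

// Null-coalescing variant: $prefix is reassigned on every call, falling back
// to its previous value whenever capture group 1 is absent from $m.
$result = preg_replace_callback($pattern, function ($m) use (&$prefix) {
    return $prefix = $m[1] ?? $prefix;
}, $string);

echo $result; // 987ABC11-987ABC15; 987ABC77; 987ABC877; 987ABC66-987ABC68
```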

mickmackusa
  • Those are both non-answers, as the question clearly specified doing this with just regex. – Bobert1234 Jul 19 '18 at 14:02
  • `preg_replace_callback()` is a regex function. It is just as legit as `preg_replace()`. – mickmackusa Jul 19 '18 at 14:03
  • @Bobert1234 You tagged your question `php`, so this is how one would do it; otherwise you'd probably need another regex flavor such as `.NET` for capturing inside a variable-length lookbehind ([see demo > context](http://www.regexstorm.net/tester?p=%28%3f%3c%3d%28%5e%5cd%2b%5bA-Z%5d%2b%29.*%3f%29%5cb%28%3f%3d%5cd%2b%29&i=987ABC11-15%3b+77%3b+877%3b+66-68&r=%241)). – bobble bubble Jul 19 '18 at 16:16
  • I'm pretty sure I didn't add that tag. But Ahmed above did ask me which regex flavor I was using, since that is relevant to its structure and recursive capabilities, and I did ask "is it possible with one regex in php". – Bobert1234 Jul 19 '18 at 16:32
  • @Bobert1234 How are you executing your pattern, may I ask? You seem disappointed with my working solution, which is a bit of a downer because I did not mean to circumvent your criteria. I cared enough to spend about an hour developing/experimenting/testing and crafting a comprehensive/thoughtful/educational answer -- far more effort than the average SO answer. – mickmackusa Jul 19 '18 at 20:56
  • I'm using a pre-built process that applies regex rules. All I can reasonably modify are the regexes within the rules. The process may have been built using PHP, but my question was from a user level, i.e. "is there a regex statement that can be added to the rules that will accomplish this?", not "how can I overhaul the process to make an exception for this specific scenario?". Even if I wanted to do that, it would not be a practical solution, since a lot of the specifics could change and I would have to add a lot of functionality to allow other users to configure those changes. – Bobert1234 Jul 22 '18 at 20:12
  • Thanks for your time, but that's why I wrote "just regex statements" and "that's not a regex solution" and gave samples that only used regex. I apologize if I was somehow unclear, but your answer did not address the question. – Bobert1234 Jul 22 '18 at 20:13
  • @Bobert1234 thanks for explaining your requirements. I've added a couple of patterns to the top of my answer which strive for accuracy, efficiency, and brevity. – mickmackusa Jul 23 '18 at 01:34

Caveat: this has to be run multiple times:

/([0-9]+[A-Za-z]+)(.*)([^0-9A-Za-z]+)([0-9]+)([^0-9A-Za-z]|$)/$1$2$3$1$4$5

Running this on

987ABC11-15; 77; 877; 66-68

becomes (one line per run):

987ABC11-15; 77; 877; 66-987ABC68
987ABC11-15; 77; 877; 987ABC66-987ABC68
987ABC11-15; 77; 987ABC877; 987ABC66-987ABC68
987ABC11-15; 987ABC77; 987ABC877; 987ABC66-987ABC68
987ABC11-987ABC15; 987ABC77; 987ABC877; 987ABC66-987ABC68

which is what I was looking for.
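Outside a rule engine, the "run it multiple times" step can be automated. A minimal PHP sketch, assuming PHP's preg_replace() (which is global by default), that repeats the substitution until the string stops changing:

```php
<?php
$string = '987ABC11-15; 77; 877; 66-68';
$pattern = '/([0-9]+[A-Za-z]+)(.*)([^0-9A-Za-z]+)([0-9]+)([^0-9A-Za-z]|$)/';

// The greedy .* pushes each match as far right as possible, so each pass
// prefixes only the last not-yet-prefixed number; repeat until nothing changes.
do {
    $previous = $string;
    $string = preg_replace($pattern, '$1$2$3$1$4$5', $string);
} while ($string !== $previous);

echo $string; // 987ABC11-987ABC15; 987ABC77; 987ABC877; 987ABC66-987ABC68
```

The |$ alternative in the last group is what lets the final 68 match even though nothing follows it.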


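Update: a coworker of mine pointed out a slightly more efficient version that merges groups 2 and 3:

/([0-9]+[A-Za-z]+)(.*[^0-9A-Za-z]+)([0-9]+)([^0-9A-Za-z]|$)/$1$2$1$3$4/

(The + after [A-Za-z] is needed so that group 1 captures all of 987ABC rather than just 987A.)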
Bobert1234
  • If anyone knows of a way to avoid running it multiple times (e.g. using recursion), that would be even better. – Bobert1234 Jul 19 '18 at 07:16