I have the following string that will occur repeatedly in a larger string:

[SM_g]word[SM_h].[SM_l] "

Notice that in this string, after the phrase "[SM_g]word[SM_h]", there are three components:

  1. A period (.), which could also be a comma (,)
  2. [SM_l]
  3. "

Anywhere from zero to all three of these components can appear after "[SM_g]word[SM_h]", and they can appear in any order. For example, the string could also be:

[SM_g]word[SM_h][SM_l]"

or

[SM_g]word[SM_h]"[SM_l].

or

[SM_g]word[SM_h]".

or

[SM_g]word[SM_h][SM_l].

or

[SM_g]word[SM_h].

or simply just

[SM_g]word[SM_h]

These are just some of the examples. The point is that there are three different components (more if you consider that the period can also be a comma) that can appear after "[SM_g]word[SM_h]". They can be in any order, and sometimes one, two, or all three of them will be missing.

Not only that, sometimes there will be up to one space between " and the previous component/[SM_g]word[SM_h].

For example:

[SM_g]word[SM_h] ".

or

[SM_g]word[SM_h][SM_l] ".

etc. etc.

I am trying to process this string by moving each of the three components inside the core string, preserving the space if there is one between " and the previous component/[SM_g]word[SM_h].

For example, [SM_g]word[SM_h].[SM_l]" would turn into

[SM_g]word.[SM_l]"[SM_h]

or

[SM_g]word[SM_h]"[SM_l]. would turn into

[SM_g]word"[SM_l].[SM_h]

or, to simulate having a space before "

[SM_g]word[SM_h] ".

would turn into

[SM_g]word ".[SM_h]

and so on.

I've tried several regular expressions, and none of them have worked.

Does anyone have advice?

Foobar

2 Answers

Put the three components in an alternation inside a group, and let that group match at most three times:

(\[SM_g]word)(\[SM_h])((?:\.|\[SM_l]| ?"){0,3})

You may replace word with .*? if it is not a constant or specific keyword. To accept a comma as well as a period, use [.,] in place of \. in the alternation.

Then use this replacement string:

$1$3$2

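// Group 1: opening tag plus word; group 2: closing tag; group 3: up to three trailing components.
const re = /(\[SM_g]word)(\[SM_h])((?:\.|\[SM_l]| ?"){0,3})/g;
const str = `[SM_g]word[SM_h][SM_l] ".`;

// Emit the trailing components (group 3) before the closing tag (group 2).
console.log(str.replace(re, `$1$3$2`)); // [SM_g]word[SM_l] ".[SM_h]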
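
As a rough sketch of how the pattern above could be generalized, the version below accepts a comma as well as a period and allows any text without a "[" in place of the literal word. The helper name moveTrailing and the sample strings are made up for illustration; they are not part of the original answer.

```javascript
// Assumption: "word" is any run of characters that contains no "[".
const TRAILING = /(\[SM_g][^\[]*)(\[SM_h])((?:[.,]|\[SM_l]| ?"){0,3})/g;

// Hypothetical helper: moves the trailing components in front of [SM_h].
function moveTrailing(input) {
  return input.replace(TRAILING, "$1$3$2");
}

// Samples taken from the question, plus one with a comma and a different word.
const samples = [
  '[SM_g]word[SM_h].[SM_l]"',
  '[SM_g]word[SM_h]"[SM_l].',
  '[SM_g]word[SM_h] ".',
  '[SM_g]other[SM_h],',
  '[SM_g]word[SM_h]',
];

for (const sample of samples) {
  console.log(sample, "=>", moveTrailing(sample));
}
```

The first three samples should come out as [SM_g]word.[SM_l]"[SM_h], [SM_g]word"[SM_l].[SM_h] and [SM_g]word ".[SM_h], matching the outputs requested in the question; the last sample has no trailing components and is left unchanged.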
revo

This seems applicable to your task, in other words, moving a sub-string to a different position.

(\[SM_g])([^[]*)(\[SM_h])((?=([,\.])|(\[SM_l])|( ?&\\?quot;)).*)?

Each sub-string is captured in its own group for post-processing.

[SM_g] is captured in group 1, word in group 2, [SM_h] in group 3, and the whole trailing part in group 4. Within the trailing part, [,\.] goes to group 5, [SM_l] to group 6, and " ?&\\?quot;" to group 7.

Thus groups 1 to 3 are the core part, group 4 is the trailing part (useful for checking whether a trailing part exists), and groups 5 to 7 are sub-parts of group 4. Because groups 5 to 7 sit inside a lookahead alternation, only the one that matches the first trailing component is set.

You can therefore reorder the matched string by replacing it with the captured groups in whatever order you want, for example:

\1\2\7\3 or $1$2$7$3, etc.

For replacing in JavaScript, see the post "JS Regex, how to replace the captured groups only?".
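
The group reordering described above could look roughly like this in JavaScript. This is only a sketch: it uses the regex from this answer unchanged, assumes the quote appears as the entity &quot; (which is what that regex looks for), and the function name moveTrailingInside is made up for illustration.

```javascript
// Regex from this answer: groups 1-3 are the core, group 4 is the trailing part.
const re = /(\[SM_g])([^[]*)(\[SM_h])((?=([,\.])|(\[SM_l])|( ?&\\?quot;)).*)?/g;

// Hypothetical helper: puts the whole trailing part (group 4) before [SM_h].
function moveTrailingInside(input) {
  return input.replace(re, (match, open, word, close, trailing) => {
    // Group 4 is undefined when nothing follows [SM_h], so fall back to "".
    return open + word + (trailing || "") + close;
  });
}

console.log(moveTrailingInside("[SM_g]word[SM_h] &quot;."));
console.log(moveTrailingInside("[SM_g]word[SM_h].[SM_l]&quot;"));
console.log(moveTrailingInside("[SM_g]word[SM_h]"));
```

A replacer callback is used instead of a replacement string so that the missing group 4 can be handled explicitly. Note that the .* in group 4 runs to the end of the line, so with this simplified regex any ordinary text following the components on the same line is moved as well.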

However, the regex above is not precise enough, because it allows any number of repetitions of the sub-parts of the trailing string, for example \1\2\3\5\5\5\5 or \1\2\3\6\7\7\7\7\5\5\5.

To avoid this, the regex needs a condition that accepts only the possible combinations of the sub-parts of the trailing string. See this example, which lists the possible combinations in order: https://regex101.com/r/6aM4Pv/1/

Such a condition makes the regex considerably more complicated, so I have left the simplified regex above to help you understand the idea. Thank you :-)
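
One way to get the "each component at most once" restriction without enumerating combinations in the regex is to check it in a replacer callback. The sketch below is a different technique from the linked regex101 example, not a reproduction of it. It assumes a literal " as in the question's examples (swap in &quot; in both patterns if the text contains the entity), and the names CORE, TOKEN, kindOf and moveOncePerKind are made up for illustration.

```javascript
// Core string followed by any run of trailing components.
const CORE = /(\[SM_g][^\[]*)(\[SM_h])((?:[.,]|\[SM_l]| ?")*)/g;
// A single trailing component: punctuation, [SM_l], or a quote with an optional space.
const TOKEN = /[.,]|\[SM_l]| ?"/g;

// Classify a component so that "." and "," count as the same kind.
function kindOf(token) {
  if (token === "[SM_l]") return "tag";
  return token.endsWith('"') ? "quote" : "punct";
}

// Hypothetical helper: moves components in front of [SM_h] until a kind repeats.
function moveOncePerKind(input) {
  return input.replace(CORE, (match, head, close, run) => {
    const seen = new Set();
    let moved = "";
    for (const token of run.match(TOKEN) || []) {
      const kind = kindOf(token);
      if (seen.has(kind)) break; // second occurrence stays outside
      seen.add(kind);
      moved += token;
    }
    // Whatever was not moved stays after the closing tag.
    return head + moved + close + run.slice(moved.length);
  });
}

console.log(moveOncePerKind('[SM_g]word[SM_h].[SM_l]"'));
console.log(moveOncePerKind('[SM_g]word[SM_h]..'));
```

With this approach a repeated component, such as the second period in the last sample, is left after [SM_h] rather than being moved inside.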

Thm Lee