
question

How to get each individual replacement result from a Regex replacement?

Example:

String regexMatchedWord = matcher.group(); lets me access the current match.

But is there something like String regexMatchedSubstitution = matcher.currentMatchedReplacementResult(); that lets me access the current replacement result?

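import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Note: Matcher.appendReplacement(StringBuilder, String) requires Java 9 or later.
public class Test {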

  public static void main(String[] args) {
    String content_SearchOn = "Sample sentence: snake, snail, snow, spider";
    String regexStrSubstitution = "$2$3$1";
    String regexStrMatchFor = "(s)(.)(.)";

    Matcher matcher = Pattern.compile(regexStrMatchFor).matcher(content_SearchOn);

    ArrayList<String> arr_regexMatchedWord = new ArrayList<>();
    ArrayList<String> arr_regexMatchedSubstitution = new ArrayList<>();

    StringBuilder sb_content_Replaced = new StringBuilder();
    while (matcher.find()) {
      String regexMatchedWord = matcher.group();
      arr_regexMatchedWord.add(regexMatchedWord);

      matcher.appendReplacement(sb_content_Replaced, regexStrSubstitution);

      String regexMatchedSubstitution = null; // << What should I put here -- to get each replacement result?
      arr_regexMatchedSubstitution.add(regexMatchedSubstitution);
    }
    matcher.appendTail(sb_content_Replaced);

    System.out.println(sb_content_Replaced); // Sample enstence: naske, nasil, nosw, pisder
    System.out.println(arr_regexMatchedWord); // [sen, sna, sna, sno, spi]
    System.out.println(arr_regexMatchedSubstitution); // [ens, nas, nas, nos, pis] // << expect

  }

}

comments

  • If Java cannot do this, is there any other language that can? (JavaScript? Python?)

Update: potential solution (workaround)

  • (as discussed in the comments) A simple possible way might be:

    convert those $1 into group(1) programmatically,

    but you have to watch out for escape characters such as \, which has a special meaning in a replacement string...

  • Another way might be:

    use reflection to somehow get at the intermediate result that java.util.regex.Matcher.appendReplacement(StringBuilder, String) builds with appendExpandedReplacement(replacement, result). Note that result is a local variable, which reflection cannot read; one would have to call the private method appendExpandedReplacement itself.

      public Matcher appendReplacement(StringBuilder sb, String replacement) {
          // If no match, return error
          if (first < 0)
              throw new IllegalStateException("No match available");
          StringBuilder result = new StringBuilder();
          appendExpandedReplacement(replacement, result);
          // Append the intervening text
          sb.append(text, lastAppendPosition, first);
          // Append the match substitution
          sb.append(result);
          lastAppendPosition = last;
          modCount++;
          return this;
      }
    
  • Or:

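    Record the end index of the output before the append, then read from that index onward after the append to get the appended replacement. (As the source above shows, the intervening text is appended too and has to be skipped.)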

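The last idea in the list above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the class name ReplacementPerMatch and the helper expandedReplacements are hypothetical names introduced for illustration, and it assumes Java 9+ (for the StringBuilder overload of appendReplacement). It relies on the behaviour visible in the quoted source: each call appends the text between the previous match and the current one, followed by the expanded replacement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacementPerMatch {

  // Hypothetical helper: returns the expanded replacement of every match, in order.
  static List<String> expandedReplacements(String input, String regex, String replacement) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    List<String> replacements = new ArrayList<>();
    StringBuilder out = new StringBuilder();
    int previousEnd = 0; // end of the previous match in the input
    while (matcher.find()) {
      int lengthBefore = out.length();
      // appendReplacement first copies the text between the matches ...
      int interveningLength = matcher.start() - previousEnd;
      matcher.appendReplacement(out, replacement);
      // ... so the expanded replacement starts right after that copied text.
      replacements.add(out.substring(lengthBefore + interveningLength));
      previousEnd = matcher.end();
    }
    return replacements;
  }

  public static void main(String[] args) {
    String input = "Sample sentence: snake, snail, snow, spider";
    System.out.println(expandedReplacements(input, "(s)(.)(.)", "$2$3$1")); // [ens, nas, nas, nos, pis]
  }
}
```

The sketch only works if appendReplacement is called once for every match on the same matcher, because the amount of intervening text is measured from the end of the previous match.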
Wiktor Stribiżew
Nor.Z
  • I'm not sure what you mean? It seems like you just want `matcher.group(2) + matcher.group(3) + matcher.group(1)`? – Sweeper Feb 14 '23 at 02:25
  • @Sweeper true... maybe the example is not general enough... let me edit it. – Nor.Z Feb 14 '23 at 02:27
  • @Sweeper ## Actually, you are right - (I couldn't come up with another example -- because Java's replacement syntax is actually simpler than I thought... e.g. there is nothing like `\U \E`) - (I should have thought of your simple solution...). ## *But 2 more questions*: - so, to get the replacement, all I need to do is convert those literal `$1` into `group(1)` & leave the other text as it is, right? -- is there no simpler way to do it? - if I am not given the regex replacement string -- I won't know the groups; can I still get the replacement result then? – Nor.Z Feb 14 '23 at 02:47
  • I don't know about you, but I personally think the replacement string syntax is relatively complicated, considering there are escape characters and things. You can see how they parse it in [the source of `appendReplacement`](http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/regex/Matcher.java#l794). – Sweeper Feb 14 '23 at 02:51
  • @Sweeper - So it seems I can't simply convert those `$1` into `group(1)` (the source code processes them character by character)... - That's also what I was afraid of & why I was asking for a *standardized function like* `matcher.currentMatchedReplacementResult();` (if such a thing exists) ... – Nor.Z Feb 14 '23 at 02:59
  • Please add some examples of input and expected output to your question. Or is it as simple as `Sample sentence: snake, snail, snow, spider` should produce `Sample enstence: naske, nasil, nosw, pisder`? – Bohemian Feb 14 '23 at 04:37

2 Answers


solution (workaround) Java implementation

  • @logic::

    Record the end index before the append, then read from that index after the append to get the appended replacement.

  • @code::

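      import java.util.ArrayList;
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;
    
      // Note: Matcher.appendReplacement(StringBuilder, String) requires Java 9 or later.
      public class Test {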
    
        public static void main(String[] args) {
          String content_SearchOn = "Sample sentence: snake, snail, snow, spider";
          String regexStrSubstitution = "$2$3x$1";
          String regexStrMatchFor = "(s)(.)(.).";
    
          Matcher matcher = Pattern.compile(regexStrMatchFor).matcher(content_SearchOn);
    
          ArrayList<String> arr_regexMatchedWord = new ArrayList<>();
          ArrayList<String> arr_regexMatchedSubstitution = new ArrayList<>();
    
          StringBuilder sb_content_SearchOn = new StringBuilder(content_SearchOn);
          StringBuilder sb_content_Replaced = new StringBuilder();
    
          String content_OriPlusCurrAppendSubsti = null;
          StringBuilder sb_CurrAppendSubsti_buffer = null;
    
          int indStart_g0_curr = -1;
          int indEnd_g0_curr = -1;
          int indStart_g0_prev = -1;
          int indEnd_g0_prev = -1;
          while (matcher.find()) {
            // #>>>#
            String regexMatchedWord = matcher.group();
            indStart_g0_curr = matcher.start();
            indEnd_g0_curr = matcher.end();
            arr_regexMatchedWord.add(regexMatchedWord);
    
            // #>>>
            // @main[business logic]::
    
            // Abandoned idea: remember sb_content_Replaced.length() before the append,
            // then take sb_content_Replaced.substring(thatLength) after it.
            // @note: that does not work as is, because appendReplacement appends both
            // `the intervening text` and `the match substitution`.
    
            // Calling matcher.appendReplacement(new StringBuilder(), regexStrSubstitution)
            // a second time for the same match is not an option either: the first call
            // has already advanced the matcher's append position.
    
            //~ matcher.appendReplacement(sb_content_Replaced, regexStrSubstitution);
            sb_CurrAppendSubsti_buffer = new StringBuilder();
            matcher.appendReplacement(sb_CurrAppendSubsti_buffer, regexStrSubstitution + "_$0");
            sb_content_Replaced.append(sb_CurrAppendSubsti_buffer);
            // @main;;
    
            // #>>>
            // @main[get the individual replacement result]::
            //~ String regexMatchedSubstitution = null; // << What should I put here -- to get each replacement result?
            if (indEnd_g0_prev == -1) {
              content_OriPlusCurrAppendSubsti = "";
            } else {
              content_OriPlusCurrAppendSubsti = sb_content_SearchOn.substring(0, indEnd_g0_prev);
            }
            content_OriPlusCurrAppendSubsti += sb_CurrAppendSubsti_buffer;
            String regexMatchedSubstitution = content_OriPlusCurrAppendSubsti.substring(indStart_g0_curr);
            arr_regexMatchedSubstitution.add(regexMatchedSubstitution);
            // @main;;
    
            // #>>>#
            indStart_g0_prev = indStart_g0_curr;
            indEnd_g0_prev = indEnd_g0_curr;
          }
          matcher.appendTail(sb_content_Replaced);
    
          //
          System.out.println(sb_content_Replaced); // Sample enxs_sentence: naxs_snake, naxs_snail, noxs_snow, pixs_spider
          System.out.println(arr_regexMatchedWord); // [sent, snak, snai, snow, spid]
          System.out.println(arr_regexMatchedSubstitution); // [enxs_sent, naxs_snak, naxs_snai, noxs_snow, pixs_spid] // << expect
    
        }
    
      }
    

solution (workaround) Javascript implementation

  • @logic::

    Simply brute force it by wrapping each replacement in hardcoded delimiter strings:

    1. replaceAll() -- add the delimiters ("brackets") around each matched replacement during the replacement

    2. matchAll() -- search for the matched replacements that are enclosed in the delimiters

  • @code (the general class, generalized from the specific example further below)::

      class RegexUtil {
        // https://stackoverflow.com/questions/3561493/is-there-a-regexp-escape-function-in-javascript
        // https://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex
        /**
         * @param {String} literal_string
         * @returns {String}
         */
        static escapeRegex(literal_string) {
          return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
        }
    
        /**
         * @param {String} string
         * @returns {String}
         */
        static escapeRegexReplacement(string) {
          return string.replace(/\$/g, '$$$$');
        }
    
        /**
         * @param {String} content_SearchOn 
         * @param {RegExp} regexMatchFor must carry the `g` flag, as required by replaceAll()
         * @param {String} regexStrSubstitution 
         * @returns {String[]} the expanded replacement of each match, in order
         */
        static get_RegexMatchedReplacement(content_SearchOn, regexMatchFor, regexStrSubstitution) {
          const arr_regexMatchedSubstitution = [];
    
          let time_now;
          let delim_regexMatchedSub_left;
          let delim_regexMatchedSub_right;
          /** @type {IterableIterator<RegExpMatchArray>} */ let itr;
          let i = 0;
          do {
            i++;
            if (i === 50) {
              throw new Error('Many loops tried, Unable to brute force with hardcode string indicator in regex. (The chance of this happening is nearly impossible.)');
            }
            time_now = Date.now();
            delim_regexMatchedSub_left = '@drmsL' + time_now + ';';
            delim_regexMatchedSub_right = '@drmsR' + time_now + ';';
            itr = content_SearchOn.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '|' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'g'));
          } while (itr.next().done !== true);
    
          const content_Replaced_WithDelimiter = content_SearchOn.replaceAll(regexMatchFor, RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_left) + regexStrSubstitution + RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_right));
          itr = content_Replaced_WithDelimiter.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '(.*?)' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'gs')); // need flag s
          for (const matcher_curr of itr) {
            arr_regexMatchedSubstitution.push(matcher_curr[1]);
          }
    
          return arr_regexMatchedSubstitution;
        }
      }
    
  • @code (the original specific example, later generalized into the class above)::

        class RegexUtil {
          // https://stackoverflow.com/questions/3561493/is-there-a-regexp-escape-function-in-javascript
          // https://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex
          /**
           * @param {String} literal_string
           * @returns {String}
           */
          static escapeRegex(literal_string) {
            return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
          }
    
          /**
           * @param {String} string
           * @returns {String}
           */
          static escapeRegexReplacement(string) {
            return string.replace(/\$/g, '$$$$');
          }
        }
    
        // TODO: think again about a generic way to escape delimiters that have a special meaning in a regex ...
        const content_SearchOn = 'Sample sentence: snake, snail, snow, spider';
        let regexStrSubstitution = '$2$3x$1';
        const regexStrMatchFor = '(s)(.)(.).';
        const regexFlag = 'gmd';
    
        regexStrSubstitution += '_$&';
    
        const arr_regexMatchedWord = [];
        const arr_regexMatchedSubstitution = [];
    
        let time_now;
        let delim_regexMatchedSub_left;
        let delim_regexMatchedSub_right;
        /** @type {IterableIterator<RegExpMatchArray>} */ let itr;
        let i = 0;
        do {
          i++;
          if (i === 50) {
            throw new Error('Many loops tried, Unable to brute force with hardcode string indicator in regex. (The chance of this happening is nearly impossible.)');
          }
          time_now = Date.now();
          delim_regexMatchedSub_left = '@drmsL' + time_now + ';';
          delim_regexMatchedSub_right = '@drmsR' + time_now + ';';
          itr = content_SearchOn.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '|' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'g'));
        } while (itr.next().done !== true);
    
        const content_Replaced_WithDelimiter = content_SearchOn.replaceAll(new RegExp(regexStrMatchFor, regexFlag), RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_left) + regexStrSubstitution + RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_right));
        itr = content_Replaced_WithDelimiter.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '(.*?)' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'gs')); // need flag s
        for (const matcher_curr of itr) {
          arr_regexMatchedSubstitution.push(matcher_curr[1]);
        }
    
        itr = content_SearchOn.matchAll(new RegExp(regexStrMatchFor, regexFlag));
        for (const matcher_curr of itr) {
          arr_regexMatchedWord.push(matcher_curr[0]);
        }
    
        const content_Replaced = content_SearchOn.replaceAll(new RegExp(regexStrMatchFor, regexFlag), regexStrSubstitution);
    
        console.log(content_Replaced); // Sample enxs_sentence: naxs_snake, naxs_snail, noxs_snow, pixs_spider
        console.log(arr_regexMatchedWord); // [sent, snak, snai, snow, spid]
        console.log(arr_regexMatchedSubstitution); // [enxs_sent, naxs_snak, naxs_snai, noxs_snow, pixs_spid] // << expect
    
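For comparison, the same two-step delimiter logic can be sketched in Java. This is only a sketch under stated assumptions: the class name DelimitedReplacements, the helper expandedReplacements and the two delimiter constants are hypothetical, and the delimiters are assumed to appear neither in the input nor in any expanded replacement. Matcher.quoteReplacement and Pattern.quote play the role of escapeRegexReplacement and escapeRegex above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimitedReplacements {

  // Hypothetical delimiters; assumed not to occur in the input or in a replacement.
  private static final String LEFT = "\u0001L";
  private static final String RIGHT = "\u0001R";

  static List<String> expandedReplacements(String input, String regex, String replacement) {
    if (input.contains(LEFT) || input.contains(RIGHT)) {
      throw new IllegalArgumentException("input already contains a delimiter");
    }
    // Step 1: replace, wrapping every expanded replacement in the delimiters.
    String wrapped = Matcher.quoteReplacement(LEFT) + replacement + Matcher.quoteReplacement(RIGHT);
    String marked = Pattern.compile(regex).matcher(input).replaceAll(wrapped);

    // Step 2: collect whatever ended up between a left and a right delimiter.
    Pattern between = Pattern.compile(
        Pattern.quote(LEFT) + "(.*?)" + Pattern.quote(RIGHT), Pattern.DOTALL);
    Matcher m = between.matcher(marked);
    List<String> replacements = new ArrayList<>();
    while (m.find()) {
      replacements.add(m.group(1));
    }
    return replacements;
  }

  public static void main(String[] args) {
    String input = "Sample sentence: snake, snail, snow, spider";
    // [enxs_sent, naxs_snak, naxs_snai, noxs_snow, pixs_spid]
    System.out.println(expandedReplacements(input, "(s)(.)(.).", "$2$3x$1_$0"));
  }
}
```

Unlike the JavaScript version, the sketch uses fixed delimiters and rejects an input that already contains them instead of retrying with a new timestamp.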

comment (minor)

  • The reason for brute-forcing it with hardcoded delimiter strings is that

    JavaScript is even worse at this:

  • the replacer callback function does not support those $1 references (its return value is inserted literally).

    That makes the following idea useless (it could have worked, but it is complex & performs poorly) ::

        for (const matcher_curr of itr_matcher) {
          ind_ReplaceOnlyCurrOne++;
    
          let ind_Match = -1;
          function replace_OnlyOneWord_c_for_get_regexMatchedSubstitution(...args) {
            ind_Match++;
            /** @type {String} */ const g0 = args[0]; 
            if (ind_Match === ind_ReplaceOnlyCurrOne) {
              // prettier-ignore 
              let arg_last = args.at(-1); let ind_g0; let content_SearchOn; let groups;
              // prettier-ignore 
              if (typeof arg_last === 'string') { content_SearchOn = arg_last; ind_g0 = args.at(-2); } else { groups = arg_last; content_SearchOn = args.at(-2); ind_g0 = args.at(-3); }
    
              arr_regexMatchedWord.push(g0);
              indStart_g0 = ind_g0;
              indEnd_g0 = ind_g0 + g0.length;
    
              return replacer_main(args);
            } else {
              return g0; // leave the other matches unchanged; a replacer's return value is inserted literally, so no escaping is needed
            }
          }
    
          const content_ReplacedOnlyCurrOne__P1_Pm_P2 = content_SearchOn.replaceAll(new RegExp(regexStrMatchFor, regexFlag), replace_OnlyOneWord_c_for_get_regexMatchedSubstitution);
          const Pm_P2 = content_ReplacedOnlyCurrOne__P1_Pm_P2.slice(indStart_g0);
          const P2 = content_SearchOn.slice(indEnd_g0);
          const regexMatchedSubstitution__Pm = Pm_P2.replaceAll(new RegExp(RegexUtil.escapeRegex(P2)+'$', 'g'), '');
          arr_regexMatchedSubstitution.push(regexMatchedSubstitution__Pm);
        }
    
Nor.Z

You can use replaceAll(Function<MatchResult, String> replacer) on a Matcher (available since Java 9) to "intercept" the replacement:

String input = "Sample sentence: snake, snail, snow, spider";
List<String> matches = new ArrayList<>();
String result = Pattern.compile("(s)(.)(.)").matcher(input)
  .replaceAll(mr -> {
     matches.add(mr.group());
     return mr.group(2) + mr.group(3) + mr.group(1);
  });
System.out.println(result);
System.out.println(matches);

Output:

Sample enstence: naske, nasil, nosw, pisder
[sen, sna, sna, sno, spi]
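The same lambda can also record each replacement string it produces. The following is a minimal sketch of that, assuming Java 9+; the class name CollectComputedReplacements and the method replaceAndCollect are hypothetical. It only applies when the replacement is computed in code; it does not expand a `$1`-style replacement template. The result is passed through Matcher.quoteReplacement because the function's return value is itself treated as a replacement string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CollectComputedReplacements {

  // Hypothetical helper: replaces every match and records each replacement in `replacements`.
  static String replaceAndCollect(String input, List<String> replacements) {
    return Pattern.compile("(s)(.)(.)").matcher(input)
        .replaceAll(mr -> {
          String replacement = mr.group(2) + mr.group(3) + mr.group(1);
          replacements.add(replacement);
          // Quote it so that a literal '$' or '\' in the result is not re-interpreted.
          return Matcher.quoteReplacement(replacement);
        });
  }

  public static void main(String[] args) {
    List<String> replacements = new ArrayList<>();
    String result = replaceAndCollect("Sample sentence: snake, snail, snow, spider", replacements);
    System.out.println(result);       // Sample enstence: naske, nasil, nosw, pisder
    System.out.println(replacements); // [ens, nas, nas, nos, pis]
  }
}
```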
Bohemian
  • - Close, but not really what I am looking for; this was discussed in the comments. - Also, this only solves this particular example; I was looking for a more *general* solution. - My question post describes the *potential ways* to do it. - And I just *uploaded code* for one of the possible ways in my answer post (though it's long & the code isn't structured prettily). – Nor.Z Feb 14 '23 at 07:51
  • I don't understand what you want. Is it that you want to get a list of all the original matches? Eg `[sen, sna, sna, sno, spi]`? – Bohemian Feb 14 '23 at 07:58
  • In the example code of my question post: - `String regexMatchedSubstitution = null; // << What should I put here -- to get each replacement result?` -- this is what I ultimately want for all general cases. - `System.out.println(arr_regexMatchedSubstitution); // [ens, nas, nas, nos, pis] // << expect` -- this is the expected result only for this particular example. - (The example I use in my answer post varies a bit, but the general idea & purpose didn't change.) – Nor.Z Feb 14 '23 at 22:06
  • @Nor.Z see edits to my answer that capture the matched parts of the original string. I don't think I can do much better than this. – Bohemian Feb 15 '23 at 02:04
  • You misunderstood my post. I know I can get `the current matched result`, but I want `the current replacement result`, and the solution should be more general (not limited to the example I provided). (But don't worry, I have a workaround.) – Nor.Z Feb 15 '23 at 05:30