38

I'm in the process of moving an application from PHP to Java, and the code makes heavy use of regular expressions. I've run across something in PHP that doesn't seem to have a Java equivalent:

preg_replace_callback()

For every match in the regex, it calls a function that is passed the match text as a parameter. As an example usage:

$articleText = preg_replace_callback("/\[thumb(\d+)\]/",'thumbReplace', $articleText);
# ...
function thumbReplace($matches) {
   global $photos;
   return "<img src=\"thumbs/" . $photos[$matches[1]] . "\">";
}

What would be the ideal way to do this in Java?

Mike

8 Answers

59

Trying to emulate PHP's callback feature seems an awful lot of work when you could just use appendReplacement() and appendTail() in a loop:

StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("regex");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
  // You can vary the replacement text for each match on-the-fly
  regexMatcher.appendReplacement(resultString, "replacement");
}
regexMatcher.appendTail(resultString);
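
As a concrete sketch of the loop above, here is how it might look for the `[thumbN]` tags from the question. The class name `ThumbReplacer`, the `photos` map, and the file names are hypothetical, chosen only for illustration; the map stands in for PHP's `$photos` array. `Matcher.quoteReplacement` is used so that any `$` or `\` in the generated HTML is treated literally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThumbReplacer {
    // Java literal for the regex \[thumb(\d+)\]; note there are no PHP-style / delimiters.
    private static final Pattern THUMB = Pattern.compile("\\[thumb(\\d+)\\]");

    // photos is keyed by the digits captured in group 1 (hypothetical stand-in for $photos).
    public static String replaceThumbs(String articleText, Map<String, String> photos) {
        StringBuffer result = new StringBuffer();
        Matcher m = THUMB.matcher(articleText);
        while (m.find()) {
            String file = photos.get(m.group(1));
            // If no photo is known for this index, keep the original tag text.
            String replacement = (file == null)
                    ? m.group()
                    : "<img src=\"thumbs/" + file + "\">";
            m.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> photos = new HashMap<String, String>();
        photos.put("1", "cat.jpg");
        photos.put("2", "dog.jpg");
        System.out.println(replaceThumbs("A [thumb1] and [thumb2].", photos));
        // prints: A <img src="thumbs/cat.jpg"> and <img src="thumbs/dog.jpg">.
    }
}
```

The per-match logic lives inside the loop body, which plays the role of the PHP callback.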
Jan Goyvaerts
  • 3
    I think that some JDK classes do have powerful features, but those features are sometimes hidden behind strange class names or strange method names... Although the `appendReplacement/appendTail` strategy, as used here, requires less code, the `callback` strategy (OP's chosen answer) is clearer and more obvious! – Stephan Jul 08 '13 at 13:06
  • What if I need the matched string to get the right replacement? Say subjectString might contain "foo bar" but I need to replace "foo" with "Jan" and "bar" with "Goyvaerts"? – ALOToverflow Jul 17 '13 at 17:28
  • Use `foo|bar` as your regex and query `regexMatcher.group()` inside the loop to see which replacement you need to append. – Jan Goyvaerts Jul 18 '13 at 01:09
  • 5
    This is the correct answer. The accepted answer will fail with certain input, because it calls `.reset()` – Eric Aug 24 '13 at 21:51
  • 1
    This is not a great match for PHP's functionality. The replacement string here must be careful not to include special characters and back references. Use `Matcher.quoteReplacement`. – goat Nov 30 '17 at 23:26
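
To illustrate the last comment: `appendReplacement` interprets `$` and `\` in the replacement text, so a literal string containing them has to be passed through `Matcher.quoteReplacement` first. The sketch below uses a hypothetical helper, `replaceAllLiteral`, written only for this demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    // Replaces every match of regex in subject with the literal text, character for character.
    public static String replaceAllLiteral(String regex, String subject, String literal) {
        StringBuffer sb = new StringBuffer();
        Matcher m = Pattern.compile(regex).matcher(subject);
        while (m.find()) {
            // Without quoteReplacement, "$5" would be read as a reference to group 5,
            // and appendReplacement would throw because the pattern has no such group.
            m.appendReplacement(sb, Matcher.quoteReplacement(literal));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceAllLiteral("PRICE", "Cost: PRICE", "$5"));
        // prints: Cost: $5
    }
}
```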
24

IMPORTANT: As pointed out by Kip in the comments, this class has an infinite loop bug if the matching regex matches on the replacement string. I'll leave it as an exercise to readers to fix it, if necessary.


I don't know of anything similar that's built into Java. You could roll your own without too much difficulty, using the Matcher class:

import java.util.regex.*;

public class CallbackMatcher
{
    public static interface Callback
    {
        public String foundMatch(MatchResult matchResult);
    }

    private final Pattern pattern;

    public CallbackMatcher(String regex)
    {
        this.pattern = Pattern.compile(regex);
    }

    public String replaceMatches(String string, Callback callback)
    {
        final Matcher matcher = this.pattern.matcher(string);
        while(matcher.find())
        {
            final MatchResult matchResult = matcher.toMatchResult();
            final String replacement = callback.foundMatch(matchResult);
            string = string.substring(0, matchResult.start()) +
                     replacement + string.substring(matchResult.end());
            // Rescans the modified string from the start; see the note above
            // about replacements that themselves match the pattern.
            matcher.reset(string);
        }
        return string;
    }
}
}

Then call:

final CallbackMatcher.Callback callback = new CallbackMatcher.Callback() {
    public String foundMatch(MatchResult matchResult)
    {
        return "<img src=\"thumbs/" + matchResult.group(1) + "\"/>";
    }
};

// Java regexes take no /.../ delimiters, and backslashes must be doubled in string literals.
final CallbackMatcher callbackMatcher = new CallbackMatcher("\\[thumb(\\d+)\\]");
articleText = callbackMatcher.replaceMatches(articleText, callback);

Note that you can get the entire matched string by calling matchResult.group() or matchResult.group(0), so it's not necessary to pass the callback the current string state.

EDIT: Made it look more like the exact functionality of the PHP function.

Here's the original, since the asker liked it:

public class CallbackMatcher
{
    public static interface Callback
    {
        public void foundMatch(MatchResult matchResult);
    }

    private final Pattern pattern;

    public CallbackMatcher(String regex)
    {
        this.pattern = Pattern.compile(regex);
    }

    public void findMatches(String string, Callback callback)
    {
        final Matcher matcher = this.pattern.matcher(string);
        while(matcher.find())
        {
            callback.foundMatch(matcher.toMatchResult());
        }
    }
}

For this particular use case, it might be best to simply queue each match in the callback, then afterwards run through them backwards. This will prevent having to remap indexes as the string is modified.

jdmichal
  • I actually like your original answer better with queuing the returned string and indexes. Then applying them in reverse. This way is simpler, but seems to do more work, having to rescan the entire string for each match. Thanks for the suggestion! – Mike Dec 17 '08 at 18:35
  • I added the original suggestion back in. The expected input size would make the difference as to whether rescanning or queueing then replacing would be more effective. I suppose one could also have the replace method queue them, along with the replacement string... – jdmichal Dec 17 '08 at 18:44
  • Errr... Misspoke. Obviously queueing is always more effective in regards to CPU time. The difference would be whether it's a big enough problem to worry about. – jdmichal Dec 17 '08 at 18:46
  • 2
    This has a bug in that you're calling matcher.reset() at the end of each loop iteration. If the replacement string matches the pattern, you'll get into an infinite loop. Using appendReplacement() and appendTail() with a StringBuffer would be safer. – Kip Apr 01 '10 at 19:57
  • Good catch Kip. I think the only way to correctly implement this using these interfaces is to queue the matches and replace them after all the match operations are complete. I am confused though as to why you think using StringBuffer would help this. Unless you simply meant that it would help performance, as opposed to using the + operator. The real crux is that you cannot replace matches with a lower index without corrupting matches of a higher index. Hence needing to queue them and work through them backwards, or reset the matcher after each replacement. – jdmichal May 06 '10 at 20:39
3

I wasn't quite satisfied with any of the solutions here. I wanted a stateless solution. And I didn't want to end up in an infinite loop if my replacement string happened to match the pattern. While I was at it I added support for a limit parameter and a returned count parameter. (I used an AtomicInteger to simulate passing an integer by reference.) I moved the callback parameter to the end of the parameter list, to make it easier to define an anonymous class.

Here is an example of usage:

final Map<String,String> props = new HashMap<String,String>();
props.put("MY_NAME", "Kip");
props.put("DEPT", "R&D");
props.put("BOSS", "Dave");

String subjectString = "Hi my name is ${MY_NAME} and I work in ${DEPT} for ${BOSS}";
String sRegex = "\\$\\{([A-Za-z0-9_]+)\\}";

String replacement = ReplaceCallback.replace(sRegex, subjectString, new ReplaceCallback.Callback() {
  public String matchFound(MatchResult match) {
    String group1 = match.group(1);
    if(group1 != null && props.containsKey(group1))
      return props.get(group1);
    return match.group();
  }
});

System.out.println("replacement: " + replacement);

And here is my version of ReplaceCallback class:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.*;

public class ReplaceCallback
{
  public static interface Callback {
    /**
     * This function is called when a match is made. The string which was matched
     * can be obtained via match.group(), and the individual groupings via
     * match.group(n).
     */
    public String matchFound(MatchResult match);
  }

  /**
   * Replaces with callback, with no limit to the number of replacements.
   * Probably what you want most of the time.
   */
  public static String replace(String pattern, String subject, Callback callback)
  {
    return replace(pattern, subject, -1, null, callback);
  }

  public static String replace(String pattern, String subject, int limit, Callback callback)
  {
    return replace(pattern, subject, limit, null, callback);
  }

  /**
   * @param regex    The regular expression pattern to search on.
   * @param subject  The string to be replaced.
   * @param limit    The maximum number of replacements to make. A negative value
   *                 indicates replace all.
   * @param count    If this is not null, it will be set to the number of
   *                 replacements made.
   * @param callback Callback function
   */
  public static String replace(String regex, String subject, int limit,
          AtomicInteger count, Callback callback)
  {
    StringBuffer sb = new StringBuffer();
    Matcher matcher = Pattern.compile(regex).matcher(subject);
    int i;
    for(i = 0; (limit < 0 || i < limit) && matcher.find(); i++)
    {
      String replacement = callback.matchFound(matcher.toMatchResult());
      replacement = Matcher.quoteReplacement(replacement); //probably what you want...
      matcher.appendReplacement(sb, replacement);
    }
    matcher.appendTail(sb);

    if(count != null)
      count.set(i);
    return sb.toString();
  }
}
Kip
3
public static String replace(Pattern pattern, Function<MatchResult, String> callback, CharSequence subject) {
    Matcher m = pattern.matcher(subject);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        m.appendReplacement(sb, callback.apply(m.toMatchResult()));
    }
    m.appendTail(sb);
    return sb.toString();
}

Usage example:

replace(Pattern.compile("cat"), mr -> "dog", "one cat two cats in the yard")

will produce the return value:

one dog two dogs in the yard

Adrian Leonhard
holmis83
  • StringBuilder would be slightly more performant: https://www.journaldev.com/137/stringbuffer-vs-stringbuilder – Charlie Jun 30 '19 at 22:59
  • I edited it to change it to StringBuilder, then I realized that that doesn't work, because appendReplacement expects a *StringBuffer*. I reverted it, sorry about that. – Adrian Leonhard May 02 '21 at 16:26
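
On the StringBuilder point in the comments above: Java 9 added `appendReplacement(StringBuilder, String)` and `appendTail(StringBuilder)` overloads to `Matcher`, so on Java 9 or later the same helper can be sketched with a `StringBuilder`. The class name `BuilderReplace` is hypothetical. Unlike the answer's version, this sketch also wraps the callback result in `Matcher.quoteReplacement`, which is an assumption that callers want their return value inserted literally.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BuilderReplace {
    // Requires Java 9+ for the StringBuilder overloads of appendReplacement/appendTail.
    public static String replace(Pattern pattern,
                                 Function<MatchResult, String> callback,
                                 CharSequence subject) {
        Matcher m = pattern.matcher(subject);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Quote the callback's result so '$' and '\' are inserted literally.
            String replacement = callback.apply(m.toMatchResult());
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(
                replace(Pattern.compile("cat"), mr -> "dog", "one cat two cats in the yard"));
        // prints: one dog two dogs in the yard
    }
}
```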
1

Java 9 introduced the Matcher#replaceAll method accepting a Function<MatchResult,String> to return the replacement given a specific match, which does it quite elegantly.

Pattern.compile("regex").matcher("some string")
     .replaceAll(matchResult -> "something" + matchResult.group());
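
Applied to the `[thumbN]` tags from the question, this approach might look like the sketch below. The class name `ThumbReplaceAll` and the `photos` map are hypothetical stand-ins for the PHP `$photos` array. The string returned by the function is still processed for `$` and `\` replacement syntax, so the sketch quotes it with `Matcher.quoteReplacement`.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThumbReplaceAll {
    // Java literal for the regex \[thumb(\d+)\]
    private static final Pattern THUMB = Pattern.compile("\\[thumb(\\d+)\\]");

    // Requires Java 9+ for Matcher.replaceAll(Function<MatchResult, String>).
    public static String replaceThumbs(String articleText, Map<String, String> photos) {
        return THUMB.matcher(articleText).replaceAll(mr -> {
            String file = photos.get(mr.group(1));
            // Keep the original tag when no photo is known for this index.
            String html = (file == null)
                    ? mr.group()
                    : "<img src=\"thumbs/" + file + "\">";
            return Matcher.quoteReplacement(html);
        });
    }

    public static void main(String[] args) {
        Map<String, String> photos = Map.of("1", "cat.jpg");
        System.out.println(replaceThumbs("See [thumb1] and [thumb7].", photos));
        // prints: See <img src="thumbs/cat.jpg"> and [thumb7].
    }
}
```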
Unmitigated
0

I found that jdmichal's answer would loop infinitely if the returned string could be matched again; below is a modification that prevents this.

public String replaceMatches(String string, Callback callback) {
    String result = "";
    final Matcher matcher = this.pattern.matcher(string);
    int lastMatch = 0;
    while(matcher.find())
    {
        final MatchResult matchResult = matcher.toMatchResult();
        final String replacement = callback.foundMatch(matchResult);
        result += string.substring(lastMatch, matchResult.start()) +
            replacement;
        lastMatch = matchResult.end();
    }
    if (lastMatch < string.length())
        result += string.substring(lastMatch);
    return result;
}
jevon
0

Matcher#replaceAll is what you're looking for.

Pattern.compile("random number")
    .matcher("this is a random number")
    .replaceAll(r -> "" + ThreadLocalRandom.current().nextInt()) 

Output:

this is a -107541873
Rubydesic
-1

Here is the final result of what I did with your suggestion. I thought it would be nice to have out here in case someone has the same problem. The resulting calling code looks like:

content = ReplaceCallback.find(content, regex, new ReplaceCallback.Callback() {
    public String matches(MatchResult match) {
        // Do something special not normally allowed in regex's...
        return "newstring";
    }
});

The entire class listing follows:

import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.Stack;

/**
 * <p>
 * Class that provides a method for doing regular expression string replacement by passing the matched string to
 * a function that operates on the string.  The result of the operation is then used to replace the original match.
 * </p>
 * <p>Example:</p>
 * <pre>
 * ReplaceCallback.find("string to search on", "regular expression", new ReplaceCallback.Callback() {
 *      public String matches(MatchResult match) {
 *          // query db or whatever...
 *          return match.group().replaceAll("2nd level replacement", "blah blah");
 *      }
 * });
 * </pre>
 * <p>
 * This, in effect, allows for a second level of string regex processing.
 * </p>
 *
 */
public class ReplaceCallback {
    public static interface Callback {
        public String matches(MatchResult match);
    }

    private final Pattern pattern;
    private Callback callback;

    private class Result {
        int start;
        int end;
        String replace;
    }

    /**
     * You probably don't need this.  See {@link #find(String, String, Callback)}.
     * @param regex     The string regex to use
     * @param callback  An instance of Callback to execute on matches
     */
    public ReplaceCallback(String regex, final Callback callback) {
        this.pattern = Pattern.compile(regex);
        this.callback = callback;
    }

    public String execute(String string) {
        final Matcher matcher = this.pattern.matcher(string);
        Stack<Result> results = new Stack<Result>();
        while(matcher.find()) {
            final MatchResult matchResult = matcher.toMatchResult();
            Result r = new Result();
            r.replace = callback.matches(matchResult);
            if(r.replace == null)
                continue;
            r.start = matchResult.start();
            r.end = matchResult.end();
            results.push(r);
        }
        // Improve this with a stringbuilder...
        while(!results.empty()) {
            Result r = results.pop();
            string = string.substring(0, r.start) + r.replace + string.substring(r.end);
        }
        return string;
    }

    /**
     * If you wish to reuse the regex multiple times with different callbacks or search strings, you can create a
     * ReplaceCallback directly and use this method to perform the search and replace.
     *
     * @param string    The string we are searching through
     * @param callback  A callback instance that will be applied to the regex match results.
     * @return  The modified search string.
     */
    public String execute(String string, final Callback callback) {
        this.callback = callback;
        return execute(string);
    }

    /**
     * Use this static method to perform your regex search.
     * @param search    The string we are searching through
     * @param regex     The regex to apply to the string
     * @param callback  A callback instance that will be applied to the regex match results.
     * @return  The modified search string.
     */
    public static String find(String search, String regex, Callback callback) {
        ReplaceCallback rc = new ReplaceCallback(regex, callback);
        return rc.execute(search);
    }
}
Mike
  • I would not use an instance variable to store the callback, but rather pass it as a parameter. Storing it as an instance variable makes your class have unexpected behaviour when called from separate threads at the same time. (The second callback will get matches from the first and second). – jdmichal Jan 28 '09 at 21:23