
Suppose I have a regex with a capturing group, e.g. foo(_+f). I want to match this against a string and replace the first capturing group in all matches with baz, so that

foo___f blah foo________f

is converted to:

foobaz blah foobaz

There doesn't appear to be any easy way to do this using the standard libraries. If I use Matcher.replaceAll(), it replaces every match of the entire pattern and converts the string to

baz blah baz

Obviously I can just iterate through the matches, store the start and end index of each capturing group, then go back and replace them, but is there an easier way?
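For reference, the manual approach described above (walk the matches and splice new text in at each group's start and end index) can be sketched in a few lines with the standard java.util.regex API. This is a minimal sketch, not a JDK method: `GroupSplicer` and `replaceGroupInAllMatches` are hypothetical names, and the replacement text is inserted literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splices `replacement` over the text captured by `group`
// in every match of `regex`, copying everything else through unchanged.
class GroupSplicer {

    static String replaceGroupInAllMatches(String input, String regex, int group, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0; // index in input up to which text has already been copied
        while (m.find()) {
            if (m.start(group) == -1) {
                continue; // group did not take part in this match; leave the match untouched
            }
            out.append(input, last, m.start(group)); // text before the group
            out.append(replacement);                 // new group content, inserted literally
            last = m.end(group);                     // skip the old group content
        }
        out.append(input, last, input.length());     // remainder after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceGroupInAllMatches("foo___f blah foo________f", "foo(_+f)", 1, "baz"));
        // prints foobaz blah foobaz
    }
}
```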

Thanks, Don

Dónal
  • Pretty sure I misunderstood the question, because I immediately thought of using Matcher.replaceFirst instead of replaceAll ...!? – Andreas Dolk May 27 '10 at 12:59

4 Answers


I think you want something like this?

    System.out.println(
        "foo__f blah foo___f boo___f".replaceAll("(?<=foo)_+f", "baz")
    ); // prints "foobaz blah foobaz boo___f"

Here you simply replace the entire match with "baz", but the match uses lookbehind to ensure that _+f is preceded by foo.
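To see why "foo" survives the replacement, it helps to look at what the pattern actually matches: the lookbehind (?<=foo) only checks that "foo" comes right before the current position, without consuming it. A small sketch (`LookbehindDemo` and `matchedText` are hypothetical names) that collects the matched text:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collects what "(?<=foo)_+f" matches, showing that the lookbehind
// requires "foo" before the match but does not include it in the match.
class LookbehindDemo {
    static List<String> matchedText(String input) {
        Matcher m = Pattern.compile("(?<=foo)_+f").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group()); // only the underscores and the trailing f, never "foo"
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matchedText("foo__f blah foo___f boo___f")); // prints [__f, ___f]
    }
}
```

Because "foo" is never part of the match, replacing the whole match with "baz" leaves it in place, and "boo___f" is skipped entirely.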

If lookbehind is not possible (perhaps because the prefix has no bounded maximum length, which Java's lookbehind requires), then simply capture even what you're NOT replacing, and refer back to it in the replacement string.

    System.out.println(
        "fooooo_f boooo_f xxx_f".replaceAll("(fo+|bo+)(_+f)", "$1baz")
    ); // prints "fooooobaz boooobaz xxx_f"

So here we're effectively replacing only what group 2 (\2) matches.
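The same capture-and-reinsert idea can also be written with a named group (supported since Java 7), so the replacement refers to ${prefix} instead of $1. If the new text comes from elsewhere and might contain $ or \, wrapping it in Matcher.quoteReplacement keeps it literal. A sketch; `NamedGroupDemo`, `replaceSuffix` and the group name `prefix` are hypothetical:

```java
import java.util.regex.Matcher;

// Captures the prefix under the name "prefix" and puts it back with ${prefix};
// quoteReplacement escapes '$' and '\' in the new text so it is inserted literally.
class NamedGroupDemo {
    static String replaceSuffix(String input, String newText) {
        return input.replaceAll("(?<prefix>fo+|bo+)_+f",
                "${prefix}" + Matcher.quoteReplacement(newText));
    }

    public static void main(String[] args) {
        System.out.println(replaceSuffix("fooooo_f boooo_f xxx_f", "baz")); // prints fooooobaz boooobaz xxx_f
        System.out.println(replaceSuffix("foo_f", "$5"));                   // prints foo$5
    }
}
```

Without quoteReplacement, the second call would fail, because "$5" would be read as a reference to a nonexistent group 5.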

polygenelubricants
  • Nice answer, but the OP seems to have edited the matching pattern (the `g` was removed). This changes the problem quite a bit. I suggest updating your answer accordingly. – BalusC May 27 '10 at 13:20
  • The second suggestion is simple, effective, something I should have thought of myself, and doesn't require me to learn about lookarounds :) – Dónal May 27 '10 at 13:21
  • Yes, I updated the pattern in the question in an attempt to clarify. Sorry if it messed up your response. – Dónal May 27 '10 at 13:22
  • @Don: lookarounds are awesome, see e.g: http://stackoverflow.com/questions/2559759/how-do-i-convert-camelcase-into-human-readable-names-in-java – polygenelubricants May 27 '10 at 13:43

I don't think any of these answers do justice to more abstract versions of this question. I ran into it myself, so I wrote some code that works in the more general case:

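    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FileParsing
    {
        /**
         * Replaces the contents of group(1) in every match of regex with newValue.
         *
         * @param regex  Pattern to find in oldLine. Will replace contents in ( ... ) - group(1) - with newValue
         * @param oldLine  Previous String that needs replacing
         * @param newValue  Value that will replace the captured group(1) in regex
         * @return  oldLine with every match of regex replaced
         * @throws RuntimeException  if regex does not match anywhere in oldLine
         */
        public static String replace(String regex, String oldLine, String newValue)
        {
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(oldLine);
            if (m.find())
            {
                // replaceAll() resets the matcher first, so the find() above does not skip a match
                return m.replaceAll(replaceGroup(regex, newValue));
            }
            else
            {
                throw new RuntimeException("No match");
            }
        }

        /**
         * Replaces group(1) ( ... ) with replacement, and returns the resulting regex with replacement String.
         * The result is used as the replacement for each whole match, so this only works when the
         * part of regex outside the parentheses is plain text: "foo(_+f)" yields "foobaz", but
         * "fo+(_+f)" would yield "fo+baz". The greedy \(.*\) also assumes a single group.
         *
         * @param regex  Regular expression whose parenthetical group will be literally replaced by replacement
         * @param replacement  Replacement String
         * @return  regex with its parenthesized group replaced by replacement
         */
        public static String replaceGroup(String regex, String replacement)
        {
            return regex.replaceAll("\\(.*\\)", replacement);
        }
    }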

With your example, it does exactly what you describe:

    String regex = "foo(_+f)";
    String line = "foo___f blah foo________f";
    System.out.println(FileParsing.replace(regex, line, "baz"));

Prints out:

foobaz blah foobaz
Bryce Sandlund
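Note that the replace/replaceGroup pair above builds the replacement by editing the regex text, so it only carries over to patterns whose non-group part is literal. For arbitrary patterns, one option on Java 9+ is Matcher.replaceAll(Function<MatchResult, String>), rebuilding each match from group 1's offsets. A minimal sketch, assuming Java 9+ and a pattern with at least one group; `GeneralGroupReplace` and `replaceGroup1` are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Requires Java 9+ for Matcher.replaceAll(Function<MatchResult, String>).
class GeneralGroupReplace {
    /** Replaces the text captured by group 1 in every match of regex with newValue (taken literally). */
    static String replaceGroup1(String regex, String input, String newValue) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.replaceAll(mr -> {
            String whole = mr.group();
            if (mr.start(1) < 0) {
                // group 1 did not take part in this match: keep the match unchanged
                return Matcher.quoteReplacement(whole);
            }
            int s = mr.start(1) - mr.start(); // group 1 offsets relative to the start of the match
            int e = mr.end(1) - mr.start();
            String rebuilt = whole.substring(0, s) + newValue + whole.substring(e);
            // the function's result is parsed like a replacement string, so escape $ and \
            return Matcher.quoteReplacement(rebuilt);
        });
    }

    public static void main(String[] args) {
        System.out.println(replaceGroup1("foo(_+f)", "foo___f blah foo________f", "baz")); // prints foobaz blah foobaz
        System.out.println(replaceGroup1("fo+(_+f)", "fooooo_f blah foo_f", "baz"));     // prints fooooobaz blah foobaz
    }
}
```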
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    Pattern p = Pattern.compile("foo(g.*?f)");
    Matcher m = p.matcher("foog___f blah foog________f");
    String s = m.replaceAll("foobaz"); // replace with foobaz instead of just baz
    System.out.println(s); // foobaz blah foobaz
Amarghosh
  • No, I'm trying to replace the capturing groups in all matches – Dónal May 27 '10 at 13:07
  • Which is what Amarghosh's snippet will do. While "foo" is being matched, it also is being included in the replacement string, meaning any instances like foo_f, foo____f, foo__f, etc., become foobaz. – JAB May 27 '10 at 13:21
  • @Don updated the code for you to test. As @JAB mentioned, I've included foo in the replacement string too. And the original regex you posted was greedy, and your question was not clear enough - that's why I asked if you were looking for the lazy quantifier. – Amarghosh May 27 '10 at 13:35
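The point about the lazy quantifier is easy to check on the original input (the one that still contains the g). A small sketch; `GreedyVsLazy` is a hypothetical class name:

```java
// Contrasts a greedy .* with the lazy .*? from the answer, on the pre-edit input.
class GreedyVsLazy {
    static final String INPUT = "foog___f blah foog________f";

    static String greedy() { return INPUT.replaceAll("foo(g.*f)", "foobaz"); }
    static String lazy()   { return INPUT.replaceAll("foo(g.*?f)", "foobaz"); }

    public static void main(String[] args) {
        System.out.println(greedy()); // prints foobaz (a single match running to the last f)
        System.out.println(lazy());   // prints foobaz blah foobaz
    }
}
```

With the greedy version, .* runs to the end of the string and only backs off to the final f, so the whole input becomes one match.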

Is this anywhere close?

    String[] s = {"foo___f blah foo________f",
                  "foo___f blah goo________f"};
    for (String ss : s) {
        System.out.println(ss.replaceAll("(foo)(_+)f", "$1baz"));
    }

I.e., also add a capturing group for 'foo'. Otherwise (if you don't need to check for the 'foo' prefix), a simple replacement would be

    "foo___f blah foo________f".replaceAll("(_+)f", "baz")
Kennet
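The two replacements in the last answer only differ when an underscore run is not preceded by "foo", which is what the second test string ("goo________f") exercises. A small comparison; `PrefixCheckDemo` and its method names are hypothetical:

```java
// Compares the prefix-checking replacement with the simple one
// on an input where one underscore run follows "goo" instead of "foo".
class PrefixCheckDemo {
    static String withPrefixGroup(String s) { return s.replaceAll("(foo)(_+)f", "$1baz"); }
    static String withoutPrefix(String s)   { return s.replaceAll("(_+)f", "baz"); }

    public static void main(String[] args) {
        String input = "foo___f blah goo________f";
        System.out.println(withPrefixGroup(input)); // prints foobaz blah goo________f
        System.out.println(withoutPrefix(input));   // prints foobaz blah goobaz
    }
}
```

So the simpler pattern is only equivalent when every underscore run in the input is known to follow "foo".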