I have a regex pattern that will always have exactly one group. I need to find the text in the input string that matches the pattern and replace ONLY match group 1. For example, I have the regex pattern and the input string shown below. The replacement string is "<--->".

    Pattern p = Pattern.compile("\\w*(lan)\\w+");
    Matcher m = p.matcher("plan plans lander planitia");

The expected result is

plan p<--->s <--->der p<--->itia

I tried the following approaches:

    String test = "plan plans lander planitia";
    Pattern p = Pattern.compile("\\w*(lan)\\w+");
    Matcher m = p.matcher(test);
    String result = "";
    while(m.find()){
        result = test.replaceAll(m.group(1),"<--->");
    }
    System.out.print(result);

This gives the result:

p<---> p<--->s <--->der p<--->itia

Another approach

    String test = "plan plans lander planitia";
    Pattern p = Pattern.compile("\\w*(lan)\\w+");
    Matcher m = p.matcher(test);
    String result = "";
    while(m.find()){
        result = test.replaceAll("\\w*(lan)\\w+","<--->");
    }
    System.out.print(result);

Result is

plan <---> <---> <--->

I have gone through this link. There, the part of the string before the match is always constant ("foo"), but in my case it varies. I have also looked at this and this, but I am unable to apply any of the solutions given there to my present scenario.

Any help is appreciated.

Aditya

4 Answers

You need to use the following pattern with capturing groups:

(\w*)lan(\w+)
^-1-^   ^-2-^

and replace with $1<--->$2

See the regex demo

The point is that we use a capturing group around the parts that we want to keep and just match what we want to discard.

Java demo:

String str = "plan plans lander planitia";
System.out.println(str.replaceAll("(\\w*)lan(\\w+)", "$1<--->$2"));
// => plan p<--->s <--->der p<--->itia

If you need to replace only Group 1 and keep the rest of the match, you can emulate a replace callback with Matcher#appendReplacement:

String text = "plan plans lander planitia";
String pattern = "\\w*(lan)\\w+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    m.appendReplacement(sb, m.group(0).replaceFirst(Pattern.quote(m.group(1)), "<--->"));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb.toString());
// output => plan p<--->s <--->der p<--->itia

See another Java demo

Since we process the input match by match, we replace the Group 1 contents only once, with replaceFirst. And since the substring must be matched as literal text, we wrap it with Pattern.quote.
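A minimal sketch of why the quoting matters, assuming a hypothetical helper named replaceGroupOne (not part of the answer above). It wraps the same replaceFirst approach and additionally passes the replacement text through Matcher.quoteReplacement, because both replaceFirst and appendReplacement treat `$` and `\` in a replacement string specially. The second call uses a made-up pattern whose group captures a literal `$`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {
    // Hypothetical helper: replaces the first literal occurrence of
    // group 1's text inside each match of the regex.
    static String replaceGroupOne(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String rewritten = m.group(0).replaceFirst(
                    Pattern.quote(m.group(1)),               // group text as a literal pattern
                    Matcher.quoteReplacement(replacement));  // replacement as a literal string
            // appendReplacement also interprets '$' and '\', so quote here too
            m.appendReplacement(sb, Matcher.quoteReplacement(rewritten));
        }
        m.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceGroupOne("\\w*(lan)\\w+", "plan plans lander planitia", "<--->"));
        // => plan p<--->s <--->der p<--->itia
        System.out.println(replaceGroupOne("cost (\\$\\d+)", "cost $5 and cost $12", "N/A"));
        // => cost N/A and cost N/A
    }
}
```

Keep in mind that replaceFirst rewrites the first occurrence of the group text inside the match, which is not necessarily the group's own position when the same text also appears earlier in the match (for example, the word "lanlans" with the pattern above).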

Christoph
Wiktor Stribiżew
  • Thank you for the response. The program takes the pattern as user input, and we ask the user to enter a pattern with only one group, so we don't have a fixed pattern. Is there a way to do this without changing the pattern? – Aditya Jul 10 '16 at 21:46
  • I do not think it makes sense, but have a look at http://ideone.com/gWKfH4 – Wiktor Stribiżew Jul 10 '16 at 21:56
  • I am just not sure how the pattern is formed. If you provide more input on what the requirements for a pattern are, I could think of a better approach. – Wiktor Stribiżew Jul 10 '16 at 22:02
  • @WiktorStribiżew I have suggested a [similar solution](http://stackoverflow.com/a/38296835/5221149) (using `appendReplacement`), but without the need for double-regex. – Andreas Jul 10 '16 at 22:08
  • Too bad your original answer didn't meet OP's incomplete spec, because I like that answer. +1 – Andreas Jul 10 '16 at 22:14
  • @Wiktor Stribiżew Thank you very much. This is what I am looking for. I think I have to read more about Pattern.quote. I never used it before. Thanks again. – Aditya Jul 10 '16 at 22:22
  • "The point is that we use a capturing group around the parts that we want to keep and just match what we want to discard". Thanks man. – Eugene Feb 09 '20 at 14:54

To dynamically control the replacement value, use a find() loop with appendReplacement(), finalizing the result with appendTail().

That way you have full control of the replacement value. In your case, the pattern is the following, and you can get the positions indicated.

   start(1)
      ↓  end(1)
      ↓    ↓
  \\w*(lan)\\w+
  ↑            ↑
start()      end()

You can then extract the values to keep.

String input = "plan plans lander planitia";

StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher(input);
while (m.find())
    m.appendReplacement(buf, input.substring(m.start(), m.start(1)) +
                             "<--->" +
                             input.substring(m.end(1), m.end()));
String output = m.appendTail(buf).toString();

System.out.println(output);

Output

plan p<--->s <--->der p<--->itia

If you don't like that it uses the original string, you can use the matched substring instead.

StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\w*(lan)\\w+").matcher("plan plans lander planitia");
while (m.find()) {
    String match = m.group();
    int start = m.start();
    m.appendReplacement(buf, match.substring(0, m.start(1) - start) +
                             "<--->" +
                             match.substring(m.end(1) - start, m.end() - start));
}
String output = m.appendTail(buf).toString();
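
The same offset idea can be wrapped in a reusable method for a user-supplied pattern. This is only a sketch: the class and method names (GroupReplacer, replaceGroup) are made up for illustration, and it assumes the chosen group lies inside the overall match (no groups inside lookarounds). It copies the text with a plain StringBuilder instead of appendReplacement, so `$` and `\` in the replacement need no escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupReplacer {
    // Hypothetical helper: replaces only the given group in every match
    // and leaves the rest of the input untouched.
    static String replaceGroup(Pattern pattern, String input, int group, String replacement) {
        Matcher m = pattern.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the text already copied to 'out'
        while (m.find()) {
            if (m.start(group) < 0) {
                continue; // the group did not participate in this match
            }
            out.append(input, last, m.start(group)); // text up to the group
            out.append(replacement);                 // inserted literally
            last = m.end(group);
        }
        out.append(input, last, input.length()); // text after the last group
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\w*(lan)\\w+");
        System.out.println(replaceGroup(p, "plan plans lander planitia", 1, "<--->"));
        // => plan p<--->s <--->der p<--->itia
    }
}
```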
Andreas

While Wiktor's explanation of the use of capturing groups is completely correct, you could avoid them altogether. The \\w* at the start of your pattern seems unnecessary, since you want to keep that text anyway, so we can simply leave it out of the pattern. The check for a word character after lan can be done with a lookahead, (?=\w), so the pattern "lan(?=\\w)" matches only lan and a simple replaceAll with "<--->" (or whatever you like) does the job.
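
A short sketch of the lookahead variant described above; the class and method names (LookaheadDemo, replaceLan) are made up for illustration. The lookahead consumes no characters, so only lan itself is replaced. One difference worth knowing: this replaces every lan that is followed by a word character, while the original greedy pattern captures only the last such lan in a word.

```java
public class LookaheadDemo {
    // Hypothetical helper: replaces "lan" only where a word character follows.
    static String replaceLan(String input) {
        return input.replaceAll("lan(?=\\w)", "<--->");
    }

    public static void main(String[] args) {
        System.out.println(replaceLan("plan plans lander planitia"));
        // => plan p<--->s <--->der p<--->itia

        // Both occurrences are replaced here; \w*(lan)\w+ would capture only the second
        System.out.println(replaceLan("lanlans"));
        // => <---><--->s
    }
}
```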

Sebastian Proske
  • I'd rather point out that using a capturing group is almost always better from the performance point of view. If there are no specific contextual requirements, and no overlapping matches are required, capturing groups are the most efficient way with a regex. – Wiktor Stribiżew Jul 10 '16 at 22:00
  • Good answer. @WiktorStribiżew I don't see any advantage in using capture groups over a lookahead here. At least I'd consider your solution of worse performance (or at most equal performance if you'd use `lan(\w)`) but you can prove the opposite if you like (: – bobble bubble Jul 10 '16 at 22:15
  • @bobblebubble: Here, the difference is negligible. Still, a capturing group is a more readable construct, and some consider that as the most important thing about using regex (I am not among those :)). Anyway, OP does not have much control over the pattern. – Wiktor Stribiżew Jul 10 '16 at 22:26

I like the others' solutions. This is a slightly optimized, bulletproof version:

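import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceGroup {
    public static void main(String[] args) {
        int groupPosition = 1;
        String replacement = "foo";
        Pattern r = Pattern.compile("foo(bar)");
        Matcher m = r.matcher("bar1234foobar1234bar");
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Copy the whole match, then overwrite only the group's span,
            // using offsets relative to the start of the match
            StringBuffer buf = new StringBuffer(m.group());
            buf.replace(m.start(groupPosition) - m.start(), m.end(groupPosition) - m.start(), replacement);
            // quoteReplacement keeps any '$' or '\' in the text literal
            m.appendReplacement(sb, Matcher.quoteReplacement(buf.toString()));
        }
        m.appendTail(sb); // append the text after the last match
        System.out.println(sb.toString()); // result is "bar1234foofoo1234bar"
    }
}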