
I have searched a lot but can't find a regex that selects only single letters and doubles them, while letters that are already doubled remain untouched.

I tried

String str = "yahoo";
str = str.replaceAll("(\\w)\\1+", "$0$0");

But since (\\w)\\1+ selects only the doubled characters, my output becomes yahoooo. I tried to negate it with !(\\w)\\1+, but that didn't work and the output was the same as the input. I also tried

str.replaceAll(".", "$0$0");

But that doubles every character, including those that are already doubled.
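To see why the attempts above behave this way, here is a small sketch that runs each of them on `yahoo` (the class and method names are just for illustration). In a Java regex a bare `!` is a literal character, not a negation; negation would need a lookaround such as `(?!...)`. So `!(\\w)\\1+` only matches a `!` followed by a repeated character and never matches in `yahoo`.

```java
public class FailedAttempts {
    // Applies one of the question's patterns with the "$0$0" replacement
    static String runAttempt(String regex, String input) {
        return input.replaceAll(regex, "$0$0");
    }

    public static void main(String[] args) {
        String input = "yahoo";
        // (\w)\1+ matches only runs such as "oo", so only those get doubled
        System.out.println(runAttempt("(\\w)\\1+", input));   // yahoooo
        // '!' is a literal here, and "yahoo" contains no '!', so nothing matches
        System.out.println(runAttempt("!(\\w)\\1+", input));  // yahoo
        // '.' matches every character, so everything is doubled
        System.out.println(runAttempt(".", input));           // yyaahhoooo
    }
}
```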

Please help me write a regex that doubles every single character while leaving already doubled characters untouched.

Example

abc -> aabbcc
yahoo -> yyaahhoo (o should remain untouched)
opinion -> ooppiinniioonn
aaaaaabc -> aaaaaabbcc
Eatsam ul haq

3 Answers


You can match using this regex:

((.)\2+)|(.)

And replace it with:

$1$3$3


RegEx Explanation:

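  • ((.)\2+): Match a character and capture it in group #2, then use \2+ to match one or more repeats of that same character. The whole run of repeated characters is captured in group #1
  • |: OR
  • (.): Match any single character and capture it in group #3
  • In the replacement, Java substitutes an empty string for a group that did not take part in the match, so a run is written once via $1 and a single character is written twice via $3$3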
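To see how the two alternatives split the input, here is a small sketch (class and method names are hypothetical) that reports, for each match of ((.)\2+)|(.), whether group #1 (a run) or group #3 (a single character) took part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupTrace {
    // Same pattern as in the answer, written as a Java string literal
    private static final Pattern RUN_OR_SINGLE = Pattern.compile("((.)\\2+)|(.)");

    // Lists, per match, whether the run branch (group 1) or the single branch (group 3) matched
    static List<String> trace(String input) {
        List<String> steps = new ArrayList<>();
        Matcher m = RUN_OR_SINGLE.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                steps.add("keep " + m.group(1));    // $1 is the run, $3 is empty
            } else {
                steps.add("double " + m.group(3));  // $1 is empty, $3 is written twice
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(trace("yahoo"));
        // [double y, double a, double h, keep oo]
    }
}
```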

Code Demo:

import java.util.List;
 
class Ideone {
 
    public static void main(String[] args) {
        List<String> input = List.of("aaa", "abc", "yahoo",
                "opinion", "aaaaaabc");
 
        for (String s: input) {
            System.out.println( s + " => " +
                  s.replaceAll("((.)\\2+)|(.)", "$1$3$3") );
        }
    }
}

Output:

aaa => aaa
abc => aabbcc
yahoo => yyaahhoo
opinion => ooppiinniioonn
aaaaaabc => aaaaaabbcc
anubhava

The solution by @anubhava, if viable in Java, is probably the best way to go. For a more brute-force approach, we can iterate over the matches of the following pattern:

(\\w)\\1+|\\w

This first tries to match a run of two or more identical word characters and, failing that, a single word character. For each match, we leave a multi-character run unchanged and double any single character. Here is a short Java program that does this:

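import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("abc", "yahoo", "opinion", "aaaaaabc");
        // A run of two or more identical word characters, or else a single one
        String pattern = "(\\w)\\1+|\\w";
        Pattern r = Pattern.compile(pattern);

        for (String input : inputs) {
            Matcher m = r.matcher(input);
            StringBuffer buffer = new StringBuffer();
            while (m.find()) {
                if (m.group().matches("(\\w)\\1+")) {
                    // Already repeated: copy the run unchanged
                    m.appendReplacement(buffer, m.group());
                } else {
                    // Single character: append it twice
                    m.appendReplacement(buffer, m.group() + m.group());
                }
            }
            m.appendTail(buffer);
            System.out.println(input + " => " + buffer.toString());
        }
    }
}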

This prints:

abc => aabbcc
yahoo => yyaahhoo
opinion => ooppiinniioonn
aaaaaabc => aaaaaabbcc
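On Java 9 or later, the same find-and-decide loop can be written more compactly with `Matcher.replaceAll(Function<MatchResult, String>)`, which calls the function once per match. This is a sketch that swaps in that API instead of the explicit `appendReplacement`/`appendTail` loop and checks `group(1)` instead of re-matching each token; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubleSinglesFn {
    // Same pattern as above: a run of 2+ identical word characters, or one word character
    private static final Pattern RUN_OR_SINGLE = Pattern.compile("(\\w)\\1+|\\w");

    static String doubleSingles(String input) {
        // Requires Java 9+: Matcher.replaceAll(Function<MatchResult, String>)
        return RUN_OR_SINGLE.matcher(input).replaceAll(mr ->
                // group(1) is non-null only when the "run" alternative matched
                mr.group(1) != null
                        ? Matcher.quoteReplacement(mr.group())
                        : Matcher.quoteReplacement(mr.group() + mr.group()));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "yahoo", "opinion", "aaaaaabc"}) {
            System.out.println(s + " => " + doubleSingles(s));
        }
    }
}
```

`Matcher.quoteReplacement` is not strictly needed for word characters, but it keeps the sketch safe if the pattern is later widened to characters such as `$` or `\`.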
Tim Biegeleisen

I can read the question in two different ways.

  1. If the goal is to get an even number of each word character in every run:
    Search for (\w)\1? and replace with $1$1 (regex101 demo).

  2. If only single characters should be doubled and longer runs left untouched:
    Search for (\w)\1?(\1*) and replace with $1$1$2 (regex101 demo).

Both patterns capture a word character \w in $1 and optionally match the same character again. The second variant also captures any further repeats of that character in $2 so they can be appended in the replacement.

FYI: When writing the pattern as a Java string literal, remember to escape the backslashes, e.g. \1 -> \\1, \w -> \\w, ...
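For reference, both variants written as escaped Java string literals might look like the sketch below (the method names are hypothetical). The two only give different results for runs of three or more characters, such as `aaa`.

```java
public class TwoVariants {
    // Variant 1: (\w)\1? -> $1$1, makes every run of a word character even-length
    static String evenRuns(String s) {
        return s.replaceAll("(\\w)\\1?", "$1$1");
    }

    // Variant 2: (\w)\1?(\1*) -> $1$1$2, doubles single characters and keeps runs of 2+
    static String doubleSingles(String s) {
        return s.replaceAll("(\\w)\\1?(\\1*)", "$1$1$2");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"yahoo", "aaa", "aaaaaabc"}) {
            System.out.println(s + ": " + evenRuns(s) + " / " + doubleSingles(s));
        }
        // yahoo: yyaahhoo / yyaahhoo
        // aaa: aaaa / aaa
        // aaaaaabc: aaaaaabbcc / aaaaaabbcc
    }
}
```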

bobble bubble
    That is also nifty, matching the optional backreference to `\1` in group 2 so it cannot be evaluated anymore :-) I was playing around with lookarounds, but that would just complicate things https://regex101.com/r/yz1hGy/1 – The fourth bird Aug 12 '22 at 18:44
    @Thefourthbird Thank you! I'm still unsure about the exact requirements, so I posted two options. At first I only thought of the second variant. Maybe it will clear up, let's see :) – bobble bubble Aug 12 '22 at 18:50