4

I need to replace a repeated pattern within a word with its basic repeating unit. For example, I have the string "TATATATA" and I want to replace it with "TA". I would also prefer to replace only when the unit repeats more than twice, to avoid changing normal words.

I am trying to do it in Java with the replaceAll method.

Michael
  • Where is your code / regex? – TheLostMind Jun 03 '14 at 06:45
  • If you are already using replaceAll(), what is the problem? replaceAll does exactly what you want to do (if you have a simple pattern like the one described above). – Pphoenix Jun 03 '14 at 06:46
  • I don't want to replace only "TATATATA" but any string that is repeating like that. @TheLostMind I don't have any regex ideas. – Michael Jun 03 '14 at 06:48
  • Please show your code. – Sanjeev Jun 03 '14 at 06:48
  • Do you mean any repeating characters like TATATATA or YTYTYT or ABCABCABC which may be present in a string? – Kenneth Clark Jun 03 '14 at 06:49
  • I would like to know if there is a way to match any repeating pattern. Why the focus on my code? I don't have any code, which is why I am asking. – Michael Jun 03 '14 at 06:51
  • Look Ahead, Look back with Regex, String#contains(), String#indexOf() etc can be used for this.. But show us your effort – TheLostMind Jun 03 '14 at 06:51
  • Guys, we are probably talking about a one-liner. Either you know it or you don't. I don't see how "show the code" is helping people. Thanks. – Michael Jun 03 '14 at 06:52
  • @ddmichael: I do not think it is possible to write code that finds *any repeating pattern*. Technically all text is then a repeating pattern, only that it might just be repeated once. – Pphoenix Jun 03 '14 at 06:56
  • @Pphoenix: I was afraid of that. I am looking for patterns within a word though, do you think that's possible? – Michael Jun 03 '14 at 06:58
  • @ddmichael: Yes! You only have to define what you see as a pattern. Example: If you have TAT, is that a pattern where every other letter is a T? Or are you looking for at least TATA? Once you have defined the repeating pattern you want to find, it will be possible to write code for it. – Pphoenix Jun 03 '14 at 07:00
  • @Pphoenix: No, I am not interested in TAT, where every other letter is a T. Ideally I would like to replace A{3,} with A, (TA){3,} with TA, and so on. Hence repeating patterns of one or more consecutive characters. – Michael Jun 03 '14 at 07:07
  • See my answer, I think I got it. – MightyPork Jun 03 '14 at 07:20

3 Answers

9

I think you want this (works for any length of the repeated string):

String result = source.replaceAll("(.+)\\1+", "$1");

Or alternatively, to prioritize shorter matches:

String result = source.replaceAll("(.+?)\\1+", "$1");

It first matches a group of characters, and then the same text again one or more times (using a back-reference within the pattern itself). The whole run is replaced by a single copy of the group. I tried it and it seems to do the trick.
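To make the difference between the two variants concrete, here is a small sketch that runs both on the string from the question. The class and method names (`GreedyVsLazy`, `collapseGreedy`, `collapseLazy`) are made up for illustration only.

```java
final class GreedyVsLazy {
    // Greedy group: tries the longest candidate unit first, so "TATATATA"
    // is read as "TATA" followed by one more "TATA".
    static String collapseGreedy(String s) {
        return s.replaceAll("(.+)\\1+", "$1");
    }

    // Lazy group: tries the shortest candidate unit first, so "TATATATA"
    // is read as "TA" followed by three more copies of "TA".
    static String collapseLazy(String s) {
        return s.replaceAll("(.+?)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseGreedy("TATATATA")); // TATA
        System.out.println(collapseLazy("TATATATA"));   // TA
    }
}
```

For reducing a run to its smallest unit, the lazy form is the one that gives "TA" here.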


Example

String source = "HEY HEY duuuuuuude what'''s up? Trololololo yeye .0.0.0";

System.out.println(source.replaceAll("(.+?)\\1+", "$1"));

// HEY dude what's up? Trolo ye .0
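The question also asks to leave short repetitions alone. One possible way to do that, sketched here as an assumption rather than as part of the tested answer, is to replace `\\1+` with `\\1{2,}`, so the unit must be followed by at least two more copies (three occurrences in total). The names `ThresholdCollapse` and `collapseTriples` are hypothetical.

```java
final class ThresholdCollapse {
    // The unit must be followed by at least two further copies of itself,
    // so runs of exactly two (such as "yeye") are left untouched.
    static String collapseTriples(String s) {
        return s.replaceAll("(.+?)\\1{2,}", "$1");
    }

    public static void main(String[] args) {
        String source = "HEY HEY duuuuuuude yeye TATATATA";
        System.out.println(collapseTriples(source));
        // HEY HEY dude yeye TA
    }
}
```

Raising the lower bound in `{2,}` makes the rule stricter, at the cost of missing shorter runs.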
MightyPork
1

You would be better off using a precompiled Pattern here than calling String.replaceAll(), which recompiles the regex on every call. For instance:

private static final Pattern PATTERN 
    = Pattern.compile("\\b([A-Z]{2,}?)\\1+\\b");

//...

final Matcher m = PATTERN.matcher(input);
ret = m.replaceAll("$1");

edit: example:

public static void main(final String... args)
{
    System.out.println("TATATA GHRGHRGHRGHR"
        .replaceAll("\\b([A-Za-z]{2,}?)\\1+\\b", "$1"));
}

This prints:

TA GHR
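Putting the two pieces together, a minimal sketch of a reusable helper built around a precompiled Pattern might look like this. The class name `WordCollapser` and the method `collapse` are illustrative and not part of any library.

```java
import java.util.regex.Pattern;

final class WordCollapser {
    // Compiled once and reused; same regex as in the example above.
    private static final Pattern REPEATED_WORD =
        Pattern.compile("\\b([A-Za-z]{2,}?)\\1+\\b");

    // Replaces each word that consists entirely of a repeated unit
    // (two or more letters long) with a single copy of that unit.
    static String collapse(String input) {
        return REPEATED_WORD.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("TATATA GHRGHRGHRGHR")); // TA GHR
        System.out.println(collapse("banana"));              // banana
    }
}
```

Because of the `\\b` anchors, only words made up entirely of repeats are changed; a word such as "banana", which merely contains a repeated fragment, is left as it is.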
fge
1

Since you asked for a regex solution:

(\\w)(\\w)(\\1\\2){2,}

(\w)(\w): matches any pair of consecutive word characters and stores them in capturing groups 1 and 2 (use (.)(.) instead to match a pair of characters of any type). (\1\2) matches those same two characters again immediately afterward, and {2,} requires that repeat to occur two or more times, so the pair must appear at least three times in a row overall ({2,10} would require between two and ten repeats, inclusive). The backslashes are doubled above because the pattern is written as a Java string literal.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "hello TATATATA world";
Pattern p = Pattern.compile("(\\w)(\\w)(\\1\\2){2,}");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group()); // prints "TATATATA"
}
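The snippet above only finds the repeated run. To perform the replacement the question asks for, one option is to pass the same pattern to replaceAll and rebuild the pair from groups 1 and 2. This is a sketch under that assumption; `PairCollapser` and `collapsePairs` are hypothetical names.

```java
final class PairCollapser {
    // "$1$2" rebuilds the two-character unit from capturing groups 1 and 2.
    static String collapsePairs(String s) {
        return s.replaceAll("(\\w)(\\w)(\\1\\2){2,}", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(collapsePairs("hello TATATATA world")); // hello TA world
        System.out.println(collapsePairs("hello TATA world"));     // hello TATA world
    }
}
```

The second call leaves "TATA" alone because the pair occurs only twice, which is below the threshold set by `{2,}`.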
drew moore