I tried this:

str.replaceAll("(.)\\1+", "$1");

But the output I got was AabBbc, even though b has already occurred earlier in the string.

How can I remove the second occurrence of b?


Update
The aim is to remove every duplicate of a letter after that letter's first occurrence.
Uppercase and lowercase letters are to be treated as different characters.

  • What's the logic? You want to get `Aa` from `Aaa`, but you want `Bbb` to become just `B`. – SamWhan Jun 16 '17 at 12:02
  • To clarify/update your question, use the [edit] option. – Pshemo Jun 16 '17 at 12:05
  • If I understand correctly, you want to remove all characters that have already occurred earlier in the string? If so, this really isn't a task for regex. – Aran-Fey Jun 16 '17 at 12:06
  • `str.replace("AaabBbbc","AabBc")` meets the requirement you've stated. This is a lesson in stating your requirement adequately. None of us know what rule you actually want to apply. – slim Jun 16 '17 at 12:16
  • Can you tell us the X problem instead of Y (see [the XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem))? – niceman Jun 16 '17 at 12:19
  • Why use a regex? Maybe an ordered set would be enough (e.g. LinkedHashSet)? – orion78fr Jun 16 '17 at 12:25
  • Is the answer to your actual problem already posted [here](https://stackoverflow.com/a/34730085/7756856)? – Imus Jun 16 '17 at 13:18
  • Updated the question. – Kartik Patodi Jun 16 '17 at 13:37
  • @KartikPatodi Could you add some demo cases? Some that should match, and some that shouldn't. – Olian04 Jun 16 '17 at 13:58
  • Test Cases ==> Banana -> Ban || Semester -> Semstr || AaaaaaaaAaaaaa -> Aa – Kartik Patodi Jun 16 '17 at 14:45
  • What's the purpose? Would it be OK if the last instance is kept instead of the first, i.e. `Banana` -> `Bna`, `Semester` -> `Smster`? – SamWhan Jun 16 '17 at 14:55
  • @KartikPatodi What you just described is just removing [duplicated letters](https://stackoverflow.com/questions/4989091/removing-duplicates-from-a-string-in-java), which really shouldn't be attempted with regex. – Olian04 Jun 16 '17 at 15:22

1 Answer

The best, although still weird, solution using regex that I could come up with so far is below.

If I can figure out a solution that doesn't require reversing the string, I'll update this.

RegexTest.java

public class RegexTest {

    public static String removeDupeLetters(String input) {
        // Java look-behinds need an obvious maximum length, so an unbounded ".*" cannot go inside one;
        // we use a look-ahead instead. The look-ahead removes every letter that occurs again LATER,
        // so we reverse the string first (keeping first occurrences) and reverse the result back.
        return reverseString(reverseString(input).replaceAll("([A-Za-z])(?=.*\\1)", ""));
    }

    // helper function for reversing a String
    public static String reverseString(String input) {
        return new StringBuilder(input).reverse().toString();
    }

    public static void main(String[] args) {
        final String[] inputs = {"AaabBbc", "Banana", "Semester", "AaaaaaaaAaaaaa"};
        for (String input : inputs) {
            System.out.println("Input:  " + input);
            System.out.println("Output: " + removeDupeLetters(input));
            System.out.println();
        }
    }
}

javac RegexTest.java && java RegexTest

Input:  AaabBbc
Output: AabBc

Input:  Banana
Output: Ban

Input:  Semester
Output: Semstr

Input:  AaaaaaaaAaaaaa
Output: Aa
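
For comparison, one way to avoid the double reversal is to apply a replacement repeatedly until the string stops changing. This is only a sketch and not part of the original answer; the class name `RegexLoopSketch` and the method `removeDupeLettersNoReverse` are made up for illustration. Each pass finds a letter, the shortest stretch of text up to its next occurrence, and drops that next occurrence, so only later duplicates are ever removed.

```java
public class RegexLoopSketch {

    // Hypothetical helper (not from the original answer): repeatedly drops a later
    // duplicate of a letter until a pass leaves the string unchanged.
    public static String removeDupeLettersNoReverse(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            // Group 1: a letter. Group 2: the shortest text up to its next occurrence.
            // The replacement keeps groups 1 and 2 and discards that next occurrence.
            current = previous.replaceAll("([A-Za-z])(.*?)\\1", "$1$2");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        final String[] inputs = {"AaabBbc", "Banana", "Semester", "AaaaaaaaAaaaaa"};
        for (String input : inputs) {
            System.out.println(input + " -> " + removeDupeLettersNoReverse(input));
        }
    }
}
```

The trade-off is that a string with many repeats needs several passes over the input, so this is likely slower than the single look-ahead pass above.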

Note: As pointed out in the comments, it probably makes more sense to use a solution that does not involve regex at all, see link... It was fun anyway, though :D
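
A minimal regex-free sketch, in the spirit of the set-based suggestion from the comments, could look like the following. The class name `DedupSketch` and the method `removeDuplicateChars` are assumptions for illustration only. Note one difference from the regex above: this version drops repeats of every character, not just `[A-Za-z]`.

```java
import java.util.HashSet;
import java.util.Set;

public class DedupSketch {

    // Hypothetical helper: keeps only the first occurrence of each character.
    // Comparison is case-sensitive, so 'A' and 'a' count as different characters.
    public static String removeDuplicateChars(String input) {
        Set<Character> seen = new HashSet<>();
        StringBuilder result = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            // Set.add returns false when the character was already in the set.
            if (seen.add(c)) {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        final String[] inputs = {"AaabBbc", "Banana", "Semester", "AaaaaaaaAaaaaa"};
        for (String input : inputs) {
            System.out.println(input + " -> " + removeDuplicateChars(input));
        }
    }
}
```

This walks the string once and needs no reversal, which is why it is usually the simpler choice for this task.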

Jay