Remove duplicated characters from String using regex keeping first occurrences

Question

I know how to remove duplicated characters from a String while keeping the first occurrences, without regex:

String method(String s){
  String result = "";
  for(char c : s.toCharArray()){
    result += result.contains(c+"")
     ? ""
     : c;
  }
  return result;
}

// Example input: "Type unique chars!"
// Output:        "Type uniqchars!"

I know how to remove duplicated characters from a String while keeping the last occurrences, with regex:

String method(String s){
  return s.replaceAll("(.)(?=.*\\1)", "");
}

// Example input: "Type unique chars!"
// Output:        "Typnique chars!"

As for my question: Is it possible, with a regex, to remove duplicated characters from a String, but keep the first occurrences instead of the last?

As for why I'm asking: I came across this codegolf answer using the following function (based on the first example above):

String f(char[]s){String t="";for(char c:s)t+=t.contains(c+"")?"":c;return t;}

and I was wondering whether this can be made shorter with a regex and a String input. But even if it's longer, I'm curious in general whether it's possible to remove duplicated characters from a String with a regex while keeping the first occurrence of each character.

I can only suggest reversing a string, [`String g(StringBuilder s){return new StringBuilder(s.reverse().toString().replaceAll("(?s)(.)(?=.*\\1)", "")).reverse().toString();}`](https://ideone.com/9B7vIj). — Wiktor Stribiżew, Mar 23 '17 at 12:03
@WiktorStribiżew Hmm, that's a smart approach. Start with the reversed String, use the regex, and revert it back again. I guess using the for-loop with characters is shorter though, but your function is still a nice approach. With some code-golfing it's 110 bytes: [`String h(StringBuffer s){return""+new StringBuffer((s.reverse()+"").replaceAll("(.)(?=.*\\1)","")).reverse();}`](https://ideone.com/1elfz0) — Kevin Cruijssen, Mar 23 '17 at 15:57

score 1 · Accepted Answer · edited Jun 03 '19 at 14:08

This is not the shortest option, and it does not rely on a regex alone, but it is still an option: reverse the string before running the regex you already have, then reverse the result back.

public static String g(StringBuilder s){
  return new StringBuilder(
   s.reverse().toString()
     .replaceAll("(?s)(.)(?=.*\\1)", ""))
     .reverse().toString();
}

See the online Java demo

Note: I suggest adding (?s) (the inline form of the Pattern.DOTALL flag) to the regex so that . can match any character, including line breaks (by default, . does not match line terminators).
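To check the reverse-then-regex idea against the example from the question, here is a minimal sketch. The class name `KeepFirstDemo` and the method name `keepFirst` are made up for illustration, and it takes a plain String instead of a StringBuilder so the caller's argument is not reversed in place.

```java
public class KeepFirstDemo {
    // Keeps the first occurrence of each character:
    // 1. reverse the input,
    // 2. remove every character that occurs again later in the reversed text
    //    (that is, every character with an earlier occurrence in the original),
    // 3. reverse the result back.
    static String keepFirst(String s) {
        String reversed = new StringBuilder(s).reverse().toString();
        // (?s) lets . match line terminators too, so duplicates are also
        // found across line breaks.
        String deduped = reversed.replaceAll("(?s)(.)(?=.*\\1)", "");
        return new StringBuilder(deduped).reverse().toString();
    }

    public static void main(String[] args) {
        // Example input from the question.
        System.out.println(keepFirst("Type unique chars!")); // Type uniqchars!
        // Multi-line input: the second 'a' and the second line break are dropped.
        System.out.println(keepFirst("a\nb\na").equals("a\nb")); // true
    }
}
```

Without (?s), the multi-line input above would come back unchanged, because . would never match the line break and .* in the lookahead could not look past it.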
