
I'm developing in Java and have the following string:

String duplicates = "Smith, John - Smith, John - Smith, John – Wilson, Peter";

I need to get a new string with no duplicate names.

unique = "Smith, John – Wilson, Peter";

I thought I could use

String unique[] = duplicates.split("-");

The problem with splitting on hyphens is that the names themselves contain commas, so the result reads as one long comma-separated list:

Smith, John, Smith, John, Smith, John, Wilson, Peter

Any help will be greatly appreciated.

Nowhere Man
Guisselle
  • I think you need to split on a comma rather than a dash? – pafau k. Jun 15 '20 at 15:37
  • Yeah, I know what you mean; however, is there a way to make it work with hyphens? – Guisselle Jun 15 '20 at 15:43
  • Aren't the names the full "Smith, John"? In that case you are already getting the right names by doing split("-"). – Jose Cifuentes Jun 15 '20 at 15:45
  • For example, the following will print the unique names: `String str = "Smith, John - Smith, John - Smith, John - Wilson, Peter";` `Arrays.stream(str.split("-")).map(s->s.trim()).distinct().forEach(System.out::println);` – Jose Cifuentes Jun 15 '20 at 15:51
  • Does this answer your question? [How can I eliminate duplicate words from String in Java?](https://stackoverflow.com/questions/42770863/how-can-i-eliminate-duplicate-words-from-string-in-java) – Arvind Kumar Avinash Jun 24 '20 at 12:12
  • Does this answer your question? [Remove duplicate values from a string in java](https://stackoverflow.com/questions/6790689/remove-duplicate-values-from-a-string-in-java)? – Arvind Kumar Avinash Jun 24 '20 at 12:13

1 Answer


You can use the stream's distinct() operation:

// needs: import java.util.Arrays;
// split on runs of dashes (hyphen-minus or U+2012..U+2015) surrounded by whitespace
Arrays.stream(duplicates.split("\\s+[-\\u2012-\\u2015]+\\s+"))
      .distinct()                    // get rid of duplicates, keeping first-seen order
      .forEach(System.out::println); // print each entry

Output:

Smith, John
Wilson, Peter

Or use Collectors.joining to get a string without duplicates:

String duplicates = "Smith, John -- Smith, John - Smith, John – Wilson, Peter ‒ Yves Saint-Laurent ― George Henry Lane-Fox Pitt-Rivers";

// needs: import java.util.Arrays; import java.util.stream.Collectors;
String noDuplicates = Arrays.stream(duplicates.split("\\s+[-\\u2012-\\u2015]+\\s+"))
                            .distinct()
                            .collect(Collectors.joining(" – "));
System.out.println(noDuplicates);

prints:

Smith, John – Wilson, Peter – Yves Saint-Laurent – George Henry Lane-Fox Pitt-Rivers

I updated the split pattern so that names containing single hyphens ("double-barrelled" names, which are quite common) are kept intact, and added more types of dashes.
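For reference, the approach can be wrapped into a self-contained class. This is only a sketch: the class name `UniqueNames` and the method `removeDuplicateNames` are made up for illustration, and the set of separators (hyphen-minus plus U+2012 to U+2015) is an assumption about which dashes occur in the input.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class UniqueNames {
    // One or more dash characters (hyphen-minus or U+2012..U+2015) with whitespace on both sides.
    // A hyphen inside a name such as "Saint-Laurent" has no surrounding whitespace, so it is not a separator.
    private static final Pattern SEPARATOR = Pattern.compile("\\s+[-\\u2012-\\u2015]+\\s+");

    // Hypothetical helper: keeps the first occurrence of each name and joins the result with " – ".
    public static String removeDuplicateNames(String names) {
        return Arrays.stream(SEPARATOR.split(names))
                     .map(String::trim)   // drop stray whitespace at the ends of the input
                     .distinct()          // keeps first-seen order for an ordered stream
                     .collect(Collectors.joining(" \u2013 "));
    }

    public static void main(String[] args) {
        String input = "Smith, John - Smith, John \u2013 Wilson, Peter - Yves Saint-Laurent - Wilson, Peter";
        // prints: Smith, John – Wilson, Peter – Yves Saint-Laurent
        System.out.println(removeDuplicateNames(input));
    }
}
```

Compiling the pattern once is a small design choice: it avoids re-parsing the regex on every call if the method is used in a loop.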

Nowhere Man
  • I do have one quick question: why are you using ("-|–")? – Guisselle Jun 15 '20 at 16:28
  • In case your input string contains different types of dashes/hyphens. In your input string there are at least two types: `"-"` and `"–"`. You may check the updated input string in my example. – Nowhere Man Jun 15 '20 at 16:37
  • Do I close it? Not sure how – Guisselle Jun 15 '20 at 17:08
  • Meaning? You can accept my answer - put a green tick – Nowhere Man Jun 15 '20 at 17:16
  • Sorry, I'm new to Java, one last question, what is ".collect(Collectors.joining(" – "));" doing? is it joining again with another delimiter after you've removed the original delimiter? – Guisselle Jun 15 '20 at 19:48
  • Yes, according to your requirement: _I need to get a new string with no duplicate names `unique = "Smith, John – Wilson, Peter";`_ – Nowhere Man Jun 15 '20 at 21:05
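To make the split / distinct / join sequence discussed in the comments more concrete, here is a small sketch that exposes each intermediate stage. The class name `JoinSteps` and its three helper methods are illustrative only and not part of the answer; the split pattern assumes the separators are hyphen-minus or U+2012 to U+2015 surrounded by whitespace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JoinSteps {
    // Step 1: split() consumes the separators, leaving only the names.
    public static List<String> splitNames(String s) {
        return Arrays.asList(s.split("\\s+[-\\u2012-\\u2015]+\\s+"));
    }

    // Step 2: distinct() keeps the first occurrence of each name.
    public static List<String> unique(List<String> names) {
        return names.stream().distinct().collect(Collectors.toList());
    }

    // Step 3: joining() glues the remaining names back together with a new delimiter.
    public static String join(List<String> names) {
        return names.stream().collect(Collectors.joining(" \u2013 "));
    }

    public static void main(String[] args) {
        List<String> parts = splitNames("Smith, John - Smith, John - Wilson, Peter");
        // Three elements, but List.toString() also separates them with commas:
        System.out.println(parts.size() + " " + parts); // 3 [Smith, John, Smith, John, Wilson, Peter]
        List<String> names = unique(parts);
        System.out.println(names.size() + " " + names); // 2 [Smith, John, Wilson, Peter]
        System.out.println(join(names));                // Smith, John – Wilson, Peter
    }
}
```

The first printed line also shows why the split result can look like "all commas": the array holds three separate names, and the extra commas come only from how a list is printed. For step 3, `String.join(" – ", names)` should give the same result.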