Java 8 remove duplicate strings irrespective of case from a list

Question

How can we remove duplicate elements from a list of Strings while ignoring case? For example, consider the code snippet below:

    String str = "Kobe Is is The the best player In in Basketball basketball game .";
    List<String> list = Arrays.asList(str.split("\\s"));
    list.stream().distinct().forEach(s -> System.out.print(s+" "));

This still prints every word, which is expected, since distinct() compares strings case-sensitively:

Kobe Is is The the best player In in Basketball basketball game .

I need the result as follows

Kobe Is The best player In Basketball game .

Use a TreeSet using the appropriate comparator (https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#CASE_INSENSITIVE_ORDER). Loop through each word, add it to the set, and if it's already there, don't print it. — JB Nizet, Jul 27 '18 at 07:19
Are you only looking for consecutive duplicates, or all duplicates? — Robby Cornelissen, Jul 27 '18 at 07:29

score 13 · Accepted Answer · answered Jul 27 '18 at 08:55

Taking your question literally, to “remove duplicate strings irrespective of case from a list”, you may use

// just for constructing a sample list
String str = "Kobe Is is The the best player In in Basketball basketball game .";
List<String> list = new ArrayList<>(Arrays.asList(str.split("\\s")));

// the actual operation
TreeSet<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
list.removeIf(s -> !seen.add(s));

// just for debugging
System.out.println(String.join(" ", list));
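
A note on the `new ArrayList<>(...)` wrapper above: `Arrays.asList` returns a fixed-size list, so calling `removeIf` on it directly fails with `UnsupportedOperationException` as soon as an element would be removed. If mutating the list is not desired, the same `TreeSet` idea can be applied as a stream filter instead. The following is only a minimal sketch and not part of the original answer; the class and method names are made up for illustration, and because the predicate is stateful it assumes a sequential stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class CaseInsensitiveDedup {

    // Returns a new list holding the first occurrence of each word,
    // where words are compared with String.CASE_INSENSITIVE_ORDER.
    static List<String> distinctIgnoreCase(List<String> words) {
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        // Set.add returns false when an equal (ignoring case) word is already present.
        // The predicate keeps state, so the stream must stay sequential.
        return words.stream()
                .filter(seen::add)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "Kobe Is is The the best player In in Basketball basketball game .";
        List<String> list = Arrays.asList(str.split("\\s")); // fixed-size list is fine here
        System.out.println(String.join(" ", distinctIgnoreCase(list)));
        // prints: Kobe Is The best player In Basketball game .
    }
}
```

The input list is left untouched, which also means the fixed-size list from `Arrays.asList` can be passed in as is.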

score 3 · Answer 2 · Robby Cornelissen · 2018-07-27T07:48:42.987

In case you only need to get rid of consecutive duplicates, you can use a regular expression. The regex below checks for duplicated words, ignoring case.

String input = "Kobe Is is The the best player In in Basketball basketball game .";
String output = input.replaceAll("(?i)\\b(\\w+)\\s+\\1\\b", "$1");

System.out.println(output);

Which outputs:

Kobe Is The best player In Basketball game .

edited Jul 27 '18 at 07:48

answered Jul 27 '18 at 07:24

Robby Cornelissen

When using a `java.util.regex.Pattern` one could omit the `(?i)` and write `Pattern.compile("\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE)` – Lino Jul 27 '18 at 08:00
@Lino True, but `String.replaceAll()` doesn't accept `Pattern` parameters. – Robby Cornelissen Jul 27 '18 at 08:02
`myPattern.matcher(input).replaceAll("$1")` should work though – Lino Jul 27 '18 at 08:03
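
The `Pattern`-based variant discussed in these comments could look roughly like the sketch below. It reuses the expression from the answer, only moving the case-insensitivity from the inline `(?i)` to the `Pattern.CASE_INSENSITIVE` flag; the class, constant, and method names are hypothetical. Like the original, it only collapses adjacent duplicates and leaves non-consecutive ones alone.

```java
import java.util.regex.Pattern;

final class ConsecutiveDuplicateWords {

    // Same expression as in the answer, compiled once with the flag
    // instead of the inline (?i) modifier.
    private static final Pattern REPEATED_WORD =
            Pattern.compile("\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);

    // Replaces each pair of adjacent words that are equal ignoring case
    // with the first word of the pair.
    static String collapse(String input) {
        return REPEATED_WORD.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String input = "Kobe Is is The the best player In in Basketball basketball game .";
        System.out.println(collapse(input));
        // prints: Kobe Is The best player In Basketball game .
    }
}
```

Compiling the pattern once is mainly useful when the replacement runs many times, since `String.replaceAll` compiles its regex on every call.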

score 3 · Answer 3 · answered Jul 27 '18 at 08:44

Here's a fun solution to get the expected result with the use of streams.

String result = Pattern.compile("\\s")
                .splitAsStream(str)
                .collect(Collectors.collectingAndThen(Collectors.toMap(String::toLowerCase,
                        Function.identity(),
                        (l, r) -> l,
                        LinkedHashMap::new),
                        m -> String.join(" ", m.values())));

prints:

Kobe Is The best player In Basketball game .

score 1 · Answer 4 · answered Jul 27 '18 at 07:17

If losing the capital letters in the printed output is not a problem, you can do it this way:

    list.stream()
            .map(String::toLowerCase)
            .distinct()
            .forEach(s -> System.out.print(s + " "));

Output:

kobe is the best player in basketball game .

Leviand

it seems OP doesn't want all lower case . – soorapadman Jul 27 '18 at 07:30

score 0 · Answer 5 · answered Jul 27 '18 at 07:26

Keeping your uppercase and removing lowercase:

String str = "Kobe Is is The the best player In in Basketball basketball game .";
List<String> list = Arrays.asList(str.split("\\s"));
for(int i = 1; i<list.size(); i++)
{
        if(list.get(i).equalsIgnoreCase(list.get(i-1)))
        {
            // is lower case
            if(list.get(i).toLowerCase().equals(list.get(i)))
            {
                list.set(i,"");
            }
            else
            {
                list.set(i-1, "");
            }
        }
}

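// skip the entries blanked out above so that no extra spaces are printed
list.stream().filter(s -> !s.isEmpty()).distinct().forEach(s -> System.out.print(s+" "));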

score 0 · Answer 6 · Tomasz Linkowski · 2018-07-27T11:02:56.117

Here's a one-line solution that:

filters out all (including non-consecutive) case-insensitive duplicates (unlike Robby's solution and maio290's solution)
preserves original case (unlike Leviand's solution)

This solution makes use of the jOOλ library and its Seq.distinct(Function<T,U>) method:

List<String> distinctWords = Seq.seq(list).distinct(String::toLowerCase).toList();

Result (when printed like in the question):

Kobe Is The best player In Basketball game .

edited Jul 27 '18 at 11:02

answered Jul 27 '18 at 08:23

Tomasz Linkowski

distinct by property can be achieved via this also https://stackoverflow.com/questions/23699371/java-8-distinct-by-property – Eugene Jul 27 '18 at 09:22
Nice remark. Incidentally, that's almost exactly how it's implemented in `Seq`, the only difference being that `ConcurrentHashMap.newKeySet()` isn't used there (regular `ConcurrentHashMap` is used). – Tomasz Linkowski Jul 27 '18 at 10:54
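
For readers without jOOλ, the distinct-by-property idea from the linked question can be sketched with plain JDK classes. This is a rough illustration of the commonly cited pattern built on `ConcurrentHashMap.newKeySet()`, not a copy of the `Seq` implementation; the class name and the `distinctIgnoreCase` helper are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class DistinctByKey {

    // Builds a stateful predicate that accepts an element only the first
    // time its extracted key is seen.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Keeps the first occurrence of each word, using the lower-cased word as key.
    static List<String> distinctIgnoreCase(List<String> words) {
        return words.stream()
                .filter(DistinctByKey.<String>distinctByKey(String::toLowerCase))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "Kobe Is is The the best player In in Basketball basketball game .";
        List<String> list = Arrays.asList(str.split("\\s"));
        System.out.println(String.join(" ", distinctIgnoreCase(list)));
        // prints: Kobe Is The best player In Basketball game .
    }
}
```

On a sequential stream the first occurrence of each key wins; on a parallel stream the survivor among duplicates is not guaranteed to be the first one.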

score 0 · Answer 7 · answered Jul 27 '18 at 08:51

The problem is that the repeated strings do not occur in exactly the same case: the first word is "Basketball", with a capital B, and the second is "basketball", so the two are not equal. You can either convert both strings to lower case (or upper case) before comparing them, or compare them while ignoring case.
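
The two options can be illustrated with a small sketch; the class and method names here are made up for the example, and `Locale.ROOT` is an added assumption to keep the conversion independent of the default locale.

```java
import java.util.Locale;

final class CaseComparisonDemo {

    // Option 1: normalise both strings to one case, then compare exactly.
    static boolean equalAfterLowerCasing(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    // Option 2: let the JDK ignore case during the comparison itself.
    static boolean equalIgnoringCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        String first = "Basketball";
        String second = "basketball";
        System.out.println(first.equals(second));                 // false
        System.out.println(equalAfterLowerCasing(first, second)); // true
        System.out.println(equalIgnoringCase(first, second));     // true
    }
}
```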

Yaniv Levi · Answer 8 · 2019-08-13T09:45:41.413

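The provided solution with TreeSet is elegant, but a TreeSet also keeps its elements sorted, so every insertion costs O(log n) comparisons. The code below demonstrates how to implement it more efficiently using a HashMap keyed by the lower-cased string, which offers expected constant-time insertion. When two strings differ only in case, it gives precedence to the one that has more upper-case letters.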

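import java.util.Collection;
import java.util.HashMap;

class SetWithIgnoreCase {
    // Maps the lower-cased form of each string to the variant that is kept.
    private final HashMap<String, String> underlyingMap = new HashMap<>();

    public void put(String str) {
        String lowerCaseStr = str.toLowerCase();
        // Keep the stored variant only if it has strictly more upper-case letters;
        // on a tie the newer string replaces it.
        underlyingMap.compute(lowerCaseStr, (k, v) -> (v == null) ? str : (compare(v, str) > 0 ? v : str));
    }

    // The retained strings; note that HashMap does not preserve insertion order.
    public Collection<String> values() {
        return underlyingMap.values();
    }

    // Positive if str1 has more upper-case letters than str2.
    private int compare(String str1, String str2) {
        return countUpperCase(str1) - countUpperCase(str2);
    }

    // Counting each string on its own avoids assuming both have the same length.
    private int countUpperCase(String str) {
        int count = 0;
        for (int i = 0; i < str.length(); i++) {
            count += (Character.isUpperCase(str.charAt(i)) ? 1 : 0);
        }
        return count;
    }
}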
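
To show the "prefer more upper-case letters" rule end to end, here is a self-contained sketch of the same idea. It is a variation rather than the answer's exact code: it swaps in a `LinkedHashMap` so the words come back in first-seen order, uses `Map.merge`, and on a tie keeps the earlier string. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PreferUpperCaseDedup {

    // Number of upper-case characters in the given string.
    static long upperCaseCount(String s) {
        return s.chars().filter(Character::isUpperCase).count();
    }

    // Keeps one variant per lower-cased key: the one with the most
    // upper-case letters, or the earlier one when counts are equal.
    static List<String> dedupe(List<String> words) {
        Map<String, String> byLowerCase = new LinkedHashMap<>();
        for (String word : words) {
            byLowerCase.merge(word.toLowerCase(), word,
                    (kept, candidate) ->
                            upperCaseCount(candidate) > upperCaseCount(kept) ? candidate : kept);
        }
        // LinkedHashMap iterates in the order in which keys were first inserted.
        return new ArrayList<>(byLowerCase.values());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("kobe", "Kobe", "is", "IS", "Is", "best");
        System.out.println(String.join(" ", dedupe(words)));
        // prints: Kobe IS best
    }
}
```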
