9

How can we remove duplicate elements from a list of Strings while ignoring the case of each word? For example, consider the code snippet below:

    String str = "Kobe Is is The the best player In in Basketball basketball game .";
    List<String> list = Arrays.asList(str.split("\\s"));
    list.stream().distinct().forEach(s -> System.out.print(s+" "));

As expected, this prints the input unchanged, because distinct() compares the strings case-sensitively:

Kobe Is is The the best player In in Basketball basketball game .

I need the following result:

Kobe Is The best player In Basketball game .
Lino
pps
  • Use a TreeSet with the appropriate comparator (https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#CASE_INSENSITIVE_ORDER). Loop through each word, add it to the set, and if it's already there, don't print it. – JB Nizet Jul 27 '18 at 07:19
  • Are you only looking for consecutive duplicates, or all duplicates? – Robby Cornelissen Jul 27 '18 at 07:29

8 Answers

13

Taking your question literally, to “remove duplicate strings irrespective of case from a list”, you may use

// just for constructing a sample list
String str = "Kobe Is is The the best player In in Basketball basketball game .";
List<String> list = new ArrayList<>(Arrays.asList(str.split("\\s")));

// the actual operation
TreeSet<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
list.removeIf(s -> !seen.add(s));

// just for debugging
System.out.println(String.join(" ", list));
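
The snippet above edits the list in place, which is why it copies the words into an ArrayList first: the list returned by Arrays.asList is fixed-size, so removing elements from it would fail. As a rough sketch of the same TreeSet idea that leaves the input untouched, the class below returns a new list instead. The class name DistinctIgnoreCase and the method name distinctIgnoreCase are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DistinctIgnoreCase {

    // Hypothetical helper (not from the answer): returns a new list holding the
    // first occurrence of each word, comparing words case-insensitively.
    // The input list is not modified.
    static List<String> distinctIgnoreCase(List<String> words) {
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        List<String> result = new ArrayList<>();
        for (String word : words) {
            // add() returns false when an equal-ignoring-case word is already in the set
            if (seen.add(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "Kobe Is is The the best player In in Basketball basketball game .";
        List<String> words = Arrays.asList(str.split("\\s"));
        // prints: Kobe Is The best player In Basketball game .
        System.out.println(String.join(" ", distinctIgnoreCase(words)));
    }
}
```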
Holger
3

In case you only need to get rid of consecutive duplicates, you can use a regular expression. The regex below checks for duplicated words, ignoring case.

String input = "Kobe Is is The the best player In in Basketball basketball game .";
String output = input.replaceAll("(?i)\\b(\\w+)\\s+\\1\\b", "$1");

System.out.println(output);

Which outputs:

Kobe Is The best player In Basketball game .
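
Two limits of this approach are worth illustrating. The pattern above collapses pairs, so a run of three equal words would still leave two behind, and duplicates that are not adjacent are never touched. The sketch below is a variation, not the regex from the answer: it wraps the repeated part in a repeating group so that longer runs collapse as well. The names ConsecutiveDuplicates and collapseRuns are made up for the example.

```java
class ConsecutiveDuplicates {

    // Variation on the regex above: (?:\s+\1\b)+ matches one or more repetitions
    // of the captured word, so a run of any length collapses to its first word.
    static String collapseRuns(String input) {
        return input.replaceAll("(?i)\\b(\\w+)(?:\\s+\\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        // A run of three collapses to the first word: prints "is the best"
        System.out.println(collapseRuns("is Is IS the best"));
        // Non-adjacent duplicates are left alone: prints "the cat and the dog"
        System.out.println(collapseRuns("the cat and the dog"));
    }
}
```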
Robby Cornelissen
3

Here's a fun solution to get the expected result with the use of streams.

String result = Pattern.compile("\\s")
                .splitAsStream(str)
                .collect(Collectors.collectingAndThen(Collectors.toMap(String::toLowerCase,
                        Function.identity(),
                        (l, r) -> l,
                        LinkedHashMap::new),
                        m -> String.join(" ", m.values())));

prints:

Kobe Is The best player In Basketball game .
Ousmane D.
1

If losing the capital letters in the printed output is not a problem for you, you can do it this way:

    list.stream()
            .map(String::toLowerCase)
            .distinct()
            .forEach(s -> System.out.print(s + " "));

Output:

kobe is the best player in basketball game .

Leviand
0

Keeping your uppercase and removing lowercase:

String str = "Kobe Is is The the best player In in Basketball basketball game .";
List<String> list = Arrays.asList(str.split("\\s"));
for(int i = 1; i<list.size(); i++)
{
        if(list.get(i).equalsIgnoreCase(list.get(i-1)))
        {
            // is lower case
            if(list.get(i).toLowerCase().equals(list.get(i)))
            {
                list.set(i,"");
            }
            else
            {
                list.set(i-1, "");
            }
        }
}

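// skip the entries that were blanked out above, then print the remaining words
list.stream().filter(s -> !s.isEmpty()).distinct().forEach(s -> System.out.print(s+" "));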
maio290
0

Here's a one-line solution that makes use of the jOOλ library and its Seq.distinct(Function<T,U>) method:

List<String> distinctWords = Seq.seq(list).distinct(String::toLowerCase).toList();

Result (when printed like in the question):

Kobe Is The best player In Basketball game .

Tomasz Linkowski
  • Distinct by property can also be achieved this way: https://stackoverflow.com/questions/23699371/java-8-distinct-by-property – Eugene Jul 27 '18 at 09:22
  • Nice remark. Incidentally, that's almost exactly how it's implemented in `Seq`, the only difference being that `ConcurrentHashMap.newKeySet()` isn't used there (regular `ConcurrentHashMap` is used). – Tomasz Linkowski Jul 27 '18 at 10:54
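
For readers who prefer not to add a library, the distinct-by-property idea from the comments above can be sketched in plain Java roughly as follows. The helper name distinctByKey and the class name DistinctByKey are chosen for this example; the predicate is stateful, so this sketch assumes a sequential stream if the first occurrence of each word should be the one that is kept.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKey {

    // Hypothetical helper: a stateful predicate that accepts an element only
    // the first time its extracted key is seen.
    static <T, K> Predicate<T> distinctByKey(Function<? super T, ? extends K> keyExtractor) {
        Set<K> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Keeps the first occurrence of each word, using the lower-cased word as key.
    static List<String> distinctIgnoreCase(List<String> words) {
        return words.stream()
                .filter(distinctByKey(String::toLowerCase))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "Kobe Is is The the best player In in Basketball basketball game .";
        List<String> list = Arrays.asList(str.split("\\s"));
        // prints: Kobe Is The best player In Basketball game .
        System.out.println(String.join(" ", distinctIgnoreCase(list)));
    }
}
```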
0

The problem with the repeated strings is that they do not occur in exactly the same case: the first word is Basketball and the other is basketball, so the two are not equal, because the first occurrence has a capital B. What you can do is compare the strings after converting both to lower case or both to upper case, or compare them ignoring case.
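
As a small illustration of the comparisons described above, the sketch below contrasts a plain equals with the two case-insensitive options. The class name IgnoreCaseComparison and the helper sameWord are made up for the example.

```java
class IgnoreCaseComparison {

    // Hypothetical helper: true when the two words differ at most in case.
    static boolean sameWord(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        String first = "Basketball";
        String second = "basketball";

        // Case-sensitive comparison treats the two words as different.
        System.out.println(first.equals(second));                             // false
        // Normalizing both sides to lower case makes them equal.
        System.out.println(first.toLowerCase().equals(second.toLowerCase())); // true
        // equalsIgnoreCase compares without building new strings.
        System.out.println(sameWord(first, second));                          // true
    }
}
```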

NirajT
0

The provided solution with TreeSet is elegant, but TreeSet also keeps its elements sorted, which costs O(log n) per insertion and is more work than this task needs. The code below demonstrates how to implement it more efficiently using a HashMap, giving precedence to the string that has more upper-case letters:

class SetWithIgnoreCase {
    private HashMap<String, String> underlyingMap = new HashMap<>();

    public void put(String str) {
        String lowerCaseStr = str.toLowerCase();
        underlyingMap.compute(lowerCaseStr, (k, v) -> (v == null) ? str : (compare(v, str) > 0 ? v : str));
    }

    // Positive if str1 has more upper-case letters than str2. Each string is
    // scanned over its own length, so the two need not be equally long.
    private int compare(String str1, String str2) {
        return countUpperCase(str1) - countUpperCase(str2);
    }

    private int countUpperCase(String str) {
        int count = 0;
        for (int i = 0; i < str.length(); i++) {
            count += (Character.isUpperCase(str.charAt(i)) ? 1 : 0);
        }
        return count;
    }
}
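
The class above has no method for reading the words back out, and a HashMap does not keep them in their original order. The following is a minimal sketch of one way to round it out, assuming insertion order matters: it swaps in a LinkedHashMap and returns the surviving words as a list. The names UpperCasePreferringDedup, upperCaseCount and dedup are made up here, and unlike the code above, a tie keeps the earlier word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UpperCasePreferringDedup {

    // Counts the upper-case characters of a single string.
    static long upperCaseCount(String s) {
        return s.chars().filter(Character::isUpperCase).count();
    }

    // Hypothetical helper: one entry per lower-cased word, in order of first
    // appearance; among variants, the one with more upper-case letters wins.
    static List<String> dedup(List<String> words) {
        Map<String, String> byLowerCase = new LinkedHashMap<>();
        for (String word : words) {
            byLowerCase.merge(word.toLowerCase(), word,
                    (kept, candidate) ->
                            upperCaseCount(candidate) > upperCaseCount(kept) ? candidate : kept);
        }
        return new ArrayList<>(byLowerCase.values());
    }

    public static void main(String[] args) {
        // "is" is replaced by "Is", "the" by "The": prints [Is, The, best]
        System.out.println(dedup(Arrays.asList("is", "Is", "the", "The", "best")));
    }
}
```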
Yaniv Levi