-5

I have a big list of sentences. Some of them are similar to each other but slightly different, something like:

[word1] [word2] [word3]

[word1] [word3]

[word1] [word2] [word3] [word4]

I would like to delete the "duplicates" and keep only one sentence from each group. Is this possible in Java?

Skillzone
  • Not sure I'm following. What is the expected output? – Mureinik Nov 04 '16 at 10:57
  • I have a list of 10k sentences that are similar to each other, and I would like to end up with ~1k without duplicates (some of them have 5 copies, some 20), one for each sentence – Skillzone Nov 04 '16 at 11:01

3 Answers

0

You can do it like this:

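// Blank out every earlier copy of a sentence, keeping only its last occurrence.
for (int i = 0; i < words.length; i++) {
    for (int j = i + 1; j < words.length; j++) {
        if (words[i].equals(words[j])) {
            words[i] = ""; // an identical copy appears later, so clear this one
            break;
        }
    }
}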
  • A better approach is to create a new array with the result instead of overwriting entries of the current array with empty strings – B001ᛦ Nov 04 '16 at 11:03
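
A minimal sketch of what this comment suggests: copy each sentence into a new list the first time it is seen and leave the input array untouched. The class name `NewListDedup`, the method `withoutDuplicates`, and the sample data are illustrative assumptions, and only exact duplicates are removed.

```java
import java.util.ArrayList;
import java.util.List;

class NewListDedup {
    // Returns a new list with the first occurrence of each string; the input array is not modified.
    static List<String> withoutDuplicates(String[] words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            // contains() is a linear scan, so the whole loop is O(n^2), like the nested loops
            if (!result.contains(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"a b c", "a c", "a b c", "a b c d", "a c"};
        System.out.println(withoutDuplicates(words)); // prints [a b c, a c, a b c d]
    }
}
```

For 10k sentences the quadratic scan is probably still acceptable, but a hash-based collection avoids it.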
0

Add the list to a Set. A Set cannot contain duplicates. See the code below:

    List<String> collectionWithDuplicates = new ArrayList<>();

    Set<String> collectionWithoutDuplicates = new HashSet<>();

    collectionWithoutDuplicates.addAll(collectionWithDuplicates);
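
For illustration, here is a self-contained version of the same idea with sample data. As an assumption that the original order matters, `LinkedHashSet` is swapped in for `HashSet`, since a plain `HashSet` gives no ordering guarantee. The class name `SetDedup` and the method `dedupe` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetDedup {
    // Copies the list into a LinkedHashSet, which drops exact duplicates
    // and remembers the order in which elements were first inserted.
    static List<String> dedupe(List<String> sentences) {
        Set<String> unique = new LinkedHashSet<>(sentences);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("a b c", "a c", "a b c", "a b c d");
        System.out.println(dedupe(sentences)); // prints [a b c, a c, a b c d]
    }
}
```

Sentences are compared with `equals()`, so "a b c" and "a c" still count as different entries.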
Jobin
0

In Java 8 you can use the Stream API:

 List<String> newList = oldList.stream().distinct().collect(Collectors.toList());
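
A runnable sketch of the one-liner with the imports it needs. The class name `StreamDedup` and the sample data are assumptions for illustration. `distinct()` compares elements with `equals()`, and for an ordered stream such as one from a `List` it keeps the first occurrence of each element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamDedup {
    // Removes exact duplicates while preserving the encounter order of the list.
    static List<String> dedupe(List<String> oldList) {
        return oldList.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> oldList = Arrays.asList("a b c", "a c", "a b c", "a b c d", "a c");
        System.out.println(dedupe(oldList)); // prints [a b c, a c, a b c d]
    }
}
```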
Eritrean