
I know there are a lot of questions about "removing duplicates from a list". I liked the solution with HashSet. However, what I have is a list of String[], and it won't work with it, probably because stringArray1.equals(stringArray2) returns false even when the two string arrays hold the same values; to compare string arrays, we have to use Arrays.equals, which is not what HashSet does.

So I have a userList of String[] arrays, each containing only 2 strings: the username and the userID. Since both are linked (there's only one userID per username), it's enough for me to compare only one of those strings.

What I need is a fast way to remove duplicates from the list.

I thought about something like this:

List<String> userNamesList = new ArrayList<String>();
List<String[]> userListWithoutDuplicates = new ArrayList<String[]>();
for(String[] user : userList){
    if(!userNamesList.contains(user[0])){
        userNamesList.add(user[0]);
        userListWithoutDuplicates.add(user);
    }
}

However, this needs two new lists and a loop (I'm pretty sure any other solution would need the loop anyway).

I'm wondering if there isn't a better solution. I thought something like that would already be implemented somewhere.
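For what it's worth, the same loop could swap the List lookup for a HashSet, making each name check O(1) instead of a linear scan (sketch with made-up sample data):

```java
import java.util.*;

List<String[]> userList = new ArrayList<>();
userList.add(new String[]{"George", "123"});
userList.add(new String[]{"George", "123"});
userList.add(new String[]{"Anna", "456"});

Set<String> seenNames = new HashSet<>();
List<String[]> userListWithoutDuplicates = new ArrayList<>();
for (String[] user : userList) {
    // Set.add returns false when the element is already present,
    // so the lookup and the insert happen in one O(1) call
    if (seenNames.add(user[0])) {
        userListWithoutDuplicates.add(user);
    }
}
// userListWithoutDuplicates now holds George/123 and Anna/456
```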

EDIT: I got my array from an SQL query. In fact, I have a DB and some users. One user searches for other users matching certain conditions in the DB, and the DB sends back a list of String[] {username, userID} to this user. So I already have a User class, which contains far more than just the username and ID. I have one instance of this class per connected user, but the DB can't access those instances, so it can't send them. I thought a String array was an easy solution. I didn't consider that, in certain cases, a user can be referenced more than once in the DB and so be selected more than once. That's why I got duplicates in my list.

Ablia
  • Why are you using a `String[]` instead of a `User` class? – Clashsoft Sep 03 '18 at 11:03
  • Which version of Java you are using? – Pankaj Singhal Sep 03 '18 at 11:03
  • you should turn the arrays into objects with 2 fields instead and have them override `equals()` and `hashcode()` – Jack Flamp Sep 03 '18 at 11:08
  • i'm using java 10. And i got my array from an sql query. I'll edit post to explain that better. – Ablia Sep 03 '18 at 11:14
  • @Abila yes I understand but you can still turn them into objects when you are retrieving the data probably.. how are you accessing your DB? – Jack Flamp Sep 03 '18 at 11:18
  • I am guessing you require a collection of string such that the string is unique,In this case you should prefer set. and simply insert these into Set which will take care of implementation, as well as easy to use. What is expected time complexity and space complexity required, according to which you should pick your data structures. – Pavan Kate Sep 03 '18 at 11:21
  • @JackFlamp yeah, i guess i can still turn them into object. But then, when the user will search for others information, the DB will send back another list of String[], for example {userID, filename}. So i'll need another object again... Or i can maybe create a class DBResponse, which contains all necessary fields and maybe different constructor for each case...this could work, i guess. – Ablia Sep 03 '18 at 11:25
  • @Ablia yes that would be best. `DBUser` or something like that would be better. Java is made for working with objects and it is much easier to perform tasks like the one you are up to whan having lists of lists.. You can of course also limit your responses from your database so that you don't get duplicates. – Jack Flamp Sep 03 '18 at 11:28
  • using names instead of IDs for the uniqueness test seems a tad backwards. then the ID for duplicate names will be arbitrary? – Patrick Parker Sep 03 '18 at 11:41
  • my advice is to close this question and open a new one with your SQL query asking how to eliminate duplicates... it seems more appropriate to do it at that stage. – Patrick Parker Sep 03 '18 at 11:43
  • This is unclear, you provide a bit of code but this isn't including the value you have and the output you expect. What `userList` is doing here ? A [mcve] should be provide to get exactly what you want to do with a `List`. – AxelH Sep 03 '18 at 11:47

6 Answers

2

If you are using Java 8 or newer, you can use streams:

String[] arrWithDuplicates = new String[]{"John", "John", "Mary", "Paul"};
String[] arrWithoutDuplicates = Arrays.stream(arrWithDuplicates).distinct().toArray(String[]::new);

In arrWithoutDuplicates you'll have "John", "Mary" and "Paul"

matt
  • He has a list of arrays – Jack Flamp Sep 03 '18 at 11:09
  • So he can use `flatMap` function, for example `list.stream().flatMap(Arrays::stream).distinct().collect(Collectors.toList());` – matt Sep 03 '18 at 11:14
  • no he can't because what would create a stream with both username and userid strings. I would put them into an object. It is Java after all :) – Jack Flamp Sep 03 '18 at 11:20
  • you're right, creating `User` class with proper `equals()` and `hashCode()` methods will be the best solution from "clean code" point of view, combined with using streams to remove duplicates from a collection of users or using `Set` – matt Sep 03 '18 at 11:25
1

The best approach would be to map every user returned from the DB to an object with the two mentioned strings, username and userID. Then hashCode and equals should be implemented according to your definition of equality/duplicates. Based on this there are many ways to get rid of duplicates. You could add all found users to a Set, or stream over a list of such users and call Stream.distinct() to reduce the users to unique ones:

List<User> distinctUsers = users.stream().distinct().collect(Collectors.toList());
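A minimal sketch of such a User class (the class and field names are assumed here; equality is defined by the userID alone, per the one-ID-per-name assumption) could be:

```java
import java.util.Objects;

class User {
    private final String username;
    private final String userID;

    User(String username, String userID) {
        this.username = username;
        this.userID = userID;
    }

    String getUsername() { return username; }
    String getUserID() { return userID; }

    // Two users are considered duplicates when their IDs match
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof User)) return false;
        return userID.equals(((User) o).userID);
    }

    @Override
    public int hashCode() {
        return Objects.hash(userID);
    }
}
```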

If you need to go on with the current structure, you cannot use Stream.distinct(), as it would compare the string arrays by their object identity. The equality has to be specified explicitly. We can do this, e.g., in the following way:

Function<String[], String> comparingBy = user -> user[1]; // user[1] = ID
List<String[]> distinctUsers = users.stream()
        .collect(Collectors.groupingBy(comparingBy))
        .values().stream()
        .map(u -> u.get(0))
        .collect(Collectors.toList());

This will group all users by the Function comparingBy. comparingBy should reflect your definition of equality, so one of two equal users is a duplicate. As with Stream.distinct, "the element appearing first in the encounter order is preserved". The result is a distinct list, a list without duplicates.

Another data type would be the mentioned Set. When creating a TreeSet, it's also possible to provide the definition of equality explicitly. We can use the same comparingBy as above:

Set<String[]> distinctUsers = new TreeSet<>(Comparator.comparing(comparingBy));
distinctUsers.addAll(users);
LuCio
0

Edited: converted userNamesList to a HashSet, thanks @Aris_Kortex. This reduces the complexity from O(n^2) to O(n), because searching in a HashSet is O(1).

    Set<String> userSet = new HashSet<>(userNamesList);
    List<String[]> userListWithoutDuplicates = userList.stream()
        .filter(user -> !userSet.contains(user[0]))
        .collect(Collectors.toList());

distinct() on the stream does not help here, as it removes all duplicates from the stream itself: in this case it would remove duplicates of arrays whose 0th and 1st elements are equal to the corresponding elements of another array.

But as I understand it, the OP would like to remove only those users whose names (the 0th element) are contained in some predefined list.

Oleg Zinoviev
  • Not very optimal as a this will effectively re-iterate the whole list for any given item of the Stream of strings. – akortex Sep 03 '18 at 11:28
  • Can be optimized a little with conversion userNamesList to HashSet before stream – Oleg Zinoviev Sep 03 '18 at 11:30
  • Maybe, but I do not see any reason why not use a `distinct()` – akortex Sep 03 '18 at 11:31
  • While this code snippet may solve the question, [including an explanation](http://meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – Clijsters Sep 03 '18 at 11:33
  • I said it was enough to remove according only to the names (0th elements), since if the names are the sames, the ID(1th element) will be the sames too. So distinct() may work as well. I've never used this stream() so i'll get a look at it. – Ablia Sep 03 '18 at 11:43
  • @Ablia, distinct would not work as it only removes duplicates in collection on which stream was invoked. If you need to check if name is not present in some other collection, you should use filter – Oleg Zinoviev Sep 03 '18 at 11:48
0

I certainly think that you should use a Set rather than a List in the first place. We can adjust this according to your time and space complexity requirements. Here is a simple 2-line answer for your code:

        Set<String> set = new HashSet<>(userNamesList);
        List<String> list = new ArrayList<>(set);

A working example runs here: https://ideone.com/JznZCE. It really depends on what you need to achieve; if your users are unique, you should simply get a Set rather than a List. Also, if instead of a String the info is contained in a User object, the order of users need not be changed by this, and lookup of users by id or name can be implemented later.

You can then change how equality is checked by overriding the equals and hashCode methods of the User class with a custom implementation.

Hope this helps!

Edit: If the source of the info is the DB, see how you can get a unique list by using the DISTINCT keyword (or the similar MySQL construct), to handle this logic away from your code.

Pavan Kate
  • See the second phrase of my post. – Ablia Sep 03 '18 at 11:40
  • @Ablia You need to handle the comparison logic in your custom equals and hashcode method overriding the default. – Pavan Kate Sep 03 '18 at 11:42
  • yeah, that's waht i thought too, however i have no idea how to do that. I mean, overriding default code of a java Class. But i'm looking into it. – Ablia Sep 03 '18 at 11:45
0

You can use the toMap collector to provide a custom keyMapper function which serves as a uniqueness test, then simply use the values of the map as your result.

For your uniqueness test, I think it makes more sense to use index 1 (the userID) instead of index 0 (the userName). However, if you wish to change it back, use arr[0] instead of arr[1] below:

List<String[]> userList = new ArrayList<>();
userList.add(new String[]{"George","123"});
userList.add(new String[]{"George","123"});
userList.add(new String[]{"George","456"});
List<String[]> userListNoDupes = new ArrayList<>(userList.stream()
    .collect(Collectors.toMap(arr-> arr[1], Function.identity(), (a,b)-> a)).values());
for(String[] user: userListNoDupes) {
    System.out.println(Arrays.toString(user));
}

Output:

[George, 123]

[George, 456]

Patrick Parker
-1

Check this topic: Removing duplicate elements from a List

You can convert the list into a Set (which doesn't allow duplicates) and then back into a List if you really need that type of collection.
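For a list of plain strings, that round trip might look like this (LinkedHashSet keeps the first-seen order; the sample names are made up):

```java
import java.util.*;

List<String> names = Arrays.asList("John", "John", "Mary", "Paul");
// LinkedHashSet drops duplicates while preserving insertion order
List<String> unique = new ArrayList<>(new LinkedHashSet<>(names));
// unique → [John, Mary, Paul]
```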

hipposay
  • Not an answer. You need to answer the question rather than link to something. – nicomp Sep 03 '18 at 11:08
  • I already said in my post that it won't work because i have an list of String **Array**. The method HashSet use to delete duplicate is Object1.equals(Object2), which doesn't work with arrays. – Ablia Sep 03 '18 at 11:10