174

How to remove duplicates from an Array<String?> in kotlin?

jturolla
  • 6,596
  • 7
  • 26
  • 41
  • If someone is looking for consecutive characters to remove then visit https://handyopinion.com/method-to-remove-consecutive-characters-from-a-string-in-kotlin-android/ – Asad Ali Choudhry Apr 20 '20 at 22:08

2 Answers2

340

Use the distinct extension function:

val a = arrayOf("a", "a", "b", "c", "c")
val b = a.distinct() // ["a", "b", "c"]

There's also distinctBy function that allows one to specify how to distinguish the items:

val a = listOf("a", "b", "ab", "ba", "abc")
val b = a.distinctBy { it.length } // ["a", "ab", "abc"]

As @mfulton26 suggested, you can also use toSet, toMutableSet and, if you don't need the original ordering to be preserved, toHashSet. These functions produce a Set instead of a List and should be a little bit more efficient than distinct.


You may find useful:

Community
  • 1
  • 1
hotkey
  • 140,743
  • 39
  • 371
  • 326
  • 9
    You can also use `toSet` or `toMutableSet` which have less overhead than `distinct` and if ordering does not matter you can use `toHashSet`. – mfulton26 Nov 07 '16 at 14:40
  • @mfulton26, certainly it doesn't always have overhead. E.g. a JPA entity object can have lazy loaded fields, so it's more efficient to distinct its collection by id than perform full comparison – Buckstabue Jan 06 '19 at 20:03
  • 1
    @Buckstabue if you only need a Collection back (and it doesn't matter if it is a List or a Set) then using a Collection optimized for unique elements will be more efficient. The current implementation of distinct uses toMutableSet() in its implementation and then converts it to a List so by using toSet et. al. directly you avoid the extra intermediary Collection instance ([kotlin/_Arrays.kt:9145-9155 at master · JetBrains/kotlin](https://github.com/JetBrains/kotlin/blob/90c787e1023554bd0bc435071a78b9986f001068/libraries/stdlib/common/src/generated/_Arrays.kt#L9148-L9155)). – mfulton26 Jan 07 '19 at 17:37
  • @mfulton26, please re-read my comment. I said "it is NOT ALWAYS efficient" and gave you an example where .distinctBy{ } can be useful. Sometimes calling equals() and hashCode() is a very expensive thing – Buckstabue Jan 08 '19 at 22:27
  • 3
    @Buckstabue I see, I believe we're talking about two different issues: 1) `to*Set` is more efficient (space & time) than `distinct[By]` because it returns the `Set` directly instead of using a `Set` internally and converting it to a `List` as its return value and 2) `distinctBy` is can be more efficient than `distinct` simply because you can avoid full object equality comparison. Both are valid points. I ran with your statement that "certainly it doesn't always have overhead" and I was replying to that and overlooked that you were comparing `distinct` with `distinctBy` (and not with `to*Set`). – mfulton26 Jan 09 '19 at 13:49
  • 1
    @mfulton26, you are correct. I mostly meant that sometimes it's better to use List + distinctBy than Set, because Set intensively use equals/hashCode which potentially might be expensive to call – Buckstabue Jan 10 '19 at 08:39
  • 4
    At time of writing, `Iterable.distinct` actually does `toMutableSet().toList()` internally. So don't worry about performance :-) – Luke Needham Mar 30 '20 at 14:30
  • @Luke, using `.toHashSet()` should be a little faster than `.toMutableSet()` or `.toSet()`, as those create a linked hash set, which has additional links between the elements, consuming a little more time to construct than an ordinary hash set. And also, `.toHashSet()` or `.toSet()` don't spend time converting the result back to a list. But except for hot, performance-critical code, the difference should be negligible. – hotkey Mar 30 '20 at 14:56
  • Please note that `distinct` or `toSet` won't work on a list of arrays. I.e List of ByteArray or List of CharArray – aldok Oct 21 '20 at 02:34
  • @aldok That's because `distinct` and `toSet` rely on `equals` for comparison and arrays don't implement `equals` (they just inherit it form `Object` which uses `==` for equality). I.e. arrays are only equal if they refer to the same array object. You could use `distinctBy { it.toList() }` as a quick solution, but that converts the arrays to lists, then the list of lists to a set, which compares the lists by contents and removes duplicates, and then converts the set back to a list, so it's probably not the most effective way of doing that. – Dario Seidl Jul 27 '21 at 12:51
0

Algorithm

If you need to remove duplicates in-place use the following extension function:

fun <T : Comparable<T>> Array<T?>.distinctInPlace(): Int {
    this.sortBy { it }
    var placed = 1
    var removed = 0
    var i = 1
    while (i < size) {
        if (this[i] == this[i - 1])
            removed++
        else {
            this[placed] = this[i]
            placed++
        }
        i++
    }
    for (iter in size - removed..lastIndex)
        this[iter] = null
    return size - removed
}

This method will return the amount of unique elements in O(n log(n)) time. All of them will be sorted. Last for loop is used to set all other elements to null.

Note: if you had a null element in the array, it will be placed at the 0 index - so you can distinguish whether you had any nulls or they were added after.

Examples

fun main() {
    val arr = arrayOf("a", null, "b", null, "c", "ab", "ab")
    arr.distinctInPlace() // returns 5, arr is now [null, "a", "ab", "b", "c", null, null]

    val withoutNulls = arrayOf("a", "a", "aa", "aaa", "aa")
    withoutNulls.distinctInPlace() // returns 3, arr is now ["a", "aa", "aaa"]
}
llesha
  • 423
  • 1
  • 15