34

I have an unsorted array, what is the best method to remove all the duplicates of an element if present?

e.g:

a[1,5,2,6,8,9,1,1,10,3,2,4,1,3,11,3]

so after that operation the array should look like

 a[1,5,2,6,8,9,10,3,4,11]
cletus
  • 616,129
  • 168
  • 910
  • 942
mohit
  • 1,913
  • 3
  • 16
  • 23
  • 2
    Is this homework? If not, many languages (scripting languages at least) have this built in. Ruby: `[1, 2, 3, 2, 3, 1].uniq` – jtbandes Jul 28 '10 at 07:14
  • 1
    Use a temporary dictionary in which you insert elements as they are read, in order to remove them if they are already in the dictionary. – pascal Jul 28 '10 at 07:15
  • @jtbandes Its not a home work.. i wanted to know the appropriate algoritm basically. @pascal using temporary dictionary means using extra memory(storage)? – mohit Jul 28 '10 at 07:22
  • Yes, for example, see Matthew's answer. – pascal Jul 28 '10 at 08:26
  • 1
    Also If you are a C++ user..then use unique() in C++ STL http://www.cplusplus.com/reference/algorithm/unique/ – Abhinav Shrivastava Oct 14 '12 at 14:36
  • You can watch this link about same question http://stackoverflow.com/questions/24944844/removing-duplicates-inside-insertion-sort/26177449#26177449 – isxaker Oct 03 '14 at 10:41

14 Answers14

83

Check every element against every other element

The naive solution is to check every element against every other element. This is wasteful and yields an O(n2) solution, even if you only go "forward".

Sort then remove duplicates

A better solution is sort the array and then check each element to the one next to it to find duplicates. Choose an efficient sort and this is O(n log n).

The disadvantage with the sort-based solution is order is not maintained. An extra step can take care of this however. Put all entries (in the unique sorted array) into a hashtable, which has O(1) access. Then iterate over the original array. For each element, check if it is in the hash table. If it is, add it to the result and delete it from the hash table. You will end up with a resultant array that has the order of the original with each element being in the same position as its first occurrence.

Linear sorts of integers

If you're dealing with integers of some fixed range you can do even better by using a radix sort. If you assume the numbers are all in the range of 0 to 1,000,000 for example, you can allocate a bit vector of some 1,000,001. For each element in the original array, you set the corresponding bit based on its value (eg a value of 13 results in setting the 14th bit). Then traverse the original array, check if it is in the bit vector. If it is, add it to the result array and clear that bit from the bit vector. This is O(n) and trades space for time.

Hash table solution

Which leads us to the best solution of all: the sort is actually a distraction, though useful. Create a hashtable with O(1) access. Traverse the original list. If it is not in the hashtable already, add it to the result array and add it to the hash table. If it is in the hash table, ignore it.

This is by far the best solution. So why the rest? Because problems like this are about adapting knowledge you have (or should have) to problems and refining them based on the assumptions you make into a solution. Evolving a solution and understanding the thinking behind it is far more useful than regurgitating a solution.

Also, hash tables are not always available. Take an embedded system or something where space is VERY limited. You can implement an quick sort in a handful of opcodes, far fewer than any hash table could be.

cletus
  • 616,129
  • 168
  • 910
  • 942
  • 6
    In the question, the resulting array appears to preserve the order of the input array. – pascal Jul 28 '10 at 07:17
  • 1
    One should make it clear that hashtables only give you *expected* constant time, not guaranteed constant time. – rbrito Feb 22 '15 at 14:31
  • While hashtable does not let you to add duplicate items, if you add all of the numbers in hashtable, and then simply print it , you can reach at the same result. what is the point above to use some if conditions and makes the logic more complicated? – Gökhan Akduğan Feb 21 '16 at 02:20
  • @GökhanAkduğan The most apparent reasons that have been mentioned: significance of ordering, lack of hash table availability, strong space constraints. – Szymon Brych Sep 22 '17 at 06:23
  • Why remove from the hashtable if you found a duplicate? Duplicates can also mean triplets, quadruplets etc. I would not remove the value from the hashtable. – Rudy Velthuis Aug 20 '18 at 13:22
  • I don't know that the hash table is by far the best, even though it's theoretical complexity is. Sorting is very efficient and cache friendly. Just consider all the complexity in hash tables of handling collisions. – Justin Meiners Mar 22 '22 at 21:05
2

This can be done in amortized O(n) using a hashtable-based set.

Psuedo-code:

s := new HashSet
c := 0
for each el in a
  Add el to s.
    If el was not already in s, move (copy) el c positions left.
    If it was in s, increment c. 
Matthew Flaschen
  • 278,309
  • 50
  • 514
  • 539
2

If you don't need to keep the original object you can loop it and create a new array of unique values. In C# use a List to get access to the required functionality. It's not the most attractive or intelligent solution, but it works.

int[] numbers = new int[] {1,2,3,4,5,1,2,2,2,3,4,5,5,5,5,4,3,2,3,4,5};
List<int> unique = new List<int>();

foreach (int i in numbers)
     if (!unique.Contains(i))
          unique.Add(i);

unique.Sort();
numbers = unique.ToArray();
Jonathon Reinhart
  • 132,704
  • 33
  • 254
  • 328
William
  • 8,007
  • 5
  • 39
  • 43
1
    indexOutput = 1;
    outputArray[0] = arrayInt[0];
    int j;
    for (int i = 1; i < arrayInt.length; i++) {            
        j = 0;
        while ((outputArray[j] != arrayInt[i]) && j < indexOutput) {
            j++;
        }
        if(j == indexOutput){
           outputArray[indexOutput] = arrayInt[i];
           indexOutput++;
        }         
    }
dhayyati
  • 49
  • 9
1

Treat numbers as keys.

for each elem in array:
if hash(elem) == 1 //duplicate
  ignore it
  next
else
  hash(elem) = 1
  add this to resulting array 
end
If you know about the data like the range of numbers and if it is finite, then you can initialize that big array with ZERO's.
array flag[N] //N is the max number in the array
for each elem in input array:
  if flag[elem - 1] == 0
    flag[elem - 1] = 1
    add it to resulatant array
  else
    discard it //duplicate
  end
bhups
  • 14,345
  • 8
  • 49
  • 57
0

This is a code segment i created in C++, Try out it

#include <iostream>

using namespace std;

int main()
{
   cout << " Delete the duplicate" << endl; 

   int numberOfLoop = 10;
   int loopCount =0;
   int indexOfLargeNumber = 0;
   int largeValue = 0;
   int indexOutput = 1;

   //Array to hold the numbers
   int arrayInt[10] = {};
   int outputArray [10] = {};

   // Loop for reading the numbers from the user input
   while(loopCount < numberOfLoop){       
       cout << "Please enter one Integer number" << endl;
       cin  >> arrayInt[loopCount];
       loopCount = loopCount + 1;
   }



    outputArray[0] = arrayInt[0];
    int j;
    for (int i = 1; i < numberOfLoop; i++) {            
        j = 0;
        while ((outputArray[j] != arrayInt[i]) && j < indexOutput) {
            j++;
        }
        if(j == indexOutput){
           outputArray[indexOutput] = arrayInt[i];
           indexOutput++;
        }         
    }

   cout << "Printing the Non duplicate array"<< endl;

   //Reset the loop count
   loopCount =0;

   while(loopCount < numberOfLoop){ 
       if(outputArray[loopCount] != 0){
        cout <<  outputArray[loopCount] << endl;
    }     

       loopCount = loopCount + 1;
   }   
   return 0;
}
Vanji
  • 1,696
  • 14
  • 23
0

My solution(O(N)) does not use additional memory, but array must been sorted(my class using insertion sort algorithm, but it doesn't matter.):

  public class MyArray
        {
            //data arr
            private int[] _arr;
            //field length of my arr
            private int _leght;
            //counter of duplicate
            private int countOfDup = 0;
            //property length of my arr
            public int Length
            {
                get
                {
                    return _leght;
                }
            }

            //constructor
            public MyArray(int n)
            {
                _arr = new int[n];
                _leght = 0;
            }

            // put element into array
            public void Insert(int value)
            {
                _arr[_leght] = value;
                _leght++;
            }

            //Display array
            public void Display()
            {
                for (int i = 0; i < _leght; i++) Console.Out.Write(_arr[i] + " ");
            }

            //Insertion sort for sorting array
            public void InsertSort()
            {
                int t, j;
                for (int i = 1; i < _leght; i++)
                {
                    t = _arr[i];
                    for (j = i; j > 0; )
                    {
                        if (_arr[j - 1] >= t)
                        {
                            _arr[j] = _arr[j - 1];
                            j--;
                        }
                        else break;
                    }
                    _arr[j] = t;
                }
            }

            private void _markDuplicate()
            {
                //mark duplicate Int32.MinValue
                for (int i = 0; i < _leght - 1; i++)
                {
                    if (_arr[i] == _arr[i + 1])
                    {
                        countOfDup++;
                        _arr[i] = Int32.MinValue;
                    }
                }
            }

            //remove duplicates O(N) ~ O(2N) ~ O(N + N)
            public void RemoveDups()
            {
                _markDuplicate();
                if (countOfDup == 0) return; //no duplicate
                int temp = 0;

                for (int i = 0; i < _leght; i++)
                {
                    // if duplicate remember and continue
                    if (_arr[i] == Int32.MinValue) continue;
                    else //else need move 
                    {
                        if (temp != i) _arr[temp] = _arr[i];
                        temp++;
                    }
                }
                _leght -= countOfDup;
            }
        }

And Main

static void Main(string[] args)
{
     Random r = new Random(DateTime.Now.Millisecond);
     int i = 11;
     MyArray a = new MyArray(i);
     for (int j = 0; j < i; j++)
     {
        a.Insert(r.Next(i - 1));
     }

     a.Display();
     Console.Out.WriteLine();
     a.InsertSort();
     a.Display();
     Console.Out.WriteLine();
     a.RemoveDups();
     a.Display();

    Console.ReadKey();
}
isxaker
  • 8,446
  • 12
  • 60
  • 87
0
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class testing {
    public static void main(String[] args) {
        EligibleOffer efg = new EligibleOffer();
        efg.setCode("1234");
        efg.setName("hey");
        EligibleOffer efg1 = new EligibleOffer();
        efg1.setCode("1234");
        efg1.setName("hey1");
        EligibleOffer efg2 = new EligibleOffer();
        efg2.setCode("1235");
        efg2.setName("hey");
        EligibleOffer efg3 = new EligibleOffer();
        efg3.setCode("1235");
        efg3.setName("hey");
        EligibleOffer[] eligibleOffer = { efg, efg1,efg2 ,efg3};
        removeDupliacte(eligibleOffer);
    }

    public static EligibleOffer[] removeDupliacte(EligibleOffer[] array) {
        List list = Arrays.asList(array);
        List list1 = new ArrayList();
        int len = list.size();
        for (int i = 0; i <= len-1; i++) {
            boolean isDupliacte = false;
            EligibleOffer eOfr = (EligibleOffer) list.get(i);
            String value = eOfr.getCode().concat(eOfr.getName());
            if (list1.isEmpty()) {
                list1.add(list.get(i));
                continue;
            }
            int len1 = list1.size();
            for (int j = 0; j <= len1-1; j++) {
                EligibleOffer eOfr1 = (EligibleOffer) list1.get(j);
                String value1 = eOfr1.getCode().concat(eOfr1.getName());
                if (value.equals(value1)) {
                    isDupliacte = true;
                    break;
                }
                System.out.println(value+"\t"+value1);
            }
            if (!isDupliacte) {
                list1.add(eOfr);
            }
        }
        System.out.println(list1);
        EligibleOffer[] eligibleOffer = new EligibleOffer[list1.size()];
        list1.toArray(eligibleOffer);
        return eligibleOffer;
    }
}
Nivi
  • 29
  • 2
0
Time O(n) space O(n) 

#include <iostream>
    #include<limits.h>
    using namespace std;
    void fun(int arr[],int size){

        int count=0;
        int has[100]={0};
        for(int i=0;i<size;i++){
            if(!has[arr[i]]){
               arr[count++]=arr[i];
               has[arr[i]]=1;
            }
        }
     for(int i=0;i<count;i++)
       cout<<arr[i]<<" ";
    }

    int main()
    {
        //cout << "Hello World!" << endl;
        int arr[]={4, 8, 4, 1, 1, 2, 9};
        int size=sizeof(arr)/sizeof(arr[0]);
        fun(arr,size);

        return 0;
    }
rjnitt
  • 1
0

Use a Set implementation.
HashSet,TreeSet or LinkedHashSet if its Java.

Zaki
  • 6,997
  • 6
  • 37
  • 53
0
public class RemoveDuplicateArray {
    public static void main(String[] args) {
        int arr[] = new int[] { 1, 2, 3, 4, 5, 6, 7, 2, 3, 4, 9 };
        int size = arr.length;
        for (int i = 0; i < size; i++) {
            for (int j = i+1; j < size; j++) {
                if (arr[i] == arr[j]) {
                    while (j < (size) - 1) {
                        arr[j] = arr[j + 1];
                        j++;
                    }
                    size--;
                }
            }
        }
        for (int i = 0; i < size; i++) {
            System.out.print(arr[i] + "  ");
        }
    }

}

output - 1 2 3 4 5 6 7 9

Onic Team
  • 1,620
  • 5
  • 26
  • 37
0

You can use the "in" and "not in" syntax in python which makes it pretty straight forward.

The complexity is higher than the hashing approach though since a "not in" is equivalent to a linear traversal to find out whether that entry exists or not.

li = map(int, raw_input().split(","))
a = []
for i in li:
    if i not in a:
        a.append(i)
print a
Deepak Pathania
  • 286
  • 3
  • 3
0

I am doing it in Python.

array1 = [1,2,2,3,3,3,4,5,6,4,4,5,5,5,5,10,10,8,7,7,9,10]

array1.sort() # sorting is must
print(array1)

current = NONE
count = 0 

# overwriting the numbers at the frontal part of the array
for item in array1:
    if item != current:
        array1[count] = item
        count +=1
        current=item
        
       

print(array1)#[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5, 5, 5, 5, 6, 7, 7, 8, 9, 10, 10, 10]

print(array1[:count])#[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

The most Efficient method is :

array1 = [1,2,2,3,3,3,4,5,6,4,4,5,5,5,5,10,10,8,7,7,9,10]

array1.sort()
print(array1)

print([*dict.fromkeys(array1)])#[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

#OR#
aa = list(dict.fromkeys(array1))
print( aa)#[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Soudipta Dutta
  • 1,353
  • 1
  • 12
  • 7
0

use an dictionary array and add each items as key if an item was duplicated , dictionary avoid to add it! it's the best solution

int[] numbers = new int[] {1,2,3,4,5,1,2,2,2,3,4,5,5,5,5,4,3,2,3,4,5};
IDictionary<int, string> newArray = new Dictionary<int, string>();

for (int i = 0; i < numbers.count() ; i++) 
{
   newArray .Add(numbers[i] , "");
}