Removing Duplicates from an Array using C

Question

I want small clarification in array concept in C. I have array:

int a[11]={1,2,3,4,5,11,11,11,11,16,16};

I want result like this:

{1,2,3,4,5,11,16}

Means I want remove duplicates. How is it possible?

What is the question? Are you trying to remove duplicates? What have you tried? — Mysticial, Mar 08 '12 at 06:57
[What have you tried?](http://mattgemmell.com/2008/12/08/what-have-you-tried/) — Some programmer dude, Mar 08 '12 at 07:01
Check this out & then try to solve your problem/homkework (which would benefit you): http://stackoverflow.com/questions/1532819/algorithm-efficient-way-to-remove-duplicate-integers-from-an-array. — Yavar, Mar 08 '12 at 07:09

Jonathan Leffler · Answer 1 · 2015-09-03T14:07:16.577

9

You can't readily resize arrays in C - at least, not arrays as you've declared that one. Clearly, if the data is in sorted order, it is straight-forward to copy the data to the front of the allocated array and treat it as if it was of the correct smaller size (and it is a linear O(n) algorithm). If the data is not sorted, it gets messier; the trivial algorithm is quadratic, so maybe a sort (O(N lg N)) followed by the linear algorithm is best for that.

You can use dynamically allocated memory to manage arrays. That may be beyond where you've reached in your studies, though.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int intcmp(const void *pa, const void *pb)
{
    int a = *(int *)pa;
    int b = *(int *)pb;
    if (a > b)
        return +1;
    else if (a < b)
        return -1;
    else
        return 0;
}

static int compact(int *array, int size)
{
    int i;
    int last = 0;
    assert(size >= 0);
    if (size <= 0)
        return size;
    for (i = 1; i < size; i++)
    {
        if (array[i] != array[last])
            array[++last] = array[i];
    }
    return(last + 1);
}

static void print(int *array, int size, const char *tag, const char *name)
{
   int i;
   printf("%s\n", tag);
   for (i = 0; i < size; i++)
       printf("%s[%d] = %d\n", name, i, array[i]);
}

int main(void)
{
   int a[11] = {1,2,3,4,5,11,11,11,11,16,16};
   int a_size = sizeof(a) / sizeof(a[0]);

   print(a, a_size, "Before", "a");
   a_size = compact(a, a_size);
   print(a, a_size, "After", "a");

   int b[11] = {11,1,11,3,16,2,5,11,4,11,16};
   int b_size = sizeof(b) / sizeof(b[0]);

   print(b, b_size, "Before", "b");
   qsort(b, b_size, sizeof(b[0]), intcmp);
   print(b, b_size, "Sorted", "b");
   b_size = compact(b, b_size);
   print(b, b_size, "After", "b");

   return 0;
}

edited Sep 03 '15 at 14:07

answered Mar 08 '12 at 07:47

Jonathan Leffler

730,956
141
904
1,278

like the compact function, it is, well, compact - better than my effort – bph Mar 08 '12 at 10:20
as an aside is the specification of static for the functions a 'best-practice' style thing, e.g. limiting visibility of the functions should be your 'default' unless you specifically need them to be more widely visible? – bph Mar 08 '12 at 10:24
1

In general terms, yes. I use static a lot in part because I compile with GCC options `-Wmissing-prototypes -Wstrict-prototypes` and that means I need a declaration of each non-static function before it is defined or used. I mark each function static until I know it will be needed outside the source file where it is written, and when it is needed outside, there's a header for the declaration to be added to. Sometimes I put the static declarations and the `main()` function at the top of a file; usually, I leave `main()` at the bottom of the file with the auxilliary functions above, like Pascal. – Jonathan Leffler Mar 08 '12 at 15:30
Just a quick fix: `static int compact(int *array, int size)` should check `if size == 0`, and then return 0. Right now it returns 1. – Thomas Walther Sep 03 '15 at 13:57
@ThomasWalther: Given that the `size` is a signed `int`, it should probably be `assert(size >= 0);` too, backed up with `if (size <= 0) return size;`. Developers get notified of their abuse of the code; end users don't see a crash because of misbehaviour by this code — but the calling code might do weird things because it wasn't careful enough in the first place. – Jonathan Leffler Sep 03 '15 at 14:01
@JonathanLeffler: `size == 0` should be supported; the algorithm should work fine on empty lists. **Edit**: Sorry, just saw your modification to the code, yes this is great. – Thomas Walther Sep 03 '15 at 14:10

chikuba · Answer 2 · 2012-03-08T23:35:21.723

#define arraysize(x)  (sizeof(x) / sizeof(x[0])) // put this before main

int main() {
    bool duplicate = false; 
    int a[11] = {1,2,3,4,5,11,11,11,11,16,16}; // doesnt have to be sorted
    int b[11];
    int index = 0;

    for(int i = 0; i < arraysize(a); i++) { // looping through the main array
        for(int j = 0; j < index; j++) { // looping through the target array where we know we have data. if we haven't found anything yet, this wont loop
            if(a[i] == b[j]) { // if the target array contains the object, no need to continue further. 
                duplicate = true;
                break; // break from this loop
            }
        }
        if(!duplicate) { // if our value wasn't found in 'b' we will add this non-dublicate at index
           b[index] = a[i];
           index++;
        }
        duplicate = false; // restart
    }
    // optional 
    int c[index]; // index will be the number of objects we have in b

    for(int k = 0; k < index; k++) {
        c[k] = b[k];
    }
}

If you really have to you can create a new array where that is the correct size and copy this into it.

As you can see, C is a very basic (but powerful) language and if you can, use a vector to but your objects in instead (c++'s std::vector perhaps) which can easily increase with your needs.

But as long as you only use small numbers of integers you shouldn't loose to much. If you have big numbers of data, you can always allocate the array on the heap with "malloc()" and pick a smaller size (maybe half the size of the original source array) that you then can increase (using realloc()) as you add more objects to it. There is some downsides reallocating the memory all the time as well but it is a decision you have to make - fast but allocation more data then you need? or slower and having the exact number of elements you need allocated (which you really cant control since malloc() might allocate more data then you need in some cases).

score 1 · Answer 3 · answered Mar 31 '12 at 17:28

//gcc -Wall q2.cc -o q2 && q2                                                                             

//Write a program to remove duplicates from a sorted array.                                               

/*                                                                                                        
  The basic idea of our algorithm is to compare 2 adjacent values and determine if they                   
are the same.  If they are not the same and we weren't already looking previusly at adjacent pairs        
that were the same, then we output the value at the current index.  The algorithm does everything         
in-place and doesn't allocate any new memory.  It outputs the unique values into the input array.         
 */                                                                                                       

#include <stdio.h>                                                                                        
#include <assert.h>                                                                                       

int remove_dups(int *arr, int n)                                                                          
{                                                                                                         
        int idx = 0, odx = -1;                                                                            
        bool dup = false;                                                                                 
        while (idx < n)                                                                                   
        {                                                                                                 
                if (arr[idx] != arr[idx+1])                                                               
                {                                                                                         
                        if (dup)                                                                          
                                dup = false;                                                              
                        else                                                                              
                        {                                                                                 
                                arr[++odx] = arr[idx];                                                    
                        }                                                                                 
                } else                                                                                    
                        dup = true;                                                                       

                idx++;                                                                                    
        }                                                                                                 

        return (odx == -1) ? -1 : ++odx;                                                                  
}                                                                                                         

int main(int argc, char *argv[])                                                                          
{                                                                                                         
        int a[] = {31,44,44,67,67,99,99,100,101};                                                         
        int k = remove_dups(a,9);                                                                         
        assert(k == 3);                                                                                   
        for (int i = 0;i<k;i++)                                                                           
                printf("%d ",a[i]);                                                                       

        printf("\n\n");                                                                                   
        int b[] = {-5,-3,-2,-2,-2,-2,1,3,5,5,18,18};                                                      
        k = remove_dups(b,12);                                                                            
        assert(k == 4);                                                                                   
        for (int i = 0;i<k;i++)                                                                           
                printf("%d ",b[i]);                                                                       

        printf("\n\n");                                                                                   
        int c[] = {1,2,3,4,5,6,7,8,9};                                                                    
        k = remove_dups(c,9);                                                                             
        assert(k == 9);                                                                                   
        for (int i = 0;i<k;i++)                                                                           
                printf("%d ",c[i]);                                                                       

        return 0;                                                                                         
}

Mathesh Waran · Answer 4 · 2014-11-22T14:40:21.427

Store the array element with small condition into new array **just run once 100% will work !)store the first value into array

II)store the another element check with before stored value..

III)if it exists leave the element--and check next one and store

here the below code run this u will understand better

int main()
{

    int a[10],b[10],i,n,j=0,pos=0;
    printf("\n enter a n value ");
    scanf("%d",&n);
    printf("\n enter a array value");
    for(i=0;i<n;i++)
    {
        scanf("%d",&a[i]);//gets the arry value
    }

   for(i=0;i<n;i++)

   {
    if(check(a[i],pos,b)==0)//checks array each value its exits or not
    {
        b[j]=a[i];
        j++;
        pos++;//count the size of new storing element
    }
   }
    printf("\n after updating array");

    for(j=0;j<pos;j++)
    {
        printf("\n %d",b[j]);
    }   return 0;

    }
   int check(int x,int pos,int b[])
{    int m=0,i;
    for(i=0;i<pos;i++)//checking the already only stored element
    {
        if(b[i]==x)
        {
           m++; //already exists increment the m value
        }
    }
    return m;
}

score 0 · Answer 5 · answered Mar 08 '12 at 07:00

0

you should create a new array and you should check the array if contains the element you want to insert before insert new element to it.

answered Mar 08 '12 at 07:00

Leo

63
1
3

score 0 · Answer 6 · answered Mar 08 '12 at 07:03

0

The question is not clear. Though, if you are trying to remove duplicates, you can use nested 'for' loops and remove all those values which occur more than once.

answered Mar 08 '12 at 07:03

krtkush

1,378
4
23
46

Check out my answer. Might be what @K_K meant – chikuba Mar 08 '12 at 09:34

score 0 · Answer 7 · answered Mar 08 '12 at 07:09

0

C does not have a built in data type that supports what you want -- you would need to create your own.

answered Mar 08 '12 at 07:09

Chris Marsh

41
3

score 0 · Answer 8 · answered Mar 08 '12 at 09:36

int a[11]={1,2,3,4,5,11,11,11,11,16,16};

As this array is sorted array, you can achieve very easily by following code.

int LengthofArray = 11;

//First elemnt can not be a duplicate so exclude the same and start from i = 1 than 0.

for(int i = 1; i < LengthofArray; i++);
{
   if(a[i] == a[i-1])
      RemoveArrayElementatIndex(i);
}

//function is used to remove the elements in the same as index passed to remove.

RemoveArrayElementatIndex(int i)
{
   int k  = 0;

   if(i <=0)
   return;


   k = i;
   int j =1; // variable is used to next item(offset) in the array from k.

   //Move the next items to the array    
   //if its last item then the length of the array is updated directly, eg. incase i = 10.

   while((k+j) < LengthofArray)
   { 
     if(a[k] == a[k+j])
     {
         //increment only j , as another duplicate in this array
         j = j +1 ;
     }
     else
     {
         a[k] = a[k+j];
         //increment only k , as offset remains same
         k = k + 1;   
     }

   }   

   //set the new length of the array . 
   LengthofArray = k; 

}

bph · Answer 9 · 2012-03-08T10:05:28.987

You could utilise qsort from stdlib.h to ensure your array is sorted into ascending order to remove the need for a nested loop.

Note that qsort requires a pointer to a function (int_cmp in this instance), i've included it below.

This function, int_array_unique returns the duplicate free array 'in-place' i.e. it overwrites the original and returns the length of the duplicate free array via the pn pointer

/**
 * Return unique version of int array (duplicates removed)
 */
int int_array_unique(int *array, size_t *pn)
{
    size_t n = *pn;

    /* return err code 1 if a zero length array is passed in */
    if (n == 0) return 1;

    int i;
    /* count the no. of unique array values */
    int c=0;

    /* sort input array so any duplicate values will be positioned next to each
     * other */
    qsort(array, n, sizeof(int), int_cmp);

    /* size of the unique array is unknown at this point, but the output array
     * can be no larger than the input array. Note, the correct length of the
     * data is returned via pn */
    int *tmp_array = calloc(n, sizeof(int));

    tmp_array[c] = array[0];
    c++;

    for (i=1; i<n; i++) {
        /* true if consecutive values are not equal */
        if ( array[i] != array[i-1]) {
            tmp_array[c] = array[i];
            c++;
        }
    }

    memmove(array, tmp_array, n*sizeof(int));

    free(tmp_array);

    /* set return parameter to length of data (e.g. no. of valid integers not
     * actual allocated array length) of the uniqe array */
    *pn = c;

    return 0;
}

/* qsort int comparison function */
int int_cmp(const void *a, const void *b)
{
    const int *ia = (const int *)a; // casting pointer types
    const int *ib = (const int *)b;

    /* integer comparison: returns negative if b > a
    and positive if a > b */
    return *ia  - *ib;
}

actually just use Jonathan's answer - its better.. – bph Mar 08 '12 at 10:21 — bph, Mar 08 '12 at 10:21

Removing Duplicates from an Array using C

9 Answers9

Linked

Related