0

Here is the problem:

Given an array of integers, sort the array according to frequency of elements. For example, if the input array is {2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12}, then modify the array to {3, 3, 3, 3, 2, 2, 2, 12, 12, 4, 5}. if 2 numbers have same frequency then print the one which came 1st.

I know how to do it partially. Here is my approcach.

I will create a struct which will be like:

typedef struct node
{
  int index; // for storing the position of the number in the array.
  int count; // for storing the number of times the number appears
  int value; // for storing the actual value
} a[50];

I will create an array of these structs, I will then sort it by a sorting algorithm on the basis of their count. However, how can I ensure that if the frequency of two elements are same, then that number should appear which has a lesser index value?

  • 2
    You use two sort criteria. The criterion which should be checked first is the frequency or number of occurrences. The secondary criterion which should be checked when the frequencies are equal is the `index` number. – The Paramagnetic Croissant Jun 16 '14 at 20:11
  • You'll need to write your own sorting routine for this. Been a while since I did C -- in Java, I'd write a function that compares two instances and reports back which is larger, then I could use Collections.sort. You can do something similar in C. – KathyA. Jun 16 '14 at 20:11
  • Your question has been asked and answered. ***[here is one way](http://stackoverflow.com/a/20179103/645128)***. – ryyker Jun 16 '14 at 20:12
  • I couldn't really understand how exactly you're going to do it with this method, but if things are set on your mind, consider adding one another integer variable inside the structure, call it `initialindex` which is to stay constant unlike the `index` and keep the value of initial index. If there happens to be a tie, break the tie with respect to that. – Utkan Gezer Jun 16 '14 at 20:12
  • @ryyker That answer doesn't exactly resolve this problem. While it could, after some decent modifications, it still lacks the fix for the most important part, the part which 3745736 is asking for which he/she needs for breaking the tie. – Utkan Gezer Jun 16 '14 at 20:18
  • Also, why are storing the position of the number in the array in your struct? Doesn't your *array* give you that information? – KathyA. Jun 16 '14 at 20:18
  • @KathyA, I am storing it because, if numbers start repeating, then I will have the same index for all of them. – user3745736 Jun 16 '14 at 20:24
  • @ThoAppelsin, how should I initialise the value of the variable? – user3745736 Jun 16 '14 at 20:25
  • You should try writing a function that does at least part of what you want, then post it into your question. The struct definition doesn't quite count as showing what you've tried so far. – Austin Mullins Jun 16 '14 at 20:30
  • @user3745736 Same as the `index`. I assumed that you'd change the `index` as you sort the array, though of course that would be a redundant operation as KathyA. noted. If you were already going to use `index` as *initial* index, then there is no need for an extra variable; just use that, find out the lowest among the two runner-up sets, use that to break the tie. – Utkan Gezer Jun 16 '14 at 20:30
  • @user3745736 Ah, then it's not really the position in the array, since each position in the array can only hold one item. Instead, it's more of a rank, which I assume you'll use elsewhere in your program and isn't applicable to your question. :) – KathyA. Jun 16 '14 at 20:30
  • So, basically, after I sort on the basis of frequency, I will check for each value of frequency starting from max uptill 1, and all those elements of struct which match that value, I will sort them on the basis of their index. Can use mergesort. A long task this is. Anyways, thanks. :) – user3745736 Jun 16 '14 at 20:38

5 Answers5

1
#include <stdlib.h> // qsort, malloc, free
#include <stddef.h> // size_t
#include <stdio.h>  // printf

struct number
{
    const int * value;
    int         num_occurrences;
};

static void cmp_by_val(const struct number * a, const struct number * b)
{
    if (*a->value < *b->value)
        return -1;
    else if (*b->value < *a->value)
        return 1;
    else
        return 0;
}

static void cmp_by_occurrence_stable(const struct number * a, const struct number * b)
{
    if (a->num_occurrences < b->num_occurrences)
        return -1;
    else if (b->num_occurrences < a->num_occurrences)
        return 1;
    else if (a->value < b->value)
        return -1;
    else if (b->value < a->value)
        return 1;
    else
        return 0;
}

static struct number * sort_by_occurrence(const int * arr, size_t N)
{
    //
    // STEP 1: Convert the input
    //
    struct number * sort_arr = (struct number *)malloc(N * sizeof(struct number));
    if (! sort_arr) return NULL;
    for (int k = 0; k < N; ++k)
    {
        sort_arr[k].value = &arr[k];
        sort_arr[k].num_occurrences = 0;
    }
    //
    // STEP 2: Sort the input based on value
    //
    qsort(sort_arr, N, sizeof(struct number), cmp_by_val);
    //
    // STEP 3: Count occurrences
    //
    if (0 < N)
    {
        int cur_value = *sort_arr[0].value;
        int i = 0;
        for (j = 1; j < N; ++j)
        {
            if (*sort_arr[j].value != *sort_arr[i].value)
            {
                for (int k = i; k < j; ++k)
                    sort_arr[k].num_occurrences = j - i;
                i = j;
            }
        }
        for (; i < N; ++i)
            sort_arr[i].num_occurrences = N - i;
    }
    //
    // STEP 4: Sort based on occurrence count
    //
    qsort(sort_arr, N, sizeof(struct number), cmp_by_occurrence_stable);
    //
    // DONE
    //
    return sort_arr;
}

static void print_arr(const struct number * arr, size_t N)
{
    if (0 < N)
    {
        printf("%d", arr[0]->value);
        for (int k = 1; k < N; ++k)
            printf(", %d", arr[k]->value);
    }
    printf("\n");
}

int main(int argc, char ** argv)
{
    const int EXAMPLE_INPUT[11] = { 2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12 }; 
    struct number * sort_arr = sort_by_occurrence(EXAMPLE_INPUT, 11);
    if (sort_arr)
    {
        print_arr(sort_arr, 11);
        free(sort_arr);
    }
};
0xbe5077ed
  • 4,565
  • 6
  • 35
  • 77
  • qsort is not stable, how are you preserving the indexes when calling `qsort(sort_arr, N, sizeof(struct number), cmp_by_val);` ? – AK_ Jun 16 '14 at 21:02
  • Why 2 different functions? Sort by occurence and sort by value? – user3745736 Jun 16 '14 at 21:07
  • @AK_, notice the addresses of the values in the original array provides a way to deduce their original ordering. The `struct number` uses pointers to values, not values. Therefore the second sort produces an ordering that is equivalent to the ordering that would be determined by a stable sort. – 0xbe5077ed Jun 16 '14 at 21:20
  • @user3745736, the first sort is used to be able to count the frequencies of the elements (`num_occurrences`). The second sort then sorts by the frequencies. – 0xbe5077ed Jun 16 '14 at 21:21
  • @0xbe5077ed that was brilliant I must say! can you reduce lines by using `return a->value - b->value` ? – AK_ Jun 16 '14 at 21:31
  • @AK_ I think so (as long as pointer arithmetic always gives you a valid result in terms of `int` ... e.g. this wouldn't necessarily work on 64-bit Windows with MSVC). You could definitely reduce `cmp_by_val(...)` to a one-liner in any case! – 0xbe5077ed Jun 16 '14 at 21:38
  • @0xbe5077ed I see! Ok one last question, Within the same compare by val function, why are you de-referencing here: `*a->value < *b->value`? Shouldn't this be `a->value < b->value` aren't you suppose to compare address not the values of these addresses? – AK_ Jun 16 '14 at 21:43
  • @AK_: No. The first sort is intended to put things in value order so I can count the number of occurrences by using a simple `for` loop. It doesn't matter that I lose the relative order of the elements (supposing it matters, which it doesn't for bare `int`'s anyway) because I get it back in the second sort by breaking tries based on address. There is a bug in my code though: in the second sort I should break ties based on value first, address second. – 0xbe5077ed Jun 16 '14 at 21:56
0

It seems that the problem is using unstable sort algorithm on the frequency of array elements.

  1. Do a qsort on the array based on freq
  2. Again do a qsort on the resulted array based on the indexes of the element with the same freq only.

    • This should give you a correct answer in O(nLog)

I minimized the code. The obvious parts are left out.

struct node
{
    int *val;
    int freq;
    // int index; <- we can do this by comparing &a->val with &b->val
};

int compare_byfreq(const int* a, const int* b)
{
    return a->freq - b->freq;
}
int compare_index(const int* a, const int* b)
{
    if( a->freq == b->freq)
    {
        return  a->val - b->val; //this can never be zero
    }
    //else we have different freq don't move elem
    return 0;
}

int main()
{
    int arr[] = {2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12};
    node *narray = (struct node*)malloc(sizeof(arr) * sizeof(node));

    // build the nodes-array
    for(int i =0; i < sizeof(arr); i++)
    {
        /* buid narray here, make sure you store the pointer to val and not the actual values */
    }

    qsort(narray, sizeof(arr), compare_byfreq);
    qsort(narray, sizeof(arr), compare_index);

    /*print narray*/

    return 0;
}

Edit: @0xbe5077ed got an interesting idea. Instead of comparing indexes compare addresses of your values! - I just re-edited the code for that

AK_
  • 1,879
  • 4
  • 21
  • 30
0

You could create an array which stores the frequency of your input array (i.e. frequency[i] is the frequency of the input[i] element). After that it is easy to order the frequency array (using an stable algorithm) and make the same changes (swaps?) to the input array.

For creating the frequency array you can use several approaches, a simple and inefficient one is just counting each element with two nested loops. I left more efficient alternatives to your imagination.

Note: the frequency array has the same function as the count field in your struct node, but in a separated memory. If you will not need the frequencies in the future, I recommend you to use the separated memory, as you can release it.

Will
  • 891
  • 1
  • 7
  • 11
  • VERY expensive algorithm, you can tweak qsort to be stable look at @0xbe5077ed code.. – AK_ Jun 16 '14 at 21:45
  • As I said my approach for creating the frequency array is inefficient, but it can be improved. My intention was to give an understanding of the solution, not to give code. If you use qsort (plus counting) for creating the frequency array, and qsort again for frequency ordering, you will get the @0xbe5077ed solution. – Will Jun 16 '14 at 22:27
0

I was trying to learn Java nowadays, realized that this could be a good exercise. Tried and solved this problem over there in Eclipse. Java is horrible, I went back to C to solve it, here's a solution that I'll explain right after showing it:

#include <stdio.h>
#include <malloc.h>

typedef struct numbergroup {
    int firstencounteridx;
    int count;
    int thenumber;
} Numbergroup;

int firstoneissuperior( Numbergroup gr1, Numbergroup gr2 ) {
    return gr1.count > gr2.count ||   // don't mind the line-break, it's just to fit
    ( gr1.count == gr2.count && gr1.firstencounteridx < gr2.firstencounteridx );
}

void sortgroups( Numbergroup groups[], int amount ) {
    for ( int i = 1; i < amount; i++ ) {
        for ( int j = 0; j < amount - i; j++ ) {
            if ( firstoneissuperior( groups[j + 1], groups[j] ) ) {
                Numbergroup temp = groups[j + 1];
                groups[j + 1] = groups[j];
                groups[j] = temp;
            }
        }
    }
}

int main( ) {
    int input[] = { 2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12 };
    Numbergroup * groups = NULL;
    int amountofgroups = 0;

    for ( int i = 0; i < ( sizeof input / sizeof * input ); i++ ) {
        int uniqueencounter = 1;
        for ( int j = 0; j < amountofgroups; j++ ) {
            if ( groups[j].thenumber == input[i] ) {
                uniqueencounter = 0;
                groups[j].count++;
                break;
            }
        }
        if ( uniqueencounter ) {
            groups = realloc( groups, ( amountofgroups + 1 ) * sizeof * groups );
            groups[amountofgroups].firstencounteridx = i;
            groups[amountofgroups].count = 1;
            groups[amountofgroups].thenumber = input[i];
            amountofgroups++;
        }
    }

    sortgroups( groups, amountofgroups );

    for ( int i = 0; i < amountofgroups; i++ )
        for ( int j = 0; j < groups[i].count; j++ )
            printf( "%d ", groups[i].thenumber );

    free( groups );

    putchar( 10 );
    return 0;
}

Let me explain the structure first, as well as its functionality: It is for each unique number. In your example, it is for 2s, 3s, 4s, 5s and the 12s, one for each, 5 in total. Each one is to store:

  • the index of the first encounter of that number
  • the amount of encounter of that number
  • the value of that number

For example, for 12s, it shall store:

  • firstencounteridx as 5, that is the index of the first 12
  • count as 2
  • thenumber as 12

The first loop generally does that. It expands the group of Numbergroups whenever a unique number is encountered, stores its index as well; increases the count in case a number that already has a group has been encountered.

Then a sort is issued, which simply is a bubble sort. Might be different than the conventional one, I don't have any memorized.

Sorting criteria function simply checks if the count field of the first group is greater than the other; otherwise it checks whether they are the same and the firstencounter of the first group is earlier than the other; in which cases it returns 1 as true. Those are the only possible ways for the first group to be considered superior than the second one.

That's one method, there can be others. This is just a suggestion, I hope it helps you, not just for this case, but in general.

Utkan Gezer
  • 3,009
  • 2
  • 16
  • 29
0

Created a map and sort the map by value. O(nlogn) time, and O(n) space.

import java.util.*;

public class SortByFrequency {
    static void sortByFreq( int[] A ) {

        // 1. create map<number, its count>
        Map<Integer, Integer> map = new HashMap<>();

        for(int i = 0; i < A.length; i++) {
            int key = A[i];

            if( map.containsKey(key) ) {
                Integer count = map.get(key);
                count++;
                map.put(key, count);
            }
            else {
                map.put(key, 1);
            }
        }

        // 2. sort map by value in desc. order 
        // used modified (for desc. order) MapUtil in http://stackoverflow.com/questions/109383/how-to-sort-a-mapkey-value-on-the-values-in-java
        Map<Integer, Integer> map2= MapUtil.sortByValue(map);


        for(Map.Entry<Integer, Integer> entry : map2.entrySet() ) {
            int num = entry.getKey();
            int count = entry.getValue();

            for(int i = 0; i < count; i++ ) {
                System.out.print( num + " ");
            }
        }
        System.out.println();
    }

    public static void main(String[] args ) {
        int[] A1 = {2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12};
        sortByFreq(A1);
    }
}
Pang
  • 9,564
  • 146
  • 81
  • 122
user2761895
  • 1,431
  • 4
  • 22
  • 38