2

I want to find the most frequent and median values of a given array by using C++. I assume that I have a float array such as

float *LRArr=new LRArr[1000];

The array is filled by random float number.

std::default_random_engine generator;
generator.seed( rd() );
std::uniform_real_distribution<> distribution(0, 10);
for(int j=0;j<1000;j++)
{
    LRArr[j]=distribution(generator)
}

Now I want to get the most frequent values in the array. But it take many time. Could you suggest to me the faster way to implemt it by C or C++? I assume I have the LRArr such as

LRArr={0.1,1.2,6.5,6.5,4.5,6.5}
==>output is: 6.5 and median 5.5

This is my way:

float getMostFreq(float* LRArr;int sizeLRArr)
{
int count = 1;
int currentIndex = 0;
   for (int i = 1; i < sizeLRArr; i++)
   {
    if (LRArr[i] == LRArr[currentIndex])
        count++;
    else
        count--;
    if (count == 0)
    {
        currentIndex = i;
        count = 1;
    }
  }
  mostFreq = LRArr[currentIndex];
  return mostFreq;
} 
  • Won't work. There are just too many floats between 0 and 10, the chances of drawing 6.5000000 twice are too small. – MSalters Jun 14 '14 at 09:23
  • possible duplicate of [Compute Median of Values Stored In Vector - C++?](http://stackoverflow.com/questions/2114797/compute-median-of-values-stored-in-vector-c) – Kaslai Jun 14 '14 at 09:24
  • 1
    How about [Boost implementation of histogram](http://www.boost.org/doc/libs/1_47_0/boost/accumulators/statistics/weighted_p_square_cumulative_distribution.hpp)? – mlt Jun 14 '14 at 09:27
  • 2
    As MSalters said, there are too many possible `float`s, even if two did happen to overlap by chance, it wouldn't be a particularly interesting statistic. Depending on your actual end use-case, you may be better off creating something like a [Kernel Density Estimation](http://en.wikipedia.org/wiki/Kernel_density_estimation) or a [Histogram](http://en.wikipedia.org/wiki/Histogram) to find the area in which the values are most clustered. – Mankarse Jun 14 '14 at 09:28
  • Well if you use a sorted array your searching time would definitely decrease – 4aRk Kn1gh7 Jun 14 '14 at 09:37
  • All: It works. But my problem is it take long time tim done. And other problem, as said by Msalters, And I can resolve it by get median value. How to do it by my code – user3730173 Jun 14 '14 at 09:57
  • I don't think the algorithm you give works. For LRArr = { 01., 1.2, 6.5, 6.5, 4.5, 1.3, 1.8, 6.5} it no longer works. – George Jun 14 '14 at 10:06
  • It work . This is my code. But it does not have meadian function http://coliru.stacked-crooked.com/view?id=9d5d191e27cffd9c – user3730173 Jun 14 '14 at 10:17

1 Answers1

2

One way to count the frequency of float values in array is to compute a histogram and sort it. But you should take it in to account that the range of your values should be defined. This way the accuracy depends on the number of the histogram bins :

#include <algorithm>

#define histogramCount 10000
#define upperRange 1000
#define lowerRange 0

class histogram_data
{
public:
  int frequency;
  int index;
};

bool SortPredicate(const histogram_data& d1, const histogram_data& d2)
{
    return d1.frequency> d2.frequency;
}


void computeHistogram(float * array, int len)
{

   std::vector<histogram_data> histogram;

   for(int i=0;i<histogramCount;i++)
   {
       histogram_data hdata;
       hdata.frequency=0;
       hdata.index=i;
       histogram.push_back(hdata);
   }


   for(int i=0;i<len;i++)
   {
       histogram[(array[i]/(upperRange-lowerRange))*(histogramCount-1)].frequency++;
   }

   //sorting the histogram in descending order

    std::sort(histogram.begin(),histogram.end(),SortPredicate);

}

Now the frequencies of values are stored in the histogram in descending order. So the most frequent value could be acquired by :

float mostFrequent = ((float)histogram[0].index/(float)(histogramCount-1))*(upperRange-lowerRange);
Nejat
  • 31,784
  • 12
  • 106
  • 138
  • Thank you sir but it is error error: 'sort' is not a member of 'std' std::sort(histogram.begin(),histogram.end(),SortPredicate); – user3730173 Jun 14 '14 at 10:21
  • Add `#include `. – Nejat Jun 14 '14 at 10:27
  • How about the error : error: 'SortPredicate' was not declared in this scope std::sort(histogram.begin(),histogram.end(),SortPredicate); – user3730173 Jun 14 '14 at 10:44
  • I edited the answer. There was a typing error in definition of SortPredicate. – Nejat Jun 14 '14 at 10:52
  • Dear Nejat: I run it and it is ok. But sometime, I crashed at for(int i=0;i – user3730173 Jun 14 '14 at 12:57
  • @user3730173 You are right. I corrected the answer. I replaced histogramCount with (histogramCount-1). Sorry i have not tested this code. – Nejat Jun 14 '14 at 13:07
  • Yes, thank you. What does that mean of upperRange parameter? If assume that my array data is percent from 0 to 100%. So the upperRange is 100,right? Or 1000.Because large upperRange will take long time – user3730173 Jun 14 '14 at 13:14
  • 1
    Yeah. If your array data is percentage from 0 to 100% then your upperRange is 100. You can also set histogramCount to 1000 if you want to make it faster and do not need much precision. – Nejat Jun 14 '14 at 16:32