0

To randomly generate nbValue in the interval [average-delta,average+delta] such that the sum of these values is equal to fullSum, I use this program
1 : nbValue-1 are randomly generated
2 : the last value is calculated (fullSum - Sum of nbValue)
3 : If the last value is not in the interval, this loop is restarted.

This is the code :

int main()
{

    int fullSum = 4096, nbValue = 16;
    int average = round(fullSum / nbValue);
    int delta = round(average*0.125);
    int tryCounter = 0;
    int lastStep;

    srand(time(0));

    do {
      int currentSum = 0;
      for (int i=0;i<nbValue-1;i++)
        currentSum+=rand()%(delta*2+1)-delta+average ;
      tryCounter ++;
      lastStep = fullSum - currentSum;
    }
    while ( (lastStep < average - delta) || (lastStep > average + delta) );

    printf("Total tries : %d\n",tryCounter);
} 

Is there a better solution to be more efficient (That means, to reduce the number of tries) ? (Rq : I want a uniform distribution on the intervall)

Thanks for answer.

Stef1611
  • 1,978
  • 2
  • 11
  • 30
  • 1
    Depends what your distribution requirement is. Your algorithm won't give you a "fair" uniform distribution for every element. – Eugene Sh. Feb 07 '22 at 17:58
  • @EugeneSh. Good comment, I am also afraid of that but I am not a specialist in statistic and distribution. I repeated 1000 times this program, I binned the last values and I plotted an histogram. It did not look so bad but ... If I may, I am interested by a detailled comment of your remark. – Stef1611 Feb 07 '22 at 18:12
  • 1
    How about "cheating"? I.e. determine the average, then generate random pairs of (average+random, average-random), using the same random twice. – Yunnosch Feb 07 '22 at 18:31
  • 2
    Well, think about a "pathological" case where you have `n` element array, and each element can take values `0` or `1`. And the sum required is `1`. In such a case the first element has equal chance to be `0` or `1`. But the last element can be `1` only in case the other `n-1` elements are `0`, which is `1/2^(n-1)`. The less artificial cases are harder to calculate. – Eugene Sh. Feb 07 '22 at 18:34
  • 1
    Note that your `fullSum / nbValue` is an *integer* (truncating) division, not an FP one. Moreover, the result is already an integer, so the call to `round()` is superfluous in that case. – John Bollinger Feb 07 '22 at 18:37
  • @Stef1611 What values can `average`, `delta`, `x` have? `[INT_MIN ... INT_MAX]`? What if `RAND_MAX < INT_MAX`? – chux - Reinstate Monica Feb 07 '22 at 18:40
  • This doesn't reduce the number of tries, but if producing an array of values, it may be a good idea to exchange the values at indices `nbValue-1` and `rand()%nbValue` in the array, or possibly shuffle the entire array. – Ian Abbott Feb 07 '22 at 18:44
  • @JohnBollinger. This is just an example with power of two. I am aware that it is not necessary in this case. Thanks for comment. – Stef1611 Feb 07 '22 at 20:00
  • 1
    Does this answer your question? [Is there an efficient way to generate N random integers in a range that have a given sum or average?](https://stackoverflow.com/questions/61393463/is-there-an-efficient-way-to-generate-n-random-integers-in-a-range-that-have-a-g) – Peter O. Feb 07 '22 at 20:16
  • 1
    When you say generating numbers at random you mean with a flat probability distribution, a Gaus distribution, a Poison distribution? this is important, because the easy way it to generate `n-1` numbers at random and be the last the difference to the desired number. But the probability distribution will not be the correct one, then. And you will have probably a far number to the average. – Luis Colorado Feb 08 '22 at 16:58
  • 1
    @LuisColorado, generating n-1 numbers and then using the difference as the last number is exactly what the OP describes, with the important caveat that they reject the result and retry if the difference does not fall within the required range. And that reject / retry makes a big difference when it comes to the distribution of accepted results, so distribution is indeed an important consideration to raise. I don't think you can get a uniform distribution of even the first `nbValues - 1` numbers across the interval that way. – John Bollinger Feb 08 '22 at 20:43
  • 1
    that will give you a bad distribution... it is better to generate n numbers and then scale and offset all of them to fit the required average and standard deviation. The distribution will be the same as the original and the average and standard deviation will be as required. – Luis Colorado Feb 09 '22 at 02:08
  • @LuisColorado. Thanks a lot. I will use this solution. – Stef1611 Feb 09 '22 at 07:38
  • @LuisColorado. I posted an answer. Do not hesitate to modify it or to change the owner because it is your idea. – Stef1611 Feb 09 '22 at 08:29
  • @Stef1611, don't worry, you deserve the credit of having posted this as an answer! :) – Luis Colorado Feb 10 '22 at 15:43

3 Answers3

1

One possible improvement (depends of cost of check vs. cost of rand()):

Rather than iterate nbValue-1 times and then check if a possible solution exist, also check after each iteration.

  // Re-start here
  sum = 0; 
  for (int i=0; i < nbValue-1; i++) {
    currentSum += rand()%(delta*2+1)-delta+average ;       
    int diff = (fullSum - currentSum)/(nbValue - i - 1);
    if (diff < average - delta) || (diff > average + delta) {
      start for loop again
    }
  }  
  ...
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
1

I came up with a tail-recursive solution that "almost" works :-)

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int rand_between(int lo, int hi) {
    // simple implementation may suffer from bias
    return rand() % (hi - lo + 1) + lo;
}

void rand_with_const_sum(int *a, int n, int sum, int min, int max) {
    if (n-- == 0) return;
    int tmpmax = sum - n*min;
    int tmpmin = sum - n*max;
    if (tmpmax > max) tmpmax = max;
    if (tmpmin < min) tmpmin = min;
    a[0] = rand_between(tmpmin, tmpmax);
    rand_with_const_sum(a + 1, n, sum - a[0], min, max);
}

int main(void) {
    srand((unsigned)time(0));
    int a[100];

    rand_with_const_sum(a, 10, 100, 7, 13);
    for (int i = 0; i < 10; i++) printf("%d ", a[i]);
    puts("");

    puts("");

    rand_with_const_sum(a, 100, 1000, 7, 13);
    // shuffle(a, 100); // ?
    for (int i = 0; i < 100; i++) printf("%d ", a[i]);
    puts("");

    return 0;
}
pmg
  • 106,608
  • 13
  • 126
  • 198
0

Following the Luis Colorado comment, I post this answer to my question : int main() {

int fullSum = 4096, nbValue = 16;
int average = round(fullSum / nbValue);
int delta = round(average*0.125);
int rdValues[nbValue];

srand(time(0));

int currentSum = 0;
for (int i=0;i<nbValue;i++) {
    rdValues[i] = rand()%(delta*2+1)-delta+average ;
    currentSum+=rdValues[i];
}

for (int i=0;i<nbValue;i++)
    rdValues[i]=round(rdValues[i]/currentSum*fullSum) ;

// To slightly adjust the values because they are integers
currentSum = 0;
for (int i=0;i<nbValue;i++)
    currentSum+=rdValues[i];

int deltaSum=fullSum-currentSum;
int nextInc,rdIndex;
for (int i=0; i<abs(deltaSum);i++) {
    nextInc=0;
    while (nextInc=0) {
        rdIndex = rand()%(nbValue) ;
        if ( deltaSum<0 && rdValues[rdIndex] > (average-delta) ) {
            nextInc=1;
            rdValues[rdIndex]--; }
        if ( deltaSum>0 && rdValues[rdIndex] < (average+delta) ) {
            nextInc=1;
            rdValues[rdIndex]++; }
    }
}

}
Stef1611
  • 1,978
  • 2
  • 11
  • 30