6

Consider that I need an n-sized vector where each element lies in [-1,1]. The element a[i] is a float generated by -1 + 2*rand(). I need an elegant way to ensure that the sum of the elements of my array is equal to zero.

I've found two possible solutions:

The first one is this MATLAB function: https://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum. It also has an implementation in R; however, it is too much work to implement it in C, since this function is used for a 2D array.

The second one is provided in this thread: Generate random values with fixed sum in C++. Essentially, the idea is to generate n numbers with a normal distribution and then normalize them to my target sum. I have implemented it in Python below for a vector that sums to 1.0 (using uniform draws rather than normal ones). It works for every sum value except zero.

import random as rd

mySum = 1
randomVector = []
randomSum = 0

for i in range(7):
    randomNumber = -1 + 2*rd.random()
    randomVector.append(randomNumber)
    randomSum  += randomNumber

coef = mySum/randomSum
myNewList = [j * coef for j in randomVector]
newsum = sum(myNewList)
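
A target of zero is the one case this rescaling cannot handle: the coefficient becomes 0/randomSum = 0, so every element collapses to 0. A randomSum of exactly 0 would also raise a division error for any target, and for a nonzero target the scaled values are not guaranteed to stay inside [-1,1]. The sketch below illustrates the collapse; `rescale_to_sum` is a hypothetical helper name and the fixed seed is only there to make the run repeatable.

```python
import random as rd

def rescale_to_sum(values, target):
    """Scale values so they add up to target (hypothetical helper)."""
    total = sum(values)
    if total == 0:
        raise ZeroDivisionError("cannot rescale a vector whose sum is 0")
    coef = target / total
    return [v * coef for v in values]

rd.seed(1)  # fixed seed only to make the illustration repeatable
randomVector = [-1 + 2 * rd.random() for _ in range(7)]

scaled_to_one = rescale_to_sum(randomVector, 1.0)
scaled_to_zero = rescale_to_sum(randomVector, 0.0)

print(sum(scaled_to_one))   # close to 1.0, up to rounding
print(scaled_to_zero)       # every entry is 0.0 or -0.0
```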

So, is there a way to do that in C or C++? If you know of an already implemented function, that would be awesome. Thanks.

Thales Carl
  • 2
    Yes, you could do that in either language. However, please start with the [tour] and read [ask]. – Ulrich Eckhardt Oct 16 '19 at 20:01
  • 2
    First of all you need to realize that it is *not* random. And then you need to decide where you are "sacrificing" your randomness. For example you can generate n/2 random elements and then generate another n/2 elements while each one is the negation of the corresponding item from the random array. – Eugene Sh. Oct 16 '19 at 20:01
  • 1
    Generate a random vector, and subtract the average value from each element. – John Alexiou Oct 16 '19 at 20:02
  • @ja72 That could break the range requirement. – NathanOliver Oct 16 '19 at 20:03
  • 5
    Normalization along multiple instances. If you want a sum-to-zero set of random numbers between -1 and 1, then you sum the negative numbers and normalize them to be equal to -1 total, then you sum the positive numbers and normalize them to +1. It sacrifices true randomness but they still maintain a pseudo-random nature, and the total of the vector would then be zero. From your code, you would need to ensure that there is at least 1 negative and 1 positive number - otherwise it would throw a div0 error. – Cryostasys Oct 16 '19 at 20:06
  • 1
    Why don't you simply generate n-1 elements, calculate their sum and get the n-th element by changing the sign of the sum? – Roberto Caboni Oct 16 '19 at 20:09
  • 3
    @Cubo78 Can be out of range – Eugene Sh. Oct 16 '19 at 20:10
  • ...or even in range but absurd compared with the others. – Weather Vane Oct 16 '19 at 20:10
  • @Cryostasys it makes sense and probably it will fit to my needs. Thanks for the idea. – Thales Carl Oct 16 '19 at 20:12
  • 1
    @WeatherVane Why absurd? In any array satisfying the requirement any element equals `0-` – Eugene Sh. Oct 16 '19 at 20:12
  • @EugeneSh. at first I thought you meant out of range of `float`. – Weather Vane Oct 16 '19 at 20:13
  • @Cubo78 I was thinking about doing something like that. However this could make the last number far different from the others inside the array. – Thales Carl Oct 16 '19 at 20:13
  • @ThalesCarl that is what I meant. – Weather Vane Oct 16 '19 at 20:14
  • Do you want the sum to be _exactly_ zero, or within some tolerance? – John Alexiou Oct 16 '19 at 20:20
  • @ja72 The ideal scenario would be the sum is exactly zero since I am adding a random term in one equation that must get canceled in the end. However, I believe I can survive if the sum is around this value. Why? – Thales Carl Oct 16 '19 at 20:26
  • Sorry, I actually forgot the range requirement. But I don't understand how a non uniform distribution would be against the requirements.. Anyway I have in mind some sort of algorithmic solution that imho would work, but I'm afk, and writing it on my phone would be a pain.. – Roberto Caboni Oct 16 '19 at 20:29
  • 3
    The space of possible answers when `n=3` is essentially the regular hexagon you get when slicing the cube with vertex coordinates at plus or minus one with the plane `x+y+z=0`. So how would you want the distribution on that hexagon to be? Proportional to area? Proportional to the volume of the original cube projected onto an area? – aschepler Oct 16 '19 at 22:49
  • @EugeneSh.: It is random. It is not uniformly distributed. – Eric Postpischil Oct 17 '19 at 00:01
  • @EricPostpischil It is not only non-uniform, it is quite special. How would you call the distribution where each element is linearly depending on the others? – Eugene Sh. Oct 17 '19 at 00:30
  • You need to clarify the question, particularly the probability distribution desired. aschepler’s comment above is a good start. If you do not know what distribution you want, then explain why you want this. – Eric Postpischil Oct 17 '19 at 00:45
  • If the question doesn't mention a particular desired distribution, it means there isn't any desired distribution. Random is random, and it is only limited by the capability of generating a TRUE random number and by the requirements of the given range and the given sum. Stop. – Roberto Caboni Oct 17 '19 at 05:27
  • This would make an awesome [mathematics.se] question. Can you generate a flat random distribution with finite elements with sum zero and bounded? – John Alexiou Oct 17 '19 at 12:05
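
Two of the suggestions in the comments above can be checked quickly. The sketch below is only an illustration: `center_on_mean` and `mirrored_pairs` are hypothetical helper names, and padding an odd-length vector with a single 0.0 is an assumption of this sketch. It shows that subtracting the average can push a value out of [-1,1], while pairing each draw with its negation stays in range at the cost of half the randomness.

```python
import math
import random

def center_on_mean(values):
    """Subtract the average from every element (sum becomes ~0)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def mirrored_pairs(n, rng=random):
    """n//2 uniform draws in [-1, 1] followed by their negations.

    For odd n a single 0.0 is appended so the length is still n.
    """
    half = [rng.uniform(-1.0, 1.0) for _ in range(n // 2)]
    result = half + [-v for v in half]
    if n % 2:
        result.append(0.0)
    return result

centered = center_on_mean([1.0, 1.0, -1.0])
print(centered)            # the last entry is about -1.33, outside [-1, 1]

pairs = mirrored_pairs(7)
print(math.fsum(pairs))    # 0.0, since every value is cancelled by its negation
```

`math.fsum` is used for the check because a plain left-to-right `sum` of the mirrored values can leave a tiny rounding residue.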

3 Answers

2

I figured out a solution to your problem. This is not perfect since its randomness is limited by the range requirement.

The strategy is:

  1. Define a function able to generate a random float in a customizable range. No need to reinvent the wheel: I borrowed it from https://stackoverflow.com/a/44105089/11336762
  2. Allocate the array with malloc (I omit the pointer check in my example) and initialize the seed. In my example I just used the current time, but it can be improved
  3. For every element to be generated, pre-calculate the random range. Given the i-th sum, make sure that the next sum is NEVER out of range: if the sum is positive, the range needs to be (-1,1-sum); if it is negative, the range needs to be (-1-sum,1)
  4. Do this until (n-1)th element. Last element must be directly assigned as the sum with the sign changed.
    #include<stdio.h>
    #include<stdlib.h>
    #include<time.h>

    float float_rand( float min, float max )
    {
        float scale = rand() / (float) RAND_MAX; /* [0, 1.0] */
        return min + scale * ( max - min );      /* [min, max] */
    }

    int main( int argc, char *argv[] )
    {
        if( argc == 2 )
        {
            int i, n = atoi ( argv[1] );
            float *outArr = malloc( n * sizeof( float ) );
            float sum = 0;

            printf( "Input value: %d\n\n", n );

            /* Initialize seed */
            srand ( time( NULL ) );

            for( i=0; i<n-1; i++ )
            {
                /* Limit random generation range in order to make sure the next sum is  *
                 * not outside (-1,1) range.                                            */
                float min = (sum<0? -1-sum : -1);
                float max = (sum>0? 1-sum : 1);

                outArr[i] = float_rand( min, max );
                sum += outArr[i];
            }

            /* Set last array element */
            outArr[n-1] = -sum;

            /* Print results */
            sum=0;
            for( i=0; i<n; i++ )
            {
                sum += outArr[i];
                printf( "  outArr[%d]=%f \t(sum=%f)\n", i, outArr[i], sum );
            }

            free( outArr );
        }  
        else
        {
          printf( "Only a parameter allowed (integer N)\n" );
        }
    }

I tried it, and it works also when n=1. In case of n=0 a sanity check should be added to my example.

Some output examples:

N=1:

Input value: 1

  outArr[0]=-0.000000   (sum=-0.000000)

N=4

Input value: 4

  outArr[0]=-0.804071   (sum=-0.804071)
  outArr[1]=0.810685    (sum=0.006614)
  outArr[2]=-0.353444   (sum=-0.346830)
  outArr[3]=0.346830    (sum=0.000000)

N=8:

Input value: 8

  outArr[0]=-0.791314   (sum=-0.791314)
  outArr[1]=0.800182    (sum=0.008867)
  outArr[2]=-0.571293   (sum=-0.562426)
  outArr[3]=0.293300    (sum=-0.269126)
  outArr[4]=-0.082886   (sum=-0.352012)
  outArr[5]=0.818639    (sum=0.466628)
  outArr[6]=-0.301473   (sum=0.165155)
  outArr[7]=-0.165155   (sum=0.000000)
Moshe Rabaev
Roberto Caboni
  • Note the distribution of `outArr[i]` is different from the distribution of `outArr[j]`. – aschepler Oct 16 '19 at 22:45
  • 1
    It doesn't even cover the space. With `n==4`, you can't get results near `0.8, 0.8, -0.8, -0.8`. – aschepler Oct 16 '19 at 22:54
  • I'm aware of it. But I think it is mainly due to the seed, which in my tests made me keep obtaining a value not far from the negative limit (and this caused the unnatural negative-positive alternation). Improving the seed initialization would improve for sure outArr[i] vs outArr[j] distribution too. But.. hey! It meets the requirements! :) – Roberto Caboni Oct 16 '19 at 22:55
  • 1
    I am not a math nerd; but I feel uneasy because your adjustment of min,max is based on history, thus you have added *memory* into the random number sequence. Random is closed over linear operations, but is a memory-based operation actually linear? – mevets Oct 16 '19 at 23:17
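
The coverage remark in the comments follows from step 3 of the strategy: every partial sum is kept inside [-1,1], so the first two values can never add up to something like 1.6. The Python port below is a rough sketch of the answer's loop, not the author's code; `constrained_walk`, the seed, and the trial count are assumptions made for the illustration.

```python
import random

def constrained_walk(n, rng):
    """Python port of the answer's loop: keep the running sum in [-1, 1]."""
    values, total = [], 0.0
    for _ in range(n - 1):
        low = -1.0 - total if total < 0 else -1.0
        high = 1.0 - total if total > 0 else 1.0
        x = rng.uniform(low, high)
        values.append(x)
        total += x
    values.append(-total)   # last element cancels the running sum
    return values

rng = random.Random(2019)   # arbitrary fixed seed for a repeatable run
largest_pair_sum = 0.0
for _ in range(10_000):
    v = constrained_walk(4, rng)
    largest_pair_sum = max(largest_pair_sum, abs(v[0] + v[1]))

print(largest_pair_sum)     # never noticeably above 1.0
```

Since `|v[0] + v[1]|` is bounded by 1, a result near `0.8, 0.8, -0.8, -0.8` is unreachable regardless of how the generator is seeded.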
2

Thank you guys again for the help.

So, based on the idea of Cryostasys I developed the following C code to solve my problem:

#include <stdio.h>      /* printf, scanf, puts, NULL */
#include <stdlib.h>     /* srand, rand */
#include <time.h>       /* time */
#include <math.h>

int main()
{
    int arraySize = 10; //input value 
    double createdArray[arraySize]; //output value

    double randomPositiveVector[arraySize];
    double randomNegativeVector[arraySize];
    double positiveSum = 0.;
    double negativeSum = 0.;

    srand(time(NULL)); //seed for random generation

    for(int i = 0; i < arraySize; ++i)
    {
        double randomNumber = -1.+2.*rand()/((double) RAND_MAX); //random in [-1.0,1.0]
        printf("%f\n",randomNumber);
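        if(randomNumber >=0)
        {
            randomPositiveVector[i] = randomNumber;
            randomNegativeVector[i] = 0.; //keep the unused slot defined
            positiveSum += randomNumber;
        }
        else
        {
            randomNegativeVector[i] = randomNumber;
            randomPositiveVector[i] = 0.; //keep the unused slot defined
            negativeSum += randomNumber;
        }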
    }
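    if(positiveSum == 0. || negativeSum == 0.)
    {
        printf("ERROR\n"); //need at least one positive and one negative value
        return(1);         //stop before dividing by zero below
    }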

    double positiveCoefficient =  1.0/positiveSum;
    double negativeCoefficient = -1.0/negativeSum;
    for(int i = 0; i < arraySize; ++i)
    {
        randomPositiveVector[i] = positiveCoefficient * randomPositiveVector[i];
        randomNegativeVector[i] = negativeCoefficient * randomNegativeVector[i];
        if(fabs(randomPositiveVector[i]) > 1e-6) //near to zero 
        {
            createdArray[i] = randomPositiveVector[i];
        }
        else
        {
            createdArray[i] = randomNegativeVector[i];
        }
    }

    for(int i = 0; i < arraySize; ++i)
    {
        printf("createdArray[%d] = %9f\n",i,createdArray[i]);

    }

    return(0);
}

Please note that the randomness of the values generated is decreased, as mentioned in the comments of the question. Also, the kind of random distribution is determined by the function that you use to generate the randomNumber above. In this case, I've used rand() from stdlib.h which is based on giving a seed to the function and it is going to generate a pseudo-random number. You could use a different option, for instance, drand48() from stdlib.h as well.

Nevertheless, at least one positive and one negative value must be generated for this code to work. One verification step was added to the code, and if it hits this condition one should run the code again or handle the case some other way.

Output example (arraySize = 10):

createdArray[0] = -0.013824
createdArray[1] =  0.359639
createdArray[2] = -0.005851
createdArray[3] =  0.126829
createdArray[4] = -0.334745
createdArray[5] = -0.473096
createdArray[6] = -0.172484
createdArray[7] =  0.249523
createdArray[8] =  0.262370
createdArray[9] =  0.001640
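
For comparison, here is a compact Python sketch of the same split-and-normalize idea, in the style of the prototype from the question. The function name `split_normalized`, the fixed seed, and the choice to redraw when one sign is missing are assumptions of this sketch rather than part of the C code above.

```python
import random

def split_normalized(n, rng):
    """Scale positives to sum to +1 and negatives to sum to -1."""
    if n < 2:
        raise ValueError("need at least two elements")
    while True:
        draws = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        pos_sum = sum(x for x in draws if x > 0)
        neg_sum = sum(x for x in draws if x < 0)
        if pos_sum > 0 and neg_sum < 0:   # both signs present
            break                         # otherwise draw again
    return [x / pos_sum if x > 0 else x / -neg_sum for x in draws]

rng = random.Random(42)     # arbitrary fixed seed
vec = split_normalized(10, rng)
print(vec)
print(sum(vec))             # zero up to floating-point rounding
```

Each positive value is divided by a sum that is at least as large as itself, so it stays at or below 1, and the same holds in magnitude for the negatives. The total is zero only up to rounding.
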
Thales Carl
  • It seems that your solution generates values in a compressed range (near to 0). It's actually the opposite compared to my solution in which, I think due to random generator that always picks a first value near to the lower limit, oscillates near both limits. – Roberto Caboni Oct 22 '19 at 07:59
-1

One option is to generate some samples and then scale their values around the average. In C++ it would be something like the following

#include <iostream>
#include <iomanip>
#include <random>
#include <algorithm>
#include <cmath>
#include <vector>

int main()
{
    std::random_device rd;
    std::seed_seq ss{rd(), rd(), rd(), rd()};
    std::mt19937 gen{ss};

    const int samples = 9;

    // Generates the samples in [0, 2]
    std::uniform_real_distribution dist(0.0, std::nextafter(2.0, 3.0));
    std::vector<double> nums(samples);
    double sum = 0.0;
    for ( auto & i : nums )
    {
        i = dist(gen);
        sum += i;
    }
    double average = sum / samples;
    double k = 1.0 / std::max(average, 2.0 - average);

    // Transform the values (apart from the last) to meet the requirements
    sum = 0.0;
    for ( size_t i = 0; i < nums.size() - 1; ++i )
    {
        nums[i] = (nums[i] - average) * k;
        sum += nums[i];
    }
    // This trick (to ensure the needed precision) only works if the sum
    // is always evaluated in the same order
    nums.back() = 0.0 - sum;

    sum = 0.0;
    for ( size_t i = 0; i < nums.size(); ++i )
    {
        sum += nums[i];
        std::cout << std::setw(10) << std::fixed << nums[i] << '\n';
    }
    if (sum != 0.0)
        std::cout << "Failed.\n";
}

Testable here.
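
Why the factor `k` keeps everything in range: each sample x lies in [0, 2], so x - average lies in [-average, 2 - average], and dividing by the larger of those two magnitudes bounds the result by 1 in absolute value. The short Python sketch below mirrors the C++ steps under that reading; `center_and_scale` and the seed are hypothetical choices for the illustration.

```python
import random

def center_and_scale(samples):
    """Shift samples from [0, 2] by their mean, then scale into [-1, 1]."""
    average = sum(samples) / len(samples)
    k = 1.0 / max(average, 2.0 - average)
    scaled = [(x - average) * k for x in samples[:-1]]
    scaled.append(0.0 - sum(scaled))   # last value absorbs rounding error
    return scaled

rng = random.Random(7)      # arbitrary fixed seed
vec = center_and_scale([rng.uniform(0.0, 2.0) for _ in range(9)])
print(vec)
print(sum(vec))             # 0.0 when summed in this same order
```

As in the C++ version, the exact zero depends on adding the elements in the same order that was used to compute the last one.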

Bob__