Filling an array with random numbers from 1 to 10^10 in C or C++

Question

a part of an assignment of mine is based on an array (its size is given by the user) which contains random numbers from 1 to 10^10. Then we have to find the k-th smaller number of the array. Here's what I tried:

#include <cstdlib>
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include <time.h>

using namespace std;

void swap(int *x,int *y)
{
    int temp;
    temp = *x;
    *x = *y;
    *y = temp;
}

int choose_pivot(int i,int j )
{
    return((i+j) /2);
}

// Print array
void printarr(int arr[],int n)
{
    int i;
    for(i=0;i<n;i++)
        printf("%d\t",arr[i]);
}

// Find algorithm
int find1(int arr[],int left,int right,int k)
{
    int i,j,pivot;
    if (left==right)
        return arr[left];
    else
    {
        i=left;
        j=right+1;
        pivot= arr[left];
        do
        {
            do {
                i=i+1;
            } while (arr[i]>=pivot);
            do {
                j =j-1;
            } while (arr[j]<=pivot);
            if (i<j)
                swap(arr[i],arr[j]);
        } while (j<=i);
    }
    swap(arr[left],arr[j]);
    if (k==j)
        return arr[j];
    else if (k<j)
        find1(arr,left,j-1,k);
    else 
        find1(arr,j+1,right,k-j);
}

int main(int argc, char *argv[])
{
    srand(time(NULL));
    int n,i,fi,k;
    printf("Give array's size:\n");
    scanf("%d",&n);
    int pin[n];
    for (i=0;i<n;i++)
        pin[i]=((rand()*rand()) % 1000000000) +1;
    printf("Give k: \n");
    scanf("%d",&k);
    printf("The array contains the following numbers:\n\n");
    printarr(pin,n);
    fi=find1(pin,0,n-1,k);//find the k-th smallest number in the array
    printf("The k-th smallest number is: %d",fi);

    system("PAUSE");
}

As you can see 10^10 is a very big value, and I did something else to fill the array with the random numbers. Is it correct? Is there something else I could do? And my second problem is on the find algorithm. It doesn't work. Could anyone help me with these? Thank you very much

You really should use rand()*rand(). That doesn't have an even distribution. — Keith Irwin, Jan 15 '11 at 17:09
We really shouldn't be doing some kid's homework, especially when he's too lazy to try compiling it. — Tim, Jan 15 '11 at 17:26
@Tim Most former school colleagues of mine only compiled their code after they were done with writing it (most of them only saved then as well...). I think this is a common trait in people who just started studying programming. However this doesn't make him lazy. As you can see he gave it a shot and came here with some question regarding *his* code. Another thing is that, usually, if a person identifies his code as homework people who answer him only give him hints so nobody is doing his homework. — Alin Purcaru, Jan 15 '11 at 17:51
@Alin I am perfectly ok with helping someone who is stuck on a homework assignment - I just want to emphasize that we should not provide the answer. I think posting "Here's my compilation error, but I don't understand it" is perfectly valid. However, pr_prog84 simply posted his code and said "it doesn't work". That is a red flag for me. — Tim, Jan 15 '11 at 18:15

Ben Voigt · Answer 1 · 2011-01-15T18:28:00.237

3

long long get_big_rand()
{

    long long result;
    do {
        result = (rand() & 0x3ff);
        result <<= 12;
        result |= (rand() & 0xfff);
        result <<= 12;
        result |= (rand() & 0xfff);
    } while (++result > 10000000000ULL);
    return result;
}

edited Jan 15 '11 at 18:28

answered Jan 15 '11 at 17:25

Ben Voigt

277,958
43
419
720

+1: This is the best answer so far. Presumably you could generate more bits at a time if you were confident about the value of `RAND_MAX`. – Oliver Charlesworth Jan 15 '11 at 17:52
@Oli: exactly. 34 bits are needed, which can be 5 steps of 7 each, 4 steps of 9 each, or 3 steps of 12 each, since the most common real-world value of `RAND_MAX` is probably 2**15-1. Seems like the standard guarantees at least 2**15-1, I'll update my answer. – Ben Voigt Jan 15 '11 at 18:16

score 2 · Answer 2 · edited May 23 '17 at 12:15

2

rand()*rand() is a lot different than a single rand(), it decreases the randomness and changes its distribution. See this question for a deeper explanation.

Also, an integer usually is 4 bytes. It can contain a value as big as 2^31 (2 billions and something) or 2^32 (4 billions and more) if it's unsigned. You can see the max number it can contain checking the INT_MAX macro defined in limits.h. 10^10 is 10 billions, it won't fit in an integer, you'll have to use a bigger type (long long usually is 64 bytes thus more than you need).

rand, also, returns numbers up to RAND_MAX, and since it returns an int, it won't bigger than INT_MAX. You should use some other way to generate a number as big as 10^10.

If you don't care about randomness and random number distributions you could sum n random numbers (obtained by rand) so that n=10^10 / RAND_MAX.

edited May 23 '17 at 12:15

Community

1
1

answered Jan 15 '11 at 17:11

peoro

25,562
20
98
150

2

Bit shifting and OR-ing the pieces together would be a lot more efficient than adding random numbers together, and also yield a much more linear distribution. Adding them together gives a normal distribution. – Ben Voigt Jan 15 '11 at 17:22
Thank you Tim for your reply. I have been trying all day to find what was going wrong with find.. – pr_prog84 Jan 15 '11 at 17:30
@peoro "you could sum n random numbers" As long as you don't care for the distribution you would be better of multiplying by `10^10 / RAND_MAX`. @Ben *Adding them gives a distribution that approaches the normal distribution as n approaches infinity.* You have some idea of what's happening but it's not quite correct. @Oli Randomness does not change, only the distribution. There is a difference. – Alin Purcaru Jan 15 '11 at 17:59
3

@Alin: It's dangerous to assume my understanding is incorrect. Two points: (1) a comment is much too little space to explain the weaker and stronger central limit theorems, and (2) these are integers, so yes it's only an approximation of the normal distribution but not because we have finite additive terms, but rather because it's a discrete distribution and the normal is continuous. With `N = 10**10 / RAND_MAX`, which is several million, I feel quite safe claiming that the resulting distribution is as close to normal as it is possible to get with integers in this range. – Ben Voigt Jan 15 '11 at 18:11
@Alin: Sums of linear distributions are very good approximations of normal with N = several dozen. – Ben Voigt Jan 15 '11 at 18:13
@Ben I didn't take into account that `N = 10**10 / RAND_MAX` is a very large number. You're right. – Alin Purcaru Jan 15 '11 at 18:29
@Alin: And I'm simplifying and assuming `RAND_MAX` is 32767, which is by far the most common but not the only value in use. On systems which use `RAND_MAX` of 2**31-1, N would be small enough that the distribution would be only sort of shaped like normal. – Ben Voigt Jan 15 '11 at 18:45

score 2 · Answer 3 · edited May 23 '17 at 11:52

2

If you take a closer look at 10¹⁰ you will notice that it's quite a round limit. My take on this would be to generate each number, one digit at a time and ignoring insignificant zeroes. At this point you would have a number between 0 and 10¹⁰-1 inclusive. All you're left with doing is adding a 1.

As for random()*random(), that is the exact topic of this other question.

Alin

edited May 23 '17 at 11:52

Community

1
1

answered Jan 15 '11 at 17:18

Alin Purcaru

43,655
12
77
90

That would do it in 10 steps, using binary digits (7 bits at a time) it can be done in 5. – Ben Voigt Jan 15 '11 at 17:27
@Ben Voigt I'm not familiar with C or C++ so I can't express my opinion on your solution. All I can say is that what I posted is a general algorithmic approach that may be used in similar ways in most programming languages and it supports easily wider ranges (say 1 to 10^100). – Alin Purcaru Jan 15 '11 at 17:41

score 2 · Answer 4 · answered Jan 15 '11 at 17:18

Problem #1, an int will only hold a number of size 2^31 in size. You'll need a slightly bigger alternative for your pin array.

Also, multiplying your two random numbers together really doesn't do much - except perhaps make the number less random.

Next, you can't create an array on the stack dynamically with the user's input. That will require a new solution to make alloc an array for you.

score 1 · Answer 5 · answered Jan 15 '11 at 19:34

rand()*rand() isn't going to do anything for you. It doesn't scale the way you think it does, and it does change the distribuon. In fact

double norm_rand(){
    double r=0;
    for(unsigned i=0;i!=12;++i)
        r+=rand()/static_cast<double>(RAND_MAX);
    return (r/12)-6;
}

is a common way to simulate a normal distribution with mean 0 and variance 1;

The best way to to get large random numbers is using a random number device, like /dev/urandom or RtlGenRandom. i.e.

typedef unsigned long long big_type;
std::vector<double> rnums;
std::vector<big_type> buf(numtoread);
        std::ifstream rnds("/dev/urandom"); 
rnds.read(reinterpret_cast<char*>(&buf[0],buf.size()*sizeof(big_type));
std::transform(buf.begin(),buf.end(),std::back_inserter(rnums),
     [](big_type const& i){
        return (i*100000000000.)/(std::numeric_limits<big_type>::max());
     });

At the risk of doing your homework for you, an entirely different approach is to use the libraries that come with C++.

#include <cassert>
#include <sstream>
#ifndef _MSC_VER  //then assume Linux
#include <tr1/random>
#else
#include <random>
#endif
#include <boost/lexical_cast.hpp>
#include <algorithm>
#include <iterator>
#include <iostream>
int main(int argc, char** argv)
{
    assert(argc==3);
    unsigned const numentries=boost::lexical_cast<unsigned>(argv[1]);
    unsigned const k=boost::lexical_cast<unsigned>(argv[2]);
    std::cout<<" finding "<<k<<"th of "<< numentries<<" entries\n";
    assert(k<=numentries);
    std::vector<double> nums(numentries);
    std::tr1::uniform_real<> rng(0.,10000000000.);
    std::tr1::minstd_rand generator(42u);
    std::tr1::variate_generator<std::tr1::minstd_rand, std::tr1::uniform_real<> >
            uni(generator, rng);
    std::generate_n(nums.begin(),nums.size(),uni);
    std::cout<<" Generated:\t ";
    std::copy(nums.begin(),nums.end(),std::ostream_iterator<double>(std::cout,"\t"));
    std::sort(nums.begin(),nums.end());
    std::cout<<"\n The "<<k<<"th smallest entry is "<<nums[k]<<"\n";
    return 0;
}

(If you are in class at the level of just asking for making an array of rand numbers and you hand that in, they'll probably fail you) What I do in practice is to combine the two approaches. This is used in place of the linear conguentual rng used above (the minstd_rand):

template<typename bigtype=unsigned>
struct randeng {
    typedef bigtype result_type;
    randeng(unsigned x) :
        m_samplesrequired(x), m_samples(x), m_lastused() {
        std::ifstream rand;
        rand.open("/dev/urandom");
        assert(rand);
        rand.read(reinterpret_cast<char*> (&*(m_samples.begin())),
                m_samplesrequired * sizeof(unsigned));
    }
    result_type operator()() const {
        assert(m_lastused<m_samplesrequired);
        return m_samples[m_lastused++];
    }
    result_type max() const {
        return std::numeric_limits<result_type>::max();
    }
    result_type min() const {
        return 0;
    }
    unsigned m_samplesrequired;
    std::vector<result_type> m_samples;
    mutable unsigned m_lastused;
};

This always seems to give much better results.

score 0 · Answer 6 · answered Jan 15 '11 at 17:19

0

you forgot your return statements. at the end of find1 you should be doing:

if (k==j)
    return arr[j];
else if (k<j)
    return find1(arr,left,j-1,k);
else 
    return find1(arr,j+1,right,k-j);
}

answered Jan 15 '11 at 17:19

Tim

8,912
3
39
57

Filling an array with random numbers from 1 to 10^10 in C or C++

6 Answers6

Linked

Related