1

I have a large vector, I want to add noise with normal distribution to it. what I am doing now is trivial for loop:

for (int i=0 ; i<size ; i++){    //size is of order 1000000
    boost::mt19937 gen;
    gen.seed(i);
    boost::normal_distribution<>  nd(0.0 , 1.0);
    boost::variate_generator<boost::mt19937&, boost::normal_distribution<> >
                                                         randNormal (gen,nd);

    noise = randNormal();  
    nsyData[i] = data[i] + sd*noise; 
    }

Is there an efficient way to do this? like what MATLAB does?

sali
  • 219
  • 4
  • 10
  • 1
    Topical question for Easter: What would matlab do? (Seriously, though, what _does_ Matlab do, you think?) – sehe Apr 19 '14 at 23:58
  • That's not easy to answer though! Doesn't MATLAB use vector computation based on hardware accelerated methods? Like vector addition distributed on several cores? – sali Apr 20 '14 at 01:31
  • I think that if you compile with the proper flags, the same might happen with your C++ code. In fact I see AVX instructions in the assembly output **[for GCC](http://coliru.stacked-crooked.com/a/51fff67fb7dc93a5)** as well as **[clang++](http://coliru.stacked-crooked.com/a/cc9e33f762cc9d96)** – sehe Apr 20 '14 at 01:51

1 Answers1

4

Here's my take on it:

#include <boost/random/normal_distribution.hpp>
#include <boost/random.hpp>

int main()
{
    boost::mt19937 gen(42); // seed it once
    boost::normal_distribution<double> nd(0.0, 1.0);
    boost::variate_generator<boost::mt19937&, boost::normal_distribution<double> > randNormal(gen, nd);

    std::vector<double> data(100000, 0.0), nsyData;
    nsyData.reserve(data.size());

    double sd = 415*randNormal();
    std::transform(data.begin(), data.end(), std::back_inserter(nsyData), 
            [sd,&randNormal](double data) { return data + sd*randNormal(); });

}

Note that you were seeding the mersenne twister on each iteration of the loop. I'm afraid this totally killed any quality guarantees of the generated random numbers. Seed your generator once. (Use a different seed, e.g. from a random_device if you need it to be non-deterministic, obviously).

See this Live On Coliru

Update After some back and forth in the comments, here's a c++03 version that should not actually be too bad while still being comprehensible:

#include <boost/random/normal_distribution.hpp>
#include <boost/random.hpp>
#include <boost/bind.hpp>

struct Xfrm { 
    typedef double result_type;

    template <typename Gen>
    double operator()(double sd, Gen& randNormal, double data) const {
        return data + sd*randNormal();
    }
};

int main()
{
    boost::mt19937 gen(42); // seed it once
    boost::normal_distribution<double> nd(0.0, 1.0);
    boost::variate_generator<boost::mt19937&, boost::normal_distribution<double> > randNormal(gen, nd);

    std::vector<double> data(100000, 0.0), nsyData;
    nsyData.reserve(data.size());

    double sd = 415*randNormal();

    std::transform(data.begin(), data.end(), std::back_inserter(nsyData), 
            boost::bind(Xfrm(), sd, boost::ref(randNormal), ::_1));
}
sehe
  • 374,641
  • 47
  • 450
  • 633
  • Thanks a lot for your reply. I first tried seeding once like this `boost::mt19937 gen (std::time(0));` but noise value were all the same. could you explain again why I need to seed once, and be sure that noise values are different? – sali Apr 20 '14 at 01:28
  • Here's a demo that shows this to be perfectly fine (for smaller vector size, with original data all 0, so you can see the precise noise values): **[Coliru](http://coliru.stacked-crooked.com/a/f433cbe618a1d4e4)**. – sehe Apr 20 '14 at 01:33
  • You are right, I should seed once **outside the for loop**. seeding inside the loop with same seed value makes noise values being all the same. Thanks again. – sali Apr 20 '14 at 01:36
  • 1
    About seeding: the PRNG generates random values. It would be strange if you had to, somehow, select the next random value each slot. See also [here](http://stackoverflow.com/a/686373/85371) – sehe Apr 20 '14 at 01:37
  • I am not very comfortable with this `std::transform(data.begin(), data.end(), std::back_inserter(nsyData),[sd,&randNormal](double data) {...}`. Could you please explain it a little bit more? – sali Apr 20 '14 at 01:43
  • you have done `double sd = 415*randNormal();` do we still need `randNormal()` in ` data + sd*randNormal();` ? – sali Apr 20 '14 at 01:55
  • 1
    [`std::transform`](http://en.cppreference.com/w/cpp/algorithm/transform) transforms an input range into an output iterator, by applying a transformation to each element. In this case the transformation is given by the _c++11 lambda_ `[](double data) {...}`. The output iterator is `std::back_inserter`, which just appends that value at the back of `nsyData` – sehe Apr 20 '14 at 01:57
  • 1
    @sali Of course you need randNormal() in the lambda, or else you would get the same deviation on each value. I just had to make up a value for `sd`, right? Remember your code didn't show it, and I don't post code that doesn't compile. Feel free to modify `sd` though :) – sehe Apr 20 '14 at 01:58
  • Damn, I am not using C++11 :( – sali Apr 20 '14 at 02:01
  • 1
    @sali no worries: **[here](http://coliru.stacked-crooked.com/a/67780ee30148ca5b)** the old school C++03 solution and here's more fancy version using **[Boost Phoenix to emulate the lambda](http://coliru.stacked-crooked.com/a/780caac1144841d3)** – sehe Apr 20 '14 at 02:15
  • Clean and perfect solutio! Thanks! If you could post the code as an answer, future references would find it very helpful (as the link might get broken.) – sali Apr 20 '14 at 02:15
  • @sali which one do you prefer (also, it will be increasingly unlikely that people still need a c++03 version :)) – sehe Apr 20 '14 at 02:19
  • You mean I am a stupid guy still sticking to c++03 :)). I think c++03 version can be easily understood, so people (including me!) can make their c++11 version based on that. – sali Apr 20 '14 at 02:22
  • @sali The c++03 versions both have tricky things going on. (Functors are passed by reference! Did you note `operator()` could not be made const there, even though stateful functors are discouraged pretty much throughout the whole standard library? Also, the cost of copying all those details will be dominant. The Phoenix version is /better/ but it's quite cryptic and I'm pretty sure few people would immediately see what it does, unless they happen to have used it before. – sehe Apr 20 '14 at 02:27
  • I'm much more comfortable assuming that programmers will know how to write a functor in pure c++03, than I am assuming that people will understand the intricacies of Boost Phoenix, or the trade-offs that are hidden in that functor-style version ([`Xfrm`](http://coliru.stacked-crooked.com/a/67780ee30148ca5b)). – sehe Apr 20 '14 at 02:28
  • Ok, That means I have to enable c++11 support and migrate! Thanks a million for the time you put into this. – sali Apr 20 '14 at 02:32
  • 1
    After all, I came up with **[another version](http://coliru.stacked-crooked.com/a/5e354af64051c7fd)** that doesn't compromise performance /that/ much, yet doesn't require horrific functor types with (eewww) pointers (puke!), by using `boost::bind` instead. I can live with putting this one in the answer :) – sehe Apr 20 '14 at 02:37
  • You do not have to **live with** putting something as answer, your answer helped me a lot, both to understand why to seed once and to make noise generation efficient, future references are more likely to be looking for c++11 solution, so do not worry about posting other answers. Your first answer, followed by great explanations are helpful enough. – sali Apr 20 '14 at 02:42
  • Oops. I meant "Functors are passed by _value_" earlier, where I said "The c++03 versions both have tricky things going on..." :( Sorry for the carelessness. The rest made sense, but that will have been confusing... – sehe Apr 20 '14 at 03:18