5

I have the function mutateSequence that takes in three parameters. The parameter p is a value between 0 and 1, inclusive. I need two if statements, one that is entered with probability 4p/5 and another that is entered with probability p/5. How do I write the logic to make this happen?

Code:

void mutateSequence(vector<pair<string, string>> v, int k, double p)
{
       for (int i = 0; i < k - 1; i++)
    {
        string subjectSequence = v[i].second;
        for (int j = 0; j < subjectSequence.length(); j++)
        {
            // with probability 4p/5 replace the nucelotide randomly
            if (//enter with probability of 4p/5)
            {
               //do something
            }
            if (//enter with probability of p/5)
            {
                //do something
            }
          
        }
    }
}

I am expecting that the first if statement is entered with probability 4p/5 and the second if statement is entered with probability p/5

Jason
  • 36,170
  • 5
  • 26
  • 60
  • `rand()%(5*p) < 4*p` should do the trick –  Nov 07 '22 at 12:55
  • Have a look at std::rand(). There might be higher quality RNG if needed, but this should do for a simple simulation. Edit: Ah, ninja'd. – nick Nov 07 '22 at 12:55
  • 3
    `rand()` is biased. Don't use it if you need any quality of randomness at all. It seems that you are looking for [`std::bernoulli_distribution`](https://en.cppreference.com/w/cpp/numeric/random/bernoulli_distribution). – Yksisarvinen Nov 07 '22 at 12:58
  • 2
    Are you sure about you logic here ? If you don't enter the first case, only in 20% of cases do you then enter the second case. So in 100%*(1-0.8)*(1-0.2) nothing happens. – nick Nov 07 '22 at 12:58
  • Are you sure the probabilities should be 4p/5 and p/5, not 4/5 and 1/5? Is `p` a parameter for how big the probability should be or is it some random sample you are given to use for making the mutate/do-not-mutate selection? – Eric Postpischil Nov 07 '22 at 13:04
  • 1
    Careful! `void mutateSequence(vector> v, ...` passes `v` by value! Since you want to mutate `v`, you probably want to pass it by reference instead, i.e. `void mutateSequence(vector> &v, ...)`. – chi Nov 07 '22 at 13:06
  • @EricPostpischil the probabilities should be 4p/5 and p/5. The parameter p is passed in as a command line argument by the user. I know that it is odd but it is necessary for the goal of this simulation. – Daniel Lobo Nov 07 '22 at 13:08
  • Is the requirement that with probability `p` exactly one of these two branches occurs, and given that something occurs, with probability 4/5 it is the first? – Caleth Nov 07 '22 at 13:10
  • 1
    You have three cases to select from: 4p/5 enter the first `if`, 1p/5 enter the second `if`, 1−p enter neither. The prototypical way to do this is to divide the line segment from 0 to 1 into three intervals of lengths 4p/5, 1p/5, and 1−p. Then draw a random number in [0, 1). If it is less than `4*p/5`, it is in the first interval. Otherwise, if it is less than `p`, it is in the second interval (the interval from 4p/5 to p has length p/5). Otherwise, it is in the third interval. This can be done with a simple draw from a uniform distribution. I will let others answer on C++ features for this. – Eric Postpischil Nov 07 '22 at 13:18
  • btw in your code it is not either enter the first if or enter the second or neither. In your code both can be entered. Maybe you acutally want `else if` rather than `if`, not sure – 463035818_is_not_an_ai Nov 07 '22 at 13:36

1 Answers1

7

There's a very straightforward way to do this in modern C++. First we set it up:

#include <random>
std::random_device rd;
std::mt19937 gen(rd());
// p entered by user elsewhere
// give "true" 4p/5 of the time
std::bernoulli_distribution d1(4.0*p/5.0);
// give "true" 1p/5 of the time
std::bernoulli_distribution d2(1.0*p/5.0);

Then when we want to use it:

if (d1(gen)) {
    // replace nucleotide with 4p/5 probability
} else {
    // something else with 1 - 4p/5 probability
}

If instead, you want do one thing with probability 4p/5 and then, independently, another thing with probability 1p/5, that's also easily done:

if (d1(gen)) {
    // replace nucleotide with 4p/5 probability
} 
if (d2(gen)) {
    // something else with 1p/5 probability
}

See bernoulli_distribution for more detail.

Edward
  • 6,964
  • 2
  • 29
  • 55
  • The question asks for probabilities 4p/5 and 1p/5, not 4/5 and 1/5. That looks odd to me, and I entered a comment requesting clarification, but giving an answer that is not for the asked question and while the question is unclear risks votes down. – Eric Postpischil Nov 07 '22 at 13:08
  • And now the OP confirms they do mean 4p/5 and 1p/5 (and 1−4p/5-1p/5 = 1-p for doing nothing), making this answer wrong. So there are three options: 4p/5 enter the first `if`, 1p/5 enter the second `if`, 1−p enter neither. – Eric Postpischil Nov 07 '22 at 13:11
  • 2
    If `gen` is set up so `d(gen)` gives 4p/5 probability, then `!d(gen)` gives 1-4p/5 probability, not 1p/5. – Eric Postpischil Nov 07 '22 at 13:19
  • You say “very straightforward” but then [your code seeds the random generator incorrectly](https://stackoverflow.com/a/444614/1968). – Konrad Rudolph Nov 07 '22 at 13:26
  • Seeding the generator is a whole other question. For a treatment of that topic, I'd recommend https://www.pcg-random.org/posts/cpp-seeding-surprises.html – Edward Nov 07 '22 at 13:30
  • 1
    Yes, that’s my point. There’s very little “straightforward” about the `` standard header if you want to use it correctly, unfortunately. – Konrad Rudolph Nov 07 '22 at 13:31
  • It depends on the needs of the simulation. One could reasonably seed the generator with a single, fixed integer if the goal is to have a repeatable simulation. – Edward Nov 07 '22 at 13:36