5

I'm trying to solve an exercise from the rosalind project, but keep making some mistake apparently. The full text is available here, but my shorter abstract interpretation and attempt is as follows. Please help me find what am I doing wrong:

We have 3 groups of items: AA, Aa, aa. We start with 1 in Aa and do k iterations of generating new items. In every iteration every item in group:

  • Aa could produce: AA (25%), Aa (50%), aa (25%)
  • AA could produce: AA (50%), Aa (50%)
  • aa could produce: aa (50%), Aa (50%)

As a result of iteration we count expected number of items for each group, assuming we generate 2 new items from each one in the previous iteration. So the we end up with:

  • 0th iter: AA: 0, Aa: 1, aa: 0
  • 1st iter: AA: .5, Aa: 1, aa: .5
  • 2nd iter: AA: 1, Aa: 2, aa: 1
  • etc. - proportions stay at 1:2:1 between groups

The sum of expected values / population on each iteration is 2^iteration and the probability of an item being in group Aa is always 50%.

So far I hope I'm right, but what we're actually after is: what are the chances of having at least N items that are in group Aa both times if we repeat the experiment twice. (should be equivalent to: what are the chances of having at least N items in group AaBb if we extend the list of groups to AABB, AABb, .... from the original question)

So the probability of item being in Aa is 50%, population sum of expected values from iteration (or 2^iteration), and throwing that at scipy using the test data (k=2, N=1), we get this for just at least one item in group Aa:

In [75]: bin = scipy.stats.binom(4, .5)
In [76]: sum(b.pmf(x) for x in range(1, 4+1))
Out[76]: 0.93750000000000022

and this for at least one item if we have two sets of groups, so AaBb:

In [77]: sum(b.pmf(x) for x in range(1, 4+1))**2
Out[77]: 0.87890625000000044

Which is completely different from the answer in the original question: 0.684

Where did I make a mistake? (if possible please only point out the mistake, rather than give a solution, so that there are no spoilers left for people trying to solve it on their own)

viraptor
  • 33,322
  • 10
  • 107
  • 191
  • Can you add a pastebin? of your code? – classicjonesynz Feb 08 '13 at 02:06
  • @Killrawr yes. it's fugly, since it's a one-off script, but it does what I describe: http://pastebin.com/uJz9Gb2Q – viraptor Feb 08 '13 at 02:11
  • I don't see where AA and aa come from; is this supposed to work with the way humans reproduce, i.e. one set of chromosomes from each parent? That would give as the possibilities AB, Ab, aB and ab as chromosomes from the mate, no? – G. Bach Feb 08 '13 at 02:24
  • @G.Bach They're alleles. you can expand the "Mendel's Second Law" section in the original question to get the details. – viraptor Feb 08 '13 at 02:26
  • Ah I see; I'm afraid I can't really help you, but a little drawing and some math should help figure it out. – G. Bach Feb 08 '13 at 02:33
  • AA is the genotype correct? @G. Bach some good information over on 23andme on this sort of data by the way. – classicjonesynz Feb 08 '13 at 03:22

1 Answers1

1

I first followed your example and thought it seemed to make sense, but after a while I found where the problem was.

Here is a pointer to your mistake:

You have calculated the probability of getting at least one Aa-- in the second generetion and at least one --Bb. But this is not enough to find out if there is at least one AaBb in the second generation, the Aa-- and --Bb have to coincide.

Consider for example the following second generation: aaBb, AABb, Aabb, AaBB All individuals have either Aa-- or --Bb but there are no AaBb in the generation.

user1884905
  • 3,137
  • 1
  • 22
  • 22
  • Ok, but considering p1 == probability of finding at least one 'Aa' and p2 == probability of finding at least one 'Bb', we know that p1==p2 and since they're independent, finding at least one 'AaBb' is p1*p2. Finding at least one 'Aa-' and at least one '-Bb' would be 1/(1/p1+1/p2) instead, right? I expected the p**2 to account for the coinciding geotypes. – viraptor Feb 08 '13 at 09:55
  • 1
    @viraptor Yes, they are independent, so the probability of finding `AaBb` is `p1*p2` just as you say. This is the value you should use as `p` in your binomial distribution. (It is not the same thing as squaring the final result.) – user1884905 Feb 08 '13 at 10:16