
I have already seen this question and I know numpy.random.choice, but my question is slightly different.

I have a dataset like this:

import numpy as np
import pandas as pd

data = {"Number of polyps": [10, 8, 3, 1, 2, 6, 13],
        "Right": [3, 2, 3, 1, 0, 3, 3],
        "Left": [2, 2, 4, 15, 6, 7, 1]}

dt = pd.DataFrame(data)

so, it is:

Number of polyps  Right   Left
            10       3     2
             8       2     2
             3       3     4
             1       1    15
             2       0     6
             6       3     7
            13       3     1

I need to refill the Right and Left columns according to the following requirements:

  1. The sum of Right and Left equals Number of polyps.
  2. The values of Right and Left are drawn with probabilities weighted by their current values.

For example, for a given row as below:

Number of polyps  Right   Left
            10       3     2

For this row, the draw could look like the following, where 0.6 = 3/(3+2) and 0.4 = 2/(3+2):

nr = np.random.choice(["Right", "Left"], size=10, replace=True, p=[0.6, 0.4])
rightCount = (nr == "Right").sum()  # nr is a NumPy array, so count by comparison
leftCount = (nr == "Left").sum()
print(rightCount)
print(leftCount)

After updating, this row could become, for example:

Number of polyps  Right   Left
            10       3     7

The problem is that I must do this for every row in the dataset, and I am not sure how.

Marius Mucenicu
Jeff

1 Answer


You're essentially drawing from the binomial distribution. It is implemented in NumPy as numpy.random.binomial:

>>> dt["Right"] = np.random.binomial(dt["Number of polyps"], dt["Right"]/(dt["Right"]+dt["Left"]))
>>> dt["Left"] = dt["Number of polyps"] - dt["Right"]

Here, for each row we perform dt["Number of polyps"] binary-choice trials, with each trial selecting Right with probability dt["Right"]/(dt["Right"]+dt["Left"]) and Left otherwise.
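
As a self-contained sketch of the same idea, the snippet below rebuilds the example frame from the question and applies one binomial draw per row. The seeded generator `rng` and the helper name `p_right` are assumptions added for illustration and reproducibility; they are not part of the original answer. It also assumes that no row has both Right and Left equal to zero, since the probability would be undefined there and such rows would need separate handling.

```python
import numpy as np
import pandas as pd

# Example data from the question (column name "Right" without trailing space).
dt = pd.DataFrame({
    "Number of polyps": [10, 8, 3, 1, 2, 6, 13],
    "Right": [3, 2, 3, 1, 0, 3, 3],
    "Left": [2, 2, 4, 15, 6, 7, 1],
})

# Assumption: a seeded generator, used only so the run is reproducible.
rng = np.random.default_rng(0)

# Per-row probability of picking "Right", taken from the current counts.
p_right = dt["Right"] / (dt["Right"] + dt["Left"])

# One binomial draw per row: n = number of polyps, success = "Right".
dt["Right"] = rng.binomial(dt["Number of polyps"], p_right)

# Whatever was not assigned to Right goes to Left, so the row sums match.
dt["Left"] = dt["Number of polyps"] - dt["Right"]

print(dt)
```

Because Left is computed as the remainder, the two columns always add up to Number of polyps, and a row whose original Right count is 0 always ends up with Right equal to 0.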

NPE
  • But `Binomial distribution` is slightly different from `Uniform`? Right? @NPE – Jeff Oct 01 '19 at 11:05
  • @Jeff: There's nothing uniform about your distribution. (Uniform distribution is continuous, whereas your case is discrete.) – NPE Oct 01 '19 at 19:19