I currently have a large dataset with quite a few missing values.
I'm trying to fill in these missing values by building a probability distribution from the data I have and sampling from it. E.g., build the distribution, draw a random number between 0 and 1, and fill in the missing entry with the corresponding value.
I've read the documentation for scipy and numpy. I think I'm looking for a continuous version of random.choice.
Company | Weight |
---|---|
a | 30 |
a | 45 |
a | 27 |
a | na |
a | 57 |
a | 57 |
a | na |
I'm trying to fill the NA entries in the Weight column by creating a continuous distribution from the data I already have.
So far I've tried np.random.choice, i.e.: np.random.choice([30, 45, 27, 57], p=[0.2, 0.2, 0.2, 0.4])
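For reference, here is a minimal sketch of that discrete approach with the probabilities derived from the observed data rather than typed in by hand. The variable names and the seed are only illustrative assumptions.

```python
import numpy as np

# Observed (non-missing) weights from the table above.
observed = np.array([30, 45, 27, 57, 57])

# Distinct values and their relative frequencies in the data.
values, counts = np.unique(observed, return_counts=True)
probabilities = counts / counts.sum()  # 57 appears twice, so it gets 0.4

# Seed chosen arbitrarily so the sketch is reproducible.
rng = np.random.default_rng(seed=0)

# Draw one replacement per missing entry (two in the table above).
draws = rng.choice(values, size=2, p=probabilities)
```

Every draw is one of the values already present in the data (27, 30, 45, or 57), which is the limitation in question.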
However, this only returns the specific values I pass in. I'm trying to build a continuous model so that I can get any number between 27 and 57, with probability based on how often each value appears in my existing data.
So in this case, numbers closer to 57 would be more likely to be chosen, since 57 appears more frequently in my existing data.
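One possible way to sketch the "draw a number from 0 to 1 and look up the corresponding value" idea is inverse transform sampling on a linearly interpolated empirical distribution. The example below assumes `np.quantile` with its default linear interpolation is an acceptable stand-in for the continuous model. The array layout, variable names, and seed are illustrative, not part of the original data pipeline.

```python
import numpy as np

# Weight column from the table above, with NaN marking the missing entries.
weights = np.array([30, 45, 27, np.nan, 57, 57, np.nan])

missing = np.isnan(weights)
observed = weights[~missing]

# Seed chosen arbitrarily so the sketch is reproducible.
rng = np.random.default_rng(seed=0)

# One uniform draw in [0, 1) per missing entry.
u = rng.uniform(0.0, 1.0, size=missing.sum())

# Map each uniform draw through the interpolated empirical quantile
# function; results always lie between min(observed) and max(observed).
filled = weights.copy()
filled[missing] = np.quantile(observed, u)
```

With this data, any draw of `u` at or above 0.75 maps to exactly 57, and draws between 0.5 and 0.75 land between 45 and 57, so values near 57 come up more often. If a smoother density is preferred, `scipy.stats.gaussian_kde` fitted on the observed values and sampled with its `resample` method may be an alternative, though its samples can fall outside the 27 to 57 range unless they are clipped or redrawn.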