How do I calculate the probability of every value in a dataframe column quickly in Python?

Question

I want to calculate the probability of every value in a dataframe column according to the column's own distribution. For example, my data is the data column below (1, 1, 2, 3, 2, 2, 7, 8, 3, 4, 1), and the output I expect looks like this:

    data       pro
0      1  0.155015
1      1  0.155015
2      2  0.181213
3      3  0.157379
4      2  0.181213
5      2  0.181213
6      7  0.048717
7      8  0.044892
8      3  0.157379
9      4  0.106164
10     1  0.155015

I also referred to another question (How to compute the probability ...) and adapted its example to produce the output above. My code is as follows:

import pandas as pd
import scipy.stats

samples = [1, 1, 2, 3, 2, 2, 7, 8, 3, 4, 1]
samples = pd.DataFrame(samples, columns=['data'])
print(samples)

# Fit a Gaussian KDE to the column, then evaluate its density at every row
kde = scipy.stats.gaussian_kde(samples['data'])
samples['pro'] = kde.pdf(samples['data'])
print(samples)

The problem is that when my column is very long, this becomes slow. Is there a better way to do it in pandas? Thanks in advance.
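A possible speed-up, assuming the column contains many repeated values (as the integer sample here does): evaluating a gaussian_kde costs roughly one kernel sum over all data points per evaluated point, so evaluating the density once per distinct value and mapping the results back to the rows avoids redundant work. This is a sketch, not a general fix; if every value in the column is distinct, it gains nothing.

```python
import numpy as np
import pandas as pd
import scipy.stats

samples = pd.DataFrame({'data': [1, 1, 2, 3, 2, 2, 7, 8, 3, 4, 1]})

# Fit the KDE on the full column (fitting is cheap; evaluation is the costly part).
values = samples['data'].to_numpy()
kde = scipy.stats.gaussian_kde(values)

# Evaluate the density only once per distinct value, then broadcast the
# results back to every row using the inverse indices from np.unique.
uniq, inverse = np.unique(values, return_inverse=True)
samples['pro'] = kde.pdf(uniq)[inverse]

print(samples)
```

The result matches kde.pdf(values) row for row; only the number of density evaluations changes (from the number of rows to the number of distinct values).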

score 6 · Answer 1 · answered May 31 '17 at 07:17


"Its own distribution" does not have to mean a KDE. You can use value_counts with normalize=True to get the relative frequency of each value and map it back onto the rows:

df.assign(pro=df.data.map(df.data.value_counts(normalize=True)))

    data       pro
0      1  0.272727
1      1  0.272727
2      2  0.272727
3      3  0.181818
4      2  0.272727
5      2  0.272727
6      7  0.090909
7      8  0.090909
8      3  0.181818
9      4  0.090909
10     1  0.272727
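An equivalent formulation, using groupby instead of value_counts (a swapped-in alternative, not part of the answer above): each row's group size divided by the number of rows is the same empirical probability. The sample frame is rebuilt from the question's data for illustration.

```python
import pandas as pd

df = pd.DataFrame({'data': [1, 1, 2, 3, 2, 2, 7, 8, 3, 4, 1]})

# Number of times each row's value occurs, divided by the total row count,
# gives the empirical probability of that value.
df['pro'] = df.groupby('data')['data'].transform('size') / len(df)
print(df)
```

Both versions are vectorized, so either should stay fast on long columns; value_counts plus map is usually the more direct choice.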

piRSquared

First of all, thank you for your answer. Secondly, I would like to ask whether I can get the probability from the probability density function. If my number is not one of the values above, how can I get its probability? For example, how can I get the probability of the value 1.5 based on the distribution of that column? – giser_yugang May 31 '17 at 07:33
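A sketch of one way to approach this, assuming the Gaussian KDE from the question is an acceptable model of the column: the KDE gives a density at 1.5 (a height of the estimated pdf, which is not itself a probability), while a probability needs an interval, which gaussian_kde.integrate_box_1d computes. The interval [1.0, 2.0] is an arbitrary choice for illustration. Under the empirical (value_counts) distribution, an unobserved value such as 1.5 simply has probability 0.

```python
import numpy as np
import scipy.stats

samples = np.array([1, 1, 2, 3, 2, 2, 7, 8, 3, 4, 1], dtype=float)
kde = scipy.stats.gaussian_kde(samples)

# Density at 1.5: the height of the estimated pdf, not a probability.
density_at_1_5 = kde.pdf([1.5])[0]

# Probability mass the KDE assigns to an interval around 1.5
# (the bounds 1.0 and 2.0 are an arbitrary illustrative choice).
prob_near_1_5 = kde.integrate_box_1d(1.0, 2.0)

# Under the empirical distribution, 1.5 never occurs, so its probability is 0.
empirical_prob = np.mean(samples == 1.5)

print(density_at_1_5, prob_near_1_5, empirical_prob)
```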
