
How can I find the probability of occurrence for the pandas DataFrame below? I am trying to find the probability that a beer is associated with one store rather than the others. My current event window is one day.

I have a dataframe like below:

eventtime                   name         src_store    
January 14, 2018 4:57:35    budlight     NaN
January 14, 2018 4:51:31    coors        5-119
January 14, 2018 4:31:32    pabst        NaN
January 14, 2018 4:57:31    budlight     5-118
January 14, 2018 4:58:21    coors        5-119
January 14, 2018 4:57:37    NaN          5-120
January 14, 2018 4:18:31    budlight     5-118
January 14, 2018 4:57:31    coors        5-119
January 14, 2018 4:57:52    NaN          5-120

Some code to give me a comparison matrix:

pd.crosstab(df.name.fillna('NONE'), df.src_store.fillna('NONE'))

src_store  5-118  5-119  5-120  NONE
name                                
NONE           0      0      2     0
budlight       2      0      0     1
coors          0      3      0     0
pabst          0      0      0     1

I am trying to get p-values from the following counts:

Name with    src_store
Name without src_store
src_store with    name
src_store without name
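
The four counts listed above form a 2x2 contingency table for each (name, src_store) pair. A minimal sketch of one way to turn them into a per-row p-value, assuming Fisher's exact test is an acceptable choice for counts this small and that missing values are treated as their own "NONE" category (both are assumptions, not part of the question; `pair_p_value` is a hypothetical helper name):

```python
import numpy as np
import pandas as pd
from scipy.stats import fisher_exact

# Sample data from the question; NaN is replaced by a "NONE" category (assumption).
df = pd.DataFrame({
    "name": ["budlight", "coors", "pabst", "budlight", "coors",
             np.nan, "budlight", "coors", np.nan],
    "src_store": [np.nan, "5-119", np.nan, "5-118", "5-119",
                  "5-120", "5-118", "5-119", "5-120"],
}).fillna("NONE")


def pair_p_value(frame, name, store):
    """Two-sided Fisher exact p-value for one (name, store) pair."""
    is_name = frame["name"] == name
    is_store = frame["src_store"] == store
    table = [
        # name with store,              name without store
        [int((is_name & is_store).sum()), int((is_name & ~is_store).sum())],
        # store without name,           neither name nor store
        [int((~is_name & is_store).sum()), int((~is_name & ~is_store).sum())],
    ]
    _, p = fisher_exact(table)
    return p


# Compute one p-value per distinct pair, then map it back onto every row.
pairs = df[["name", "src_store"]].drop_duplicates()
pair_p = {
    (n, s): pair_p_value(df, n, s)
    for n, s in zip(pairs["name"], pairs["src_store"])
}
df["p_value"] = [pair_p[(n, s)] for n, s in zip(df["name"], df["src_store"])]
print(df)
```

With only nine rows the p-values are coarse, and testing many pairs at once would normally call for a multiple-comparison correction.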

The overall goal is to find the probability that a beer is associated with a specific src_store.
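
If the goal is read literally as a probability rather than a significance test, one possible sketch is the empirical conditional probability P(src_store | name), obtained by row-normalizing the crosstab. The "NONE" fill value and the `p_store_given_name` column name are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN is replaced by a "NONE" category (assumption).
df = pd.DataFrame({
    "name": ["budlight", "coors", "pabst", "budlight", "coors",
             np.nan, "budlight", "coors", np.nan],
    "src_store": [np.nan, "5-119", np.nan, "5-118", "5-119",
                  "5-120", "5-118", "5-119", "5-120"],
}).fillna("NONE")

# Each row of `probs` sums to 1: the share of a beer's events seen at each store.
probs = pd.crosstab(df["name"], df["src_store"], normalize="index")

# Look up the probability for the (name, src_store) pair on every row.
df["p_store_given_name"] = [
    probs.loc[n, s] for n, s in zip(df["name"], df["src_store"])
]
print(probs)
print(df)
```

This is a descriptive frequency, not a p-value: it says how often a beer appears with a store, not whether that association could be due to chance.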

Expected output (NOT the actual p-values):

eventtime                   name         src_store    p_value
January 14, 2018 4:57:35    budlight     NaN          0.01
January 14, 2018 4:51:31    coors        5-119        0.02
January 14, 2018 4:31:32    pabst        NaN          0
January 14, 2018 4:57:31    budlight     5-118        0.002
January 14, 2018 4:58:21    coors        5-119        0.004
January 14, 2018 4:57:37    NaN          5-120        0.005
January 14, 2018 4:18:31    budlight     5-118        0.006
January 14, 2018 4:57:31    coors        5-119        0.007
January 14, 2018 4:57:52    NaN          5-120        0.008
johnnyb
  • https://stackoverflow.com/questions/32732582/chi-square-p-value-matrix-in-r This is something similar but in the R language. – johnnyb Jan 14 '18 at 22:13
  • chi2_contingency(pd.crosstab(df.name, df.src_store)) returns the expected frequencies as an ndarray, together with a single p-value for the whole table, according to the docs here: https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.chi2_contingency.html – johnnyb Jan 15 '18 at 00:11
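
The chi2_contingency call mentioned in the comment above can be sketched as follows. It yields one p-value for the whole table rather than one per row, and the "NONE" fill value is an assumption used to reproduce the crosstab shown in the question:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Sample data from the question; NaN is replaced by a "NONE" category (assumption).
df = pd.DataFrame({
    "name": ["budlight", "coors", "pabst", "budlight", "coors",
             np.nan, "budlight", "coors", np.nan],
    "src_store": [np.nan, "5-119", np.nan, "5-118", "5-119",
                  "5-120", "5-118", "5-119", "5-120"],
}).fillna("NONE")

table = pd.crosstab(df["name"], df["src_store"])

# chi2: test statistic, p: single p-value for independence of name and store,
# dof: degrees of freedom, expected: expected counts under independence.
chi2, p, dof, expected = chi2_contingency(table)

expected_df = pd.DataFrame(expected, index=table.index, columns=table.columns)
print(f"chi2={chi2:.3f}, p={p:.4f}, dof={dof}")
print(expected_df)
```

With counts this small, every expected frequency is below 5, so the chi-square approximation is unreliable here and the result is best treated as illustrative.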
