I am trying to calculate three association statistics (the phi coefficient, Cramer's V, and the contingency coefficient) using the RPy module for Python. I can do this in R, but I am at my wits' end trying to replicate it in Python. The R version:
library(vcd)
data <- read.csv("test.csv")
assocstats(table(data$var_4, data$target))
Output:
                    X^2 df P(> X^2)
Likelihood Ratio 113.28  1        0
Pearson          112.51  1        0

Phi-Coefficient   : 0.15
Contingency Coeff.: 0.148
Cramer's V        : 0.15
Implementation in Python:
import rpy

# Already connected to MySQL; cur is an open cursor
q = "SELECT var_4, target FROM test"
cur.execute(q)
data = cur.fetchall()

# Split the result rows into one list per column
ls1 = []
ls2 = []
for row in data:
    ls1.append(row[0])
    ls2.append(row[1])

rpy.r.library("vcd")
rpy.r.assocstats(rpy.r.table(ls1, ls2))
Error:
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    rpy.r.assocstats(rpy.r.table(ls1,ls2))
RPy_RException: Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
The other approach I am trying is to compute phi squared with the scipy module and then apply the mathematical formulas to get Cramer's V and the other measures. However, I intend to use RPy heavily in my project going forward, so I would really appreciate it if you could point out the problem with the approach above. I suspect I am not passing the input to the function in the proper format. Thanks in advance.
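For reference, here is a rough sketch of the formula-based fallback, written in plain Python so the arithmetic is visible. It is only an illustration under stated assumptions, not the vcd implementation: the function name `association_stats` is made up, the chi-square is the Pearson statistic without continuity correction, and the three measures use the textbook definitions (phi = sqrt(X^2/n), C = sqrt(X^2/(n + X^2)), V = sqrt(X^2/(n*(k-1))) with k the smaller table dimension). The same chi-square could presumably be taken from `scipy.stats.chi2_contingency` with `correction=False` instead of the loop.

```python
from collections import Counter
from math import sqrt


def association_stats(xs, ys):
    """Return (chi2, phi, contingency_coeff, cramers_v) for two paired sequences.

    Hypothetical helper for illustration; uses the Pearson chi-square
    without continuity correction.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = float(len(xs))

    cell = Counter(zip(xs, ys))  # observed count for each (row, column) pair
    row = Counter(xs)            # row totals
    col = Counter(ys)            # column totals

    k = min(len(row), len(col))  # smaller dimension of the table
    if k < 2:
        raise ValueError("each variable needs at least two distinct values")

    # Pearson chi-square: sum over all cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for r, r_total in row.items():
        for c, c_total in col.items():
            expected = r_total * c_total / n
            chi2 += (cell[(r, c)] - expected) ** 2 / expected

    phi = sqrt(chi2 / n)  # conventionally reported for 2x2 tables
    contingency = sqrt(chi2 / (n + chi2))
    cramers_v = sqrt(chi2 / (n * (k - 1)))
    return chi2, phi, contingency, cramers_v
```

With the lists built above, this would be called as `association_stats(ls1, ls2)`. For a 2x2 table phi and Cramer's V coincide, which matches the R output shown earlier.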