I have a sample dataframe as shown below:
CNC[1:9,]
ID CNDIST1 CNDIST2
C1 0,0,136,1 0,0,4,2
C2 0,0,141,1 0,0,4,1
C3 6,8,126 0,0,4
C4 6,11,125 0,0,6
C5 0,0,141,0,0,0,0 0,0,3,0,1,0,1
C6 0,0,139,0,0 0,0,3,1,1
C7 0,0,141,0 0,0,4,2
C8 0,0,141,0 0,0,4,2
C9 31,44,61 2,2,2
The same dataframe using dput():
dput(CNC[1:9,])
structure(list(ID = structure(1:9, .Label = c("C1",
"C2", "C3", "C4",
"C5", "C6", "C7",
"C8", "C9"), class = "factor"),
CNDIST1 = structure(c(1L, 5L, 8L, 7L, 4L, 2L, 3L, 3L,
6L), .Label = c("0,0,136,1", "0,0,139,0,0", "0,0,141,0",
"0,0,141,0,0,0,0", "0,0,141,1", "31,44,61", "6,11,125", "6,8,126"
), class = "factor"), CNDIST2 = structure(c(5L, 4L, 3L,
6L, 1L, 2L, 5L, 5L, 7L), .Label = c("0,0,3,0,1,0,1", "0,0,3,1,1",
"0,0,4", "0,0,4,1", "0,0,4,2", "0,0,6", "2,2,2"), class = "factor")), .Names = c("ID",
"CNDIST1", "CNDIST2"), row.names = c(NA, 9L), class = "data.frame")
I am using the R code below to run chisq.test on each row. The values in column 3 (CNDIST2) form the probabilities vector 'p', which has the same length as the numeric vector 'x' taken from column 2 (CNDIST1):
read.table("report.dat2",header=T,sep="\t")->CNC
chi.pval=vector()
for(i in 1:nrow(CNC)){
as.numeric(unlist(strsplit(as.character(CNC$CNDIST1[i]),",")))->x
as.numeric(unlist(strsplit(as.character(CNC$CNDIST2[i]),",")))->p
chi.pval[i]<-chisq.test(x,p+0.001,rescale.p=T)$p.value ###add 0.001 to 'p' vector to remove '0'
}
CNC1<-cbind(CNC,chi.pval)
write.table(CNC1,'chi.test.txt',sep='\t',quote=F,row.names=F)
The code returns the error:
Error in chisq.test(x, p + 0.001, rescale.p = T) :
'x' and 'y' must have at least 2 levels
In addition: Warning messages:
1: In chisq.test(x, p + 0.001, rescale.p = T) :
Chi-squared approximation may be incorrect
The code fails with this error on some rows and then stops, although the test runs for other rows of the dataframe. Can anyone offer a clue as to what is happening and how to fix it?
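One likely cause, offered as a sketch rather than a confirmed diagnosis: the second positional argument of chisq.test() is 'y', not 'p'. Passing p + 0.001 without a name therefore asks for a contingency-table test of x against y, and the call fails whenever either vector has only one distinct value, as in row C9 where CNDIST2 is 2,2,2. The example below uses the C9 values from the sample data; the names res_positional and res_named are only illustrative.

```r
# Row C9 from the sample data: observed counts and the matching weights.
x <- c(31, 44, 61)
p <- c(2, 2, 2)

# Positional call: the second argument is matched to 'y', not 'p', so
# chisq.test() tries to cross-tabulate x against y. Here y has a single
# distinct value, which triggers the "at least 2 levels" error.
res_positional <- tryCatch(
  chisq.test(x, p + 0.001, rescale.p = TRUE),
  error = function(e) conditionMessage(e)
)
print(res_positional)

# Named call: 'p' is now used as the expected proportions of a
# goodness-of-fit test, and rescale.p = TRUE rescales it to sum to 1.
res_named <- chisq.test(x, p = p + 0.001, rescale.p = TRUE)
print(res_named$p.value)
```

If this is the cause, writing p = p + 0.001 inside the loop should let every row run. The "Chi-squared approximation may be incorrect" warning can still appear for rows with very small expected counts; simulate.p.value = TRUE is one option to consider there.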
dput(CNC[1:9,]) is missing some values in CNDIST1 and CNDIST2. Looks like something weird is happening with duplicate values.
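On the apparently missing values: this is probably just how dput() prints factors. Each distinct string is listed once among the factor levels, and the rows refer to it by integer code, so a repeated value such as "0,0,141,0" (rows C7 and C8) shows up only once in the output. A small sketch, assuming the columns were read as factors; the names cndist1 and restored are made up for illustration.

```r
# A column with a repeated value, stored as a factor in the way
# read.table() stores strings when stringsAsFactors = TRUE.
cndist1 <- factor(c("0,0,141,0", "0,0,141,0", "31,44,61"))

# dput() lists each distinct string once among the levels and encodes
# the rows as integer codes, so the duplicate is not printed twice.
dput(cndist1)

# Converting back with as.character() recovers every row, duplicates included.
restored <- as.character(cndist1)
print(restored)
```

This is also why the loop in the question wraps each cell in as.character() before calling strsplit().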