I'm trying to use the confusion matrix from library(caret) to determine which column is more accurate, and I'm running into trouble. I want to see whether column df$G5 is more accurate than df$G9 when each is compared to df$GE. The methods I've tried in the past aren't working, and I'm not sure how to proceed with the matrix. The main error I keep running into is "Error: data and reference should be factors with the same levels".

df <- 
  C      P           I   R A    S  GE  G5 A5   G9 A9          AF
1 8 163302 rs141069412 CAT C NONE 1/1 1/1  1 <NA> NA 9.33843e-01
2 8 163366  rs34810249   T C NONE 0/1 0/1  1  1/0  1 2.07735e-01
3 8 163370   rs7844253   C G NONE 1/1 1/1  1  1/1  1 9.28438e-01
4 8 163387   rs3008286   C T NONE 0/1 0/1  1  0/1  1 7.17963e-01
5 8 163432   rs3008285   A G NONE 0/1 0/0  0 <NA> NA 1.02935e-01
6 8 163438   rs7844396   C T NONE 1/1 1/1  1  1/1  1 9.28281e-01
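
One possible way past that error, sketched under assumptions: the error typically appears when the two inputs are character vectors, or factors whose level sets differ. The snippet below rebuilds only the three genotype columns from the sample rows above and gives them one shared level set before calling `caret::confusionMatrix()`. The variable names (`lv`, `ge`, `g5`, `g9`, `cm5`, `cm9`) are illustrative, and the caret call is guarded so the rest runs even when caret is not installed.

```r
# Genotype columns copied from the six sample rows shown above.
df <- data.frame(
  GE = c("1/1", "0/1", "1/1", "0/1", "0/1", "1/1"),
  G5 = c("1/1", "0/1", "1/1", "0/1", "0/0", "1/1"),
  G9 = c(NA,    "1/0", "1/1", "0/1", NA,    "1/1"),
  stringsAsFactors = FALSE
)

# Build one shared level set from every genotype seen in any column.
# sort() drops NA, so missing calls do not become a level.
lv <- sort(unique(c(df$GE, df$G5, df$G9)))

# Convert each column to a factor using the same levels.
ge <- factor(df$GE, levels = lv)
g5 <- factor(df$G5, levels = lv)
g9 <- factor(df$G9, levels = lv)

# Only call caret if it is available.
if (requireNamespace("caret", quietly = TRUE)) {
  cm5 <- caret::confusionMatrix(data = g5, reference = ge)
  cm9 <- caret::confusionMatrix(data = g9, reference = ge)
  print(cm5$overall["Accuracy"])
  print(cm9$overall["Accuracy"])
}
```

Note that rows where G9 is `NA` are left out of the cross-tabulation, so the two accuracies may be based on different numbers of rows; whether that is acceptable depends on how missing calls should be treated.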
Robert S
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Show the code you are actually running so we can run it too. Have you fit some sort of model? What kind of confusion matrix are you trying to create exactly? – MrFlick Dec 20 '21 at 03:13
  • @MrFlick What I'm trying to do is see which one (df$G5 or df$G9) is more accurate when compared to df$GE. The only way I knew in the past to find accuracy was through a confusion matrix (but it's been a while). – Robert S Dec 20 '21 at 03:24
  • @Robert S Could you explain what the values in columns G5, G9, and GE mean (e.g., `1/1` or `0/1`)? And what do you precisely mean by the word "accurate"? – yh6 Dec 20 '21 at 04:38
  • @MrFlick GE is the original, and G5 and G9 are attempts to replicate GE. Think of 1/0 and 1/1 as characters or letters. I'm trying to see which one (G5 or G9) is more accurate, i.e. more similar to GE. – Robert S Dec 20 '21 at 15:28
  • Thank you. It seems to me that you cannot use "accuracy" because you can define TRUE/FALSE (e.g., `GE[1]==G5[1]`) but you cannot define positive/negative. How about using similarity indices (e.g., Jaccard similarity)? – yh6 Dec 21 '21 at 09:29
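
If the genotypes are treated purely as labels, as described in the comments, a simpler option than a confusion matrix might be the proportion of rows where each column matches GE. The sketch below is plain element-wise agreement, not the Jaccard index mentioned above. It uses the six sample rows from the question; the helper name `agreement` and its `na_as_mismatch` argument are hypothetical, introduced here only for illustration.

```r
# Genotype calls copied from the sample rows in the question.
ge <- c("1/1", "0/1", "1/1", "0/1", "0/1", "1/1")
g5 <- c("1/1", "0/1", "1/1", "0/1", "0/0", "1/1")
g9 <- c(NA,    "1/0", "1/1", "0/1", NA,    "1/1")

# Proportion of rows where pred equals ref.
# By default, rows with a missing value are ignored;
# with na_as_mismatch = TRUE they count as disagreements.
agreement <- function(pred, ref, na_as_mismatch = FALSE) {
  hit <- pred == ref                     # NA where either value is missing
  if (na_as_mismatch) hit[is.na(hit)] <- FALSE
  mean(hit, na.rm = TRUE)
}

agreement(g5, ge)                          # 5 of 6 rows match
agreement(g9, ge)                          # 3 of the 4 non-missing rows match
agreement(g9, ge, na_as_mismatch = TRUE)   # 3 of 6 rows match
```

This comparison is order-sensitive, so `1/0` and `0/1` count as different labels; if they should be treated as the same genotype, the values would need to be normalised first.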

0 Answers