I have a data set (mydata) of 2 variables (both are percentages). I need to find the threshold for the data set. The data set link

I want to use R to fit a normal distribution to this data set so that I have a statistically valid justification for choosing a cut-off (e.g. (50%, 70%)).

Thanks!

Sophia Wu
  • Consider editing your question so that it is a "reproducible example". In order to help, people need to see some sample data and the code you have been trying... – Nate Aug 02 '16 at 14:07
  • Sorry. I'm new to Stack Overflow so I'm not sure how to present the table. I just edited it. I've been trying `sigma<-var(mydata)`, `means<-colmeans(mydata)`, `simulation<-mvrnorm(n=1000, means, sigma)`. But it does not show anything. – Sophia Wu Aug 02 '16 at 14:11
  • `dput(your_dataframe)` is helpful for putting up a sample of your data. No need to share everything, just the minimum needed to reproduce your problem. Here is a good link on asking a great question on SO: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Welcome to SO – Nate Aug 02 '16 at 14:15
  • It's unclear what method you propose for setting a threshold and what sort of threshold should be set. Try offering a small set of data and "walk" us through setting a threshold. If you don't know how to do that it's not yet a programming question. – IRTFM Aug 02 '16 at 15:41
  • I'm actually new to R. I need to find a solution for my work. It would be great if you can provide a shortcut for me so I don't have to learn from scratch. Thanks! – Sophia Wu Aug 02 '16 at 16:17
  • We might be able to help if you explained what you wanted to do in more detail. – IRTFM Aug 02 '16 at 17:05
  • These two columns represent percentages of overlapped data in 2 forms. The 1st column is calculated as overlap / # of data in form1. The 2nd column is calculated as overlap / # of data in form2. I want to find a threshold (e.g. 50%, 90%) and only further analyze forms whose pair of match percentages is above this threshold. I don't know if there is any way that I can statistically justify my selection of threshold. – Sophia Wu Aug 03 '16 at 16:31
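
For reference, here is a minimal R sketch of the attempt described in the comments, not a definitive answer. It assumes `mydata` is a data frame with two numeric columns; the column names `pct_form1` and `pct_form2` and the random stand-in data are hypothetical. Base R spells the function `colMeans` (capital M), `mvrnorm` comes from the `MASS` package, and a result stored with `<-` is only shown once it is printed. The per-column mean used as a cut-off is purely illustrative. Also note that percentages are bounded, so a normal fit may be a poor description of them.

```r
# mvrnorm() lives in the MASS package, which must be loaded first.
library(MASS)

# Hypothetical stand-in for `mydata`: two columns of match percentages in [0, 1].
set.seed(1)
mydata <- data.frame(pct_form1 = runif(50), pct_form2 = runif(50))

# Fit a bivariate normal: per-column sample means and the covariance matrix.
means <- colMeans(mydata)  # note the capital M; `colmeans` is not a base R function
sigma <- var(mydata)       # 2 x 2 covariance matrix

# Draw 1000 samples from the fitted distribution and print the first rows;
# an assignment alone displays nothing.
simulation <- mvrnorm(n = 1000, mu = means, Sigma = sigma)
print(head(simulation))

# Illustrative cut-off only (an assumption): the per-column mean.
# Keep the rows where both percentages exceed their cut-off.
cutoff <- means
keep <- mydata[mydata$pct_form1 > cutoff[["pct_form1"]] &
               mydata$pct_form2 > cutoff[["pct_form2"]], ]
print(nrow(keep))
```

Any other rule (for example a chosen quantile of each column) can replace `cutoff` without changing the filtering step.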

0 Answers