I have written an application that analyzes data and writes the results to a CSV file. The file contains three columns: id, diff and count.
1. id is the id of the cycle. In theory, the greater the id, the lower diff should be.
2. diff is the sum of (Estimator - RealValue)^2 over all observations in the cycle.
3. count is the number of observations in the cycle.
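To make the definition of diff precise, here it is written out. The symbols are my own notation for this post: n_i is count for cycle i, \hat{y}_{ij} is the estimator and y_{ij} is the real value for observation j of that cycle.

```latex
\mathrm{diff}_i = \sum_{j=1}^{n_i} \left( \hat{y}_{ij} - y_{ij} \right)^2
```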
For each of 15 different values of the parameter K, I generate a CSV file named %K%.csv, where %K% is the value used, so I have 15 files in total.
What I would like to do is write a simple loop in R that plots the contents of my files, so that I can decide which value of K is best (the one for which diff is, in general, the lowest).
For a single file I am doing something like:
ggplot(data = data) + geom_point(aes(x = id, y = sqrt(diff / count)))
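As far as I understand, if count is the number of terms summed in diff, then sqrt(diff / count) is the root mean squared error of the cycle. For all files together, what I have in mind is roughly the sketch below. It rests on several assumptions: the files have a header row with the columns id, diff and count, they sit in one directory, and the K values are 1 to 15 (a placeholder for the real values). The helper names read_k, load_all and plot_all are made up for this post.

```r
# Sketch only: requires the ggplot2 package for the plotting step.

# Hypothetical K values; replace with the 15 values actually used.
k_values <- 1:15

# Read the file for one K and tag every row with that K.
# Assumes a header row with the columns id, diff and count.
read_k <- function(k, dir = ".") {
  d <- read.csv(file.path(dir, paste0(k, ".csv")))
  d$K <- k
  d
}

# Stack all files into one data frame and add the per-cycle RMSE.
load_all <- function(ks, dir = ".") {
  all_data <- do.call(rbind, lapply(ks, read_k, dir = dir))
  all_data$rmse <- sqrt(all_data$diff / all_data$count)
  all_data
}

# One panel per K, all sharing the same axes, so the curves can be compared.
plot_all <- function(all_data) {
  ggplot2::ggplot(all_data) +
    ggplot2::geom_point(ggplot2::aes(x = id, y = rmse)) +
    ggplot2::facet_wrap(~ K)
}

# Usage: print(plot_all(load_all(k_values)))
```

Stacking everything into one data frame with a K column, rather than producing 15 separate plots, keeps all panels on a common scale, which should make it easier to see which K gives the lowest values.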
Does what I am trying to do make sense? Please note that statistics is not my domain at all, nor is R (but you have probably figured that out already).
Is there a better approach I could choose? And from a theoretical point of view, am I doing what I expect to be doing?
I would be very grateful for any comments, hints, criticism and answers.