Metric for evaluating inter-rater agreement on a single subject rated by multiple raters

Question

I'm building a rating survey in R (Shiny) and I'm trying to find a metric that can evaluate agreement for only one of the "questions" in the survey. The ratings range from 1 to 5. There are multiple raters, and each rater rates a set of 10 questions on that scale.

I've used Fleiss' kappa and Krippendorff's alpha on the whole set of questions and raters, and that works, but when I evaluate each question separately these metrics give negative values. I also calculated them by hand from the formulas and got the same results, so I guess they don't work for a small sample of subjects (in this case, a sample of 1).
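For illustration, here is a minimal sketch of the by-hand Fleiss' kappa calculation for a single question. The ratings are made up, and the formulas are the standard textbook ones rather than any particular package's implementation. With only one subject, the chance-agreement term is estimated from that same subject, which seems to be what pushes the result below zero.

```r
# Made-up ratings of ONE question by n raters on a 1-5 scale.
ratings <- c(4, 4, 5, 3, 4, 4)
n <- length(ratings)

# How many raters chose each of the five categories.
counts <- table(factor(ratings, levels = 1:5))

# Observed agreement for this single subject.
P_obs <- (sum(counts^2) - n) / (n * (n - 1))

# Chance agreement: with one subject, the category proportions
# come from that same subject.
p_j <- counts / n
P_exp <- sum(p_j^2)

# Fleiss' kappa; with these inputs it works out to -1 / (n - 1).
kappa <- (P_obs - P_exp) / (1 - P_exp)
print(kappa)
```

Working through the algebra, the single-subject value reduces to -1 / (n - 1) whenever the raters are not unanimous, so it reflects only the number of raters and not how much they agree.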

I've looked at other metrics, like `rwg` in the multilevel package, but so far I can't get it to work. According to the R documentation:

rwg(x, grpid, ranvar=2)

Where:

x = A vector representing the construct on which to estimate agreement.

grpid = A vector identifying the groups from which x originated.

Can someone explain to me what the `rwg` function expects as input?

If someone knows of another agreement metric that might work better, please let me know.

Thanks.

This doesn't appear to be a specific programming question that's appropriate for Stack Overflow. If you have general questions about choosing an appropriate statistical method, then you should ask such questions over at [stats.se] instead. You are more likely to get better answers there. The `rwg` help page has an example of its use. What exactly are you passing it? It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input we can use to run the code to see what's going on. — MrFlick, Jun 12 '20 at 01:14
I have a data frame with subjects as columns (subject 1, subject 2, etc.) and raters as rows (rater 1, rater 2, etc.). I tried passing a single column, such as subject 1 (a vector), as x and the whole data frame as grpid. — Red Shepard, Jun 12 '20 at 01:39
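Going by the argument descriptions quoted in the question, `x` and `grpid` look like two parallel vectors of the same length, so a wide data frame would need to be stacked first. The sketch below assumes the layout described in the comment above (raters as rows, questions as columns). The data and column names `q1` to `q3` are hypothetical. It computes the single-item r_wg in base R as 1 minus the observed variance over the variance expected under a uniform null, and only calls `multilevel::rwg` if the package happens to be installed.

```r
# Hypothetical wide data: rows = raters, columns = questions.
ratings <- data.frame(
  q1 = c(4, 4, 5, 3, 4),
  q2 = c(2, 4, 3, 3, 4),
  q3 = c(2, 2, 2, 2, 2)
)

# Stack into long form: one rating per element, plus a parallel
# vector recording which question each rating belongs to.
x <- unlist(ratings, use.names = FALSE)
grpid <- rep(names(ratings), each = nrow(ratings))

# Expected variance of a uniform distribution over A options is
# (A^2 - 1) / 12, which equals 2 for a 5-point scale.
ranvar <- (5^2 - 1) / 12

# Base-R single-item r_wg per question (assumption: sample variance).
rwg_by_hand <- 1 - tapply(x, grpid, var) / ranvar
print(rwg_by_hand)

# If the multilevel package is available, the corresponding call
# would take the same two vectors.
if (requireNamespace("multilevel", quietly = TRUE)) {
  print(multilevel::rwg(x, grpid, ranvar = ranvar))
}
```

Under this reading, each question is a "group" and the raters are its members, so the result has one agreement value per question. The package may handle out-of-range values differently from the by-hand version, so the help page is worth checking.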
