I have a data frame with multiple items and subitems, and I want to calculate the kappa per item using kappa2() from the irr package. Every item has a '_1' and a '_2' column, e.g. item1_1 and item1_2, which represent the first and second measurement moments.

An example of the dataframe:


items <- data.frame(matrix(0, nrow = 51, ncol = 41))
# Set the column names for the first column and items columns
colnames(items) <- c("ID", paste(rep(paste0("item", 1:20), each = 2), c("_1", "_2"), sep = ""))
# Fill the ID column with values 1 to 51
items$ID <- 1:51
# Fill the item columns with random 0's and 1's
set.seed(123) # Set seed for reproducibility
items[, 2:41] <- matrix(sample(c(0, 1), size = 20 * 2 * 51, replace = TRUE), ncol = 40)
# Show the resulting data frame
items

I want a dataframe which would look something like this:

item    method      subjects    raters     irr.name   value    stat.name    statistic   p.value
1     Cohenskappa..    51         2          Kappa    0.536..      Z           3.897     0.023947
2     Cohenskappa..    51         2          Kappa    0.705..      Z           5.757     0.000002
3     Cohenskappa..    51         2          Kappa    0.890..      Z           6.447     0.072732
4     Cohenskappa..    51         2          Kappa    0.236..      Z           3.429     0.005636
..
20    Cohenskappa..    51         2          Kappa    0.686..      Z           4.897     0.000056

One option is to create a separate data frame for every item and calculate the kappa on each, but this would eventually result in more than 76 data frames. There should be a more compact and quicker way.

  • you could work with the "dplyr" library and use the "mutate" function to add new columns with your kappa-value. – Eirik Fesker Apr 17 '23 at 19:17
  • @demi thanks I misunderstood the example data you had posted but I see now. Do you want the weighted or unweighted kappa? – SamR Apr 17 '23 at 21:15
  • If you edit the question with a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of your data, the differences between that and what I generated should be clear, and we can work out what needs to change. – SamR Apr 18 '23 at 07:48
  • @demi thanks for editing the post with some data. I just tried the code in my answer with the data you posted and it doesn't generate an error - I get a data frame of results (with a couple of warnings about confidence intervals being truncated to 1/-1 but that's OK). Do you get the error with this sample data? If so, what does `packageVersion("psych")` produce? I am using version `2.2.9`. – SamR Apr 18 '23 at 15:25
  • @demi also I've just noticed that your code `kappa_list <- lapply(col_names_list, (item) get_kappa(item, items))` should be `kappa_list <- lapply(col_names_list, \(item) get_kappa(item, items))` (you're missing a backslash). But that would give you a syntax error rather than the error you posted so I assume that's not the issue. – SamR Apr 18 '23 at 15:26

1 Answer

Here is a method using the psych package, which will handle NA values for you. I've created a helper function get_kappa() which formats the psych::cohen.kappa() output for each item into a data frame. We can then iterate over each item using lapply() and bind all the data frames together.

All the data wrangling is using base R. Generally I prefer to put the data in long form, which can be easier to manipulate with tidyverse or data.table functions. However, psych often takes the data in wide form so I've left it in this case. Base R functions like split() and lapply() are good at iterating over data frames in wide form.

library(psych)
get_kappa <- function(item, dat) {
    k <- cohen.kappa(dat[item])
    df <- data.frame(
        kappa_unweighted = k$confid["unweighted kappa", "estimate"],
        kappa_weighted = k$confid["weighted kappa", "estimate"],
        # plevel is the alpha level used for the confidence intervals, not a p-value
        p = k$plevel
    )

    df
}

# item_1_1, item_1_2 etc.
col_names <- grep("^item", names(dat), value = TRUE)
# list with two elements per item, e.g. c("item_1_1", "item_1_2")
col_names_list <- split(col_names, sub("_[12]$", "", col_names))

kappa_list <- lapply(col_names_list, \(item)
    get_kappa(item, dat)
)
kappa_df <- do.call(rbind, kappa_list)
head(kappa_df)
#         kappa_unweighted kappa_weighted    p
# item_1         0.8646018      0.8377665 0.05
# item_10        0.7101006      0.4448613 0.05
# item_11        0.8678756      0.8470442 0.05
# item_12        0.7543783      0.6648501 0.05
# item_13        0.8647813      0.8316684 0.05
# item_14        0.8208169      0.8091446 0.05
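
If you would rather stay with irr::kappa2() and get the columns listed in the question, the same split-and-lapply pattern should carry over. The following is only a sketch: kappa2_row() is a hypothetical helper name, the toy data just mimics the shape of the question's items data frame (item1_1, item1_2, ...), and it assumes the unweighted kappa is the one wanted.

```r
library(irr)

# Toy data shaped like the question's: ID plus item1_1, item1_2, ..., item20_2
set.seed(123)
col_names <- paste0(rep(paste0("item", 1:20), each = 2), c("_1", "_2"))
m <- matrix(sample(0:1, 51 * 40, replace = TRUE), ncol = 40,
            dimnames = list(NULL, col_names))
items <- data.frame(ID = 1:51, m)

# Hypothetical helper: flatten one kappa2() result into a one-row data frame
kappa2_row <- function(cols, dat) {
    k <- kappa2(dat[cols]) # unweighted Cohen's kappa by default
    data.frame(
        method = k$method, subjects = k$subjects, raters = k$raters,
        irr.name = k$irr.name, value = k$value,
        stat.name = k$stat.name, statistic = k$statistic, p.value = k$p.value
    )
}

item_cols <- grep("^item", names(items), value = TRUE)
item_id <- sub("_[12]$", "", item_cols)
# a factor with explicit levels keeps item1, item2, ... in their original order
pairs <- split(item_cols, factor(item_id, levels = unique(item_id)))

kappa2_df <- do.call(rbind, lapply(pairs, kappa2_row, dat = items))
kappa2_df <- cbind(item = names(pairs), kappa2_df)
rownames(kappa2_df) <- NULL
head(kappa2_df)
```

Because the toy ratings are random, the kappa values here will sit near zero; with real data only the items data frame needs to be swapped in.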

Data

In general it's best to include some sample data in your question. In this case it was relatively straightforward so I generated some using this code:

set.seed(11111)
N <- 51
col_names <- paste0(rep(paste0("item_", 1:20), each = 2), c("_1", "_2"))
ground_truth <- lapply(1:20, \(.) sample(0:9, N, replace = TRUE)) |>
    setNames(paste0("item_", 1:20))

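ratings <- lapply(col_names, \(x) {
    # strip the "_1" / "_2" suffix to look up the item's ground truth
    item_num <- sub("_[12]$", "", x)
    item_truth <- ground_truth[[item_num]]
    # set roughly 10% of the ratings to 0 to simulate rater disagreement
    to_change <- as.logical(sample(0:1, N, replace = TRUE, prob = c(0.9, 0.1)))
    item_truth[to_change] <- 0
    item_truth
})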
dat <- data.frame(id = 1:N)
dat[col_names] <- ratings
SamR