Pairwise Cohen's Kappa of rows in DataFrame in Pandas (python)

Question

I'd greatly appreciate some help on this. I'm using Jupyter Notebook.

I have a DataFrame for which I want to calculate the interrater reliability. I want to compare rows pairwise by the value of the ID column (every ID occurs twice, once for each coder). Each ID represents a different article, so I do not want to compare them all together; instead I want to take the average of the interrater reliability of each pair (and potentially also for each column).

N.  ID.     A.  B.      
0   8818313 Yes Yes     1.0 1.0 1.0 1.0 1.0 1.0
1   8818313 Yes No      0.0 1.0 0.0 0.0 1.0 1.0 
2   8820105 No  Yes     0.0 1.0 1.0 1.0 1.0 1.0 
3   8820106 No  No      0.0 0.0 0.0 1.0 0.0 0.0

I've been able to find some instructions for Cohen's kappa, but not for how to compute it pairwise by the value in the ID column.

Does anyone know how to go about this?

In your example, only ID 8818313 has two coders. Is this expected? Should the IDs with only one coder be dropped? – mozway, Jul 13 '21 at 09:12
@Anna please accept it as the final answer if it solved your problem. Thank you. – quest, Jul 13 '21 at 09:35

score 2 · Accepted Answer · answered Jul 13 '21 at 09:13

Here is how I would approach it:

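from io import StringIO

import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Sample data: two rows (one per coder) for each ID.
df = pd.read_csv(StringIO("""
N,ID,A,B,Nums
0,   8818313, Yes, Yes,1.0 1.0 1.0 1.0 1.0 1.0
1,   8818313, Yes, No,0.0 1.0 0.0 0.0 1.0 1.0 
2,   8820105, No,  Yes,0.0 1.0 1.0 1.0 1.0 1.0 
3,   8820105, No,  No,0.0 0.0 0.0 1.0 0.0 0.0 """))


def kappa(df):
    # df is the group of rows sharing one ID; row 0 and row 1 are the two coders.
    # Split each space-separated Nums string and skip empty tokens (trailing spaces).
    nums1 = [float(num) for num in df.Nums.iloc[0].split(' ') if num]
    nums2 = [float(num) for num in df.Nums.iloc[1].split(' ') if num]
    return cohen_kappa_score(nums1, nums2)

# One kappa value per ID.
df.groupby('ID').apply(kappa)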

This will generate:

ID
8818313    0.000000
8820105    0.076923
dtype: float64
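
The question also asks for the average over all pairs, and the comment under it raises IDs that have only one coder. Here is a minimal sketch of both, not a definitive implementation. It assumes the same `Nums` layout as above; the row with ID 9999999 is a made-up single-coder article added only to show the skip, and the helper name `parse_ratings` is my own. Whether dropping such IDs is appropriate depends on your data.

```python
from io import StringIO

import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Same layout as the data above, plus one hypothetical single-coder row (ID 9999999).
csv_text = """N,ID,A,B,Nums
0,8818313,Yes,Yes,1.0 1.0 1.0 1.0 1.0 1.0
1,8818313,Yes,No,0.0 1.0 0.0 0.0 1.0 1.0
2,8820105,No,Yes,0.0 1.0 1.0 1.0 1.0 1.0
3,8820105,No,No,0.0 0.0 0.0 1.0 0.0 0.0
4,9999999,No,No,0.0 0.0 0.0 1.0 0.0 0.0
"""
df = pd.read_csv(StringIO(csv_text))


def parse_ratings(cell):
    """Turn a space-separated string such as '1.0 0.0 1.0' into a list of floats."""
    return [float(token) for token in cell.split()]


per_id = {}
for article_id, group in df.groupby("ID"):
    # Assumption: only IDs with exactly two coders can be compared; skip the rest.
    if len(group) != 2:
        continue
    first, second = (parse_ratings(cell) for cell in group["Nums"])
    per_id[article_id] = cohen_kappa_score(first, second)

kappas = pd.Series(per_id, name="kappa")
print(kappas)                              # one kappa per article with two coders
print("mean kappa:", kappas.mean())        # average over all compared pairs
```

The explicit loop keeps the skipping rule visible; the per-ID values for the two original articles match the output shown above.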

answered Jul 13 '21 at 09:13

quest

Sorry, just one more thing. In the def kappa(df) function, what does "num" stand for and how does it work? I keep getting the error message "'DataFrame' object has no attribute 'Nums'". – Anna Louise Todsen Jul 14 '21 at 07:43
Sorry for seeing the message late. `df.Nums.iloc[0]` gets the string, which for example looks like `1.0 1.0 1.0 1.0 1.0 1.0`. `split(' ')` converts it into a list of strings like so: `['1.0', '1.0', '1.0', '1.0', '1.0', '1.0']`. Then I use a list comprehension to convert all the strings to floats, skipping any that are empty. `num` is just the loop variable used in the list comprehension. – quest Jul 14 '21 at 12:46
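
The error in the comment above is what pandas raises when `df.Nums` is used on a DataFrame that has no column literally named `Nums`. The thread does not show the real column names, so the following is only a sketch under an assumption: that the ratings sit in separate numeric columns, one per rated item. The names `Q1` to `Q6` and the function `kappa_for_pair` are hypothetical; substitute your own column names.

```python
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# What the list comprehension in the answer does to one Nums cell:
cell = "0.0 1.0 0.0 0.0 1.0 1.0 "
parsed = [float(num) for num in cell.split(" ") if num]
print(parsed)  # [0.0, 1.0, 0.0, 0.0, 1.0, 1.0]; the empty token from the trailing space is skipped

# Hypothetical layout: one numeric column per rated item (Q1..Q6 are made-up names).
rating_cols = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
df = pd.DataFrame(
    [
        [8818313, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
        [8818313, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
        [8820105, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
        [8820105, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    ],
    columns=["ID"] + rating_cols,
)


def kappa_for_pair(group):
    """Compare the two coders' rows of one article across the rating columns."""
    first = group[rating_cols].iloc[0].tolist()
    second = group[rating_cols].iloc[1].tolist()
    return cohen_kappa_score(first, second)


result = pd.Series(
    {article_id: kappa_for_pair(group) for article_id, group in df.groupby("ID")}
)
print(result)
```

With this layout no string parsing is needed, because each coder's ratings are read straight from the row.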