I am utilizing pandas to create a dataframe that appears as follows:
ratings = pandas.DataFrame({
'article_a':[1,1,0,0],
'article_b':[1,0,0,0],
'article_c':[1,0,0,0],
'article_d':[0,0,0,1],
'article_e':[0,0,0,1]
},index=['Alice','Bob','Carol','Dave'])
I would like to compute another matrix from this input one that will compare each row against all other rows. Let's assume for example the computation was a function to find the length of the intersection set, I'd like an output DataFrame with the len(intersection(Alice,Bob))
, len(intersection(Alice,Carol))
, len(intersection(Alice,Dave))
in the first row, with each row following that format against the others. Using this example input, the output matrix would be 4x3:
len(intersection(Alice,Bob)),len(intersection(Alice,Carol)),len(intersection(Alice,Dave))
len(intersection(Bob,Alice)),len(intersection(Bob,Carol)),len(intersection(Bob,Dave))
len(intersection(Carol,Alice)),len(intersection(Carol,Bob)),len(intersection(Carol,Dave))
len(intersection(Dave,Alice)),len(intersection(Dave,Bob)),len(intersection(Dave,Carol))
Is there a named method for this kind of function based computation in pandas? What would be the most efficient way to accomplish this?