pandas - apply function to current row against all other rows

Question

I am using pandas to create a DataFrame that looks like this:

import pandas

ratings = pandas.DataFrame({
    'article_a':[1,1,0,0],
    'article_b':[1,0,0,0],
    'article_c':[1,0,0,0],
    'article_d':[0,0,0,1],
    'article_e':[0,0,0,1]
},index=['Alice','Bob','Carol','Dave'])

I would like to compute another matrix from this input that compares each row against all other rows. Suppose, for example, that the computation is the size of the intersection of two rows. I'd like an output DataFrame with len(intersection(Alice,Bob)), len(intersection(Alice,Carol)), and len(intersection(Alice,Dave)) in the first row, and each following row comparing that person against the others in the same way. For this example input, the output matrix would be 4x3:

len(intersection(Alice,Bob)),len(intersection(Alice,Carol)),len(intersection(Alice,Dave))
len(intersection(Bob,Alice)),len(intersection(Bob,Carol)),len(intersection(Bob,Dave))
len(intersection(Carol,Alice)),len(intersection(Carol,Bob)),len(intersection(Carol,Dave))
len(intersection(Dave,Alice)),len(intersection(Dave,Bob)),len(intersection(Dave,Carol))

Is there a named method for this kind of function-based computation in pandas? What would be the most efficient way to accomplish this?

Dan Allan · Accepted Answer · 2013-06-04T18:25:05.970

I am not aware of a named method, but I have a one-liner.

In [21]: ratings.apply(lambda row: ratings.apply(
... lambda x: np.equal(row, x), 1).sum(1), 1)
Out[21]: 
       Alice  Bob  Carol  Dave
Alice      5    3      2     0
Bob        3    5      4     2
Carol      2    4      5     3
Dave       0    2      3     5
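Note that the one-liner above counts the positions where two rows agree, shared zeros included, which is why Alice and Carol score 2 even though Carol has no ratings. As a rough sketch (not part of the original answer), the same count can be written with NumPy broadcasting, and for 0/1 data like this the intersection size from the question can be taken as a dot product. The names vals, matches_df and intersection_df are illustrative only, and the dot-product shortcut assumes every entry is 0 or 1.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1],
}, index=['Alice', 'Bob', 'Carol', 'Dave'])

vals = ratings.to_numpy()

# Compare every row with every other row at once: the comparison has
# shape (n_rows, n_rows, n_cols); summing over the last axis counts
# the positions where the two rows hold the same value.
matches = (vals[:, None, :] == vals[None, :, :]).sum(axis=2)
matches_df = pd.DataFrame(matches, index=ratings.index, columns=ratings.index)

# Assumption: entries are strictly 0/1. Then the dot product of two rows
# counts the articles both people rated 1, i.e. the intersection size.
intersection_df = ratings.dot(ratings.T)

print(matches_df)
print(intersection_df)
```

Both results are square (4x4) and include the diagonal; if the self-comparisons are unwanted they can be masked or dropped afterwards.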

edited Jun 04 '13 at 18:25

answered Jun 04 '13 at 18:07

Excellent answer! I've been searching for this all over. – maccaroo Nov 20 '20 at 03:53

Jeff · Answer 2 · 2013-06-04T18:23:17.073

@Dan Allan's solution is 'right'; here's a slightly different way of approaching the problem:

In [26]: ratings
Out[26]: 
       article_a  article_b  article_c  article_d  article_e
Alice          1          1          1          0          0
Bob            1          0          0          0          0
Carol          0          0          0          0          0
Dave           0          0          0          1          1

In [27]: ratings.apply(lambda x: (ratings.T.sub(x,'index')).sum(),1)
Out[27]: 
       Alice  Bob  Carol  Dave
Alice      0   -2     -3    -1
Bob        2    0     -1     1
Carol      3    1      0     2
Dave       1   -1     -2     0
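For this particular computation, each cell works out to the column person's total minus the row person's total (for example, Bob's 1 minus Alice's 3 gives -2). A minimal sketch of that reading, assuming the same ratings frame as above, uses broadcasting on the row sums; totals and diff_df are hypothetical names introduced here for illustration.

```python
import pandas as pd

ratings = pd.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1],
}, index=['Alice', 'Bob', 'Carol', 'Dave'])

# Number of articles each person rated 1.
totals = ratings.sum(axis=1).to_numpy()

# Cell (i, j) = total of person j minus total of person i.
diff = totals[None, :] - totals[:, None]
diff_df = pd.DataFrame(diff, index=ratings.index, columns=ratings.index)

print(diff_df)
```

This only reproduces the subtraction-and-sum example; other pairwise functions would need their own broadcasted form or the nested apply.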

Interesting. I replaced my list comprehension with a slightly nicer nested apply. But this is even more compact. I wonder if np.equal can be worked into it... – Dan Allan Jun 04 '13 at 18:27
