How do I construct a comparison table from a dataframe?

Question

Given a pandas dataframe like this:

df = pd.DataFrame([['A', 'M1', 1], ['A', 'M2', 4], 
                   ['B', 'M1', 3], ['B', 'M2', 2],
                   ['C', 'M1', 5], ['C', 'M2', 0]],
                   columns=['task', 'method', 'value'])


  task method  value
0    A     M1      1
1    A     M2      4
2    B     M1      3
3    B     M2      2
4    C     M1      5
5    C     M2      0

I want to generate a comparison table like this:

method        M1   M2
method               
M1             0    2
M2             1    0

That is, the value at row r, column c is the number of tasks for which method r had a higher value than method c.

I tried adapting the solution here, but I can't figure out how to take into account the task.

How can I go about doing creating this table?

welcome here. In your real case, all the task have only 2 methods and it is always the same or let`s say for task C, you can have M3 instead of M2 and then you want also a column M3 and a row M3? — Ben.T, Aug 04 '22 at 19:00
The latter, I think. There are more than two methods in my actual case, and I'd like one row and one column per method. For every task, there is the same set of methods. — jwshi, Aug 04 '22 at 19:03

score 0 · Accepted Answer · answered Aug 04 '22 at 19:54

Here is a way to adapt the solution you provided in your question. First pivot the data to get a row per method. Then use the numpy array to be able to broadcast and compare. Aggregate along the task axis of your array. Finally build the dataframe to get as expected.

_df = df.pivot(index='method', columns='task', values='value')
print(_df)
# task    A  B  C
# method         
# M1      1  3  5
# M2      4  2  0

_arr = _df.to_numpy()
# now you can use the broadcasting method like in the link provided
# You need to aggregate (sum) along the task axis being the last one here
_res = (_arr<_arr[:,None]).sum(axis=-1)
print(_res)
# [[0 2]
#  [1 0]]

# after build your dataframe
res = pd.DataFrame(_res, index=_df.index, columns=_df.index)
print(res)
# method  M1  M2
# method        
# M1       0   2
# M2       1   0

How do I construct a comparison table from a dataframe?

1 Answers1