Complex comparison with multiple columns simultaneously

Question

I have a following pandas sample dataset:

    Dim1    Dim2    Dim3    Dim4
0   1   2   7   15
1   1   10  12  2
2   9   19  18  16
3   4   2   4   15
4   8   1   9   5
5   14  18  3   14
6   19  9   9   17

I want to make a complex comparison based on all 4 columns and generate a column called Domination_count. For every row, I want to calculate how many other rows the given one dominates. Domination is defined as "being better in one dimension, while not being worse in the others". A is better than B if the value of A is less than B.

The final result should become:

    Dim1    Dim2    Dim3    Dim4    Domination_count
0   1   2   7   15  2
1   1   10  12  2   1
2   9   19  18  16  0
3   4   2   4   15  2
4   8   1   9   5   2
5   14  18  3   14  0
6   19  9   9   17  0

Some explanation behind the final numbers:

the option 0 is better than option 2 and 6
the option 1 is better than option 2
option 2, 5,6 are better than no other option
option 3 and 4 are better than option 2, 6

I could not think of any code that allows me to compare multiple columns simultaneously. I found this approach which does not do the comparison simultaneously.

being better is defined as having less value, and not worse is having equal or less value — Emil Mirzayev, Sep 15 '20 at 11:53
You're probably going to end up with some sort of iterative (for loop) solution here. Try to write one and include it in your question. If the performance is not good, or you get stuck with some piece of the logic, let us know. — John Zwinck, Sep 15 '20 at 12:02
Correct me if im wrong but i think 3 and 4 are also better than 2 and so their domination count should be 2 and not 1. — hypnos, Sep 15 '20 at 12:26
@JohnZwinck, yes. I am working on a function which I could apply. I am trying to avoid the looping because my original dataset has 1mln rows — Emil Mirzayev, Sep 15 '20 at 12:37

B. Bogart · Accepted Answer · 2020-09-15T15:53:07.797

Improving on the answer:

My first answer worked if there were no equal rows. In the case of equal rows they would increment the domination count because they are not worse than the other rows.

This somewhat simpler solution takes care of that problem.

#create a dataframe with a duplicate row
df = pd.DataFrame([[1, 2, 7, 15],[1, 10,12,2],[9, 19,18,16],[4, 2, 4, 15],[8, 1, 9, 5],[14,18,3, 14],[19,9, 9, 17], [14,18,3, 14]], #[14,18,3, 14]
                 columns = ['Dim1','Dim2','Dim3','Dim4']
)
df2 = df.copy()

def domination(row,df):
    #filter for all rows where none of the columns are worse
    df = df[(row <= df).all(axis = 1)]
    #filter for rows where any column is better.
    df = df[(row < df).any(axis = 1)]
    return len(df)

df['Domination_count'] = df.apply(domination, args=[df], axis = 1)
df

This will correctly account for the criteria in the post and will not count the duplicate row in the domination column

    Dim1    Dim2    Dim3    Dim4    Domination_count
0   1      2       7       15       2
1   1      10      12      2        1
2   9      19      18      16       0
3   4      2       4       15       2
4   8      1       9       5        2
5   14     18      3       14       0
6   19     9       9       17       0
7   14     18      3       14       0

My previous solution counts the equal rows:

df2['Domination_count'] = df2.apply(lambda x: (x <= df2).all(axis=1).sum() -1, axis=1)
df2


    Dim1    Dim2    Dim3    Dim4    Domination_count
0   1       2       7       15      2
1   1       10      12      2       1
2   9       19      18      16      0
3   4       2       4       15      2
4   8       1       9       5       2
5   14      18      3       14      1
6   19      9       9       17      0
7   14      18      3       14      1

Original Solution I like this as a solution. It takes each row of the dataframe and compares each element it to all rows of the dataframe to see if that element is less than or equal to the other rows (not worse than). Then, it counts the rows where all of the elements are not worse than the other rows. This counts the current row which is never worse than itself so we subtract 1.

df['Domination_count'] = df.apply(lambda x: (x <= df).all(axis=1).sum() -1, axis=1)

The result is:

    Dim1    Dim2    Dim3    Dim4    Domination_count
0   1       2       7       15      2
1   1       10      12      2       1
2   9       19      18      16      0
3   4       2       4       15      2
4   8       1       9       5       2
5   14      18      3       14      0
6   19      9       9       17      0

This is exactly what I came up with :) instead of lambda function I was writing a full function for that. Thank you — Emil Mirzayev, Sep 15 '20 at 15:33
There was a problem with that answer though :). I've updated the answer to correct the problem of counting duplicate rows and now I think it is the correct answer to your question. — B. Bogart, Sep 15 '20 at 15:54

Always Right Never Left · Answer 2 · 2020-09-15T12:24:10.483

In one line using list comprehension:

df['Domination_count'] = [(df.loc[df.index!=row] - df.loc[row].values.squeeze() > 0).all(axis = 1).sum() for row in df.index]

Subtract each row from all remaining rows elementwise, then count rows with all positive values (meaning that each corresponding value in the row we subtracted was lower) in the resulting dataframe.

I may have gotten your definition of domination wrong, so perhaps you'll need to change strict positivity check for whatever you need.

score 0 · Answer 3 · edited Oct 15 '20 at 08:27

A simple iterative solution:

df['Domination_count']=0 #initialize column to zero
cols = df.columns[:-1]  # select all columns but the domination_count
for i in range(len(df.index)):   # loop through all the 4 columns
    for j in range(len(df.index)):
        if np.all(df.loc[i,cols]<=df.loc[j,cols]) and i!=j:  # for every ith value check if its smaller than the jth value given that i!=j
            df.loc[i,'Domination_count']+=1  #increment by 1

Complex comparison with multiple columns simultaneously

3 Answers3