How to get the index with the minimum value in a column avoiding duplicate selection

Question

I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'index': {0: 'x0',
  1: 'x1',
  2: 'x2',
  3: 'x3',
  4: 'x4',
  5: 'x5',
  6: 'x6',
  7: 'x7',
  8: 'x8',
  9: 'x9',
  10: 'x10'},
 'distances_0': {0: 0.42394711275317537,
  1: 0.40400179114038315,
  2: 0.4077213959237454,
  3: 0.3921048592156785,
  4: 0.25293154279281627,
  5: 0.2985576890173001,
  6: 0.0,
  7: 0.32563550923886675,
  8: 0.33341592647322754,
  9: 0.30653189426783256,
  10: 0.31749957588191197},
 'distances_1': {0: 0.06684300576184829,
  1: 0.04524728117549289,
  2: 0.04896118088709522,
  3: 0.03557204741075342,
  4: 0.10588973399963886,
  5: 0.06178330590643222,
  6: 0.0001,
  7: 0.6821440376099591,
  8: 0.027074111335967314,
  9: 0.6638424898747833,
  10: 0.674718181953208},
 'distances_2': {0: 0.7373816871931514,
  1: 0.7184619375104593,
  2: 0.7225072199147892,
  3: 0.7075191710741303,
  4: 0.5679436864793461,
  5: 0.6142446533143044,
  6: 0.31652743219529056,
  7: 0.010859948083988706,
  8: 0.6475070638933254,
  9: 0.010567926115431175,
  10: 0.0027932480510772413}})

   index  distances_0  distances_1  distances_2
0     x0     0.423947     0.066843     0.737382
1     x1     0.404002     0.045247     0.718462
2     x2     0.407721     0.048961     0.722507
3     x3     0.392105     0.035572     0.707519
4     x4     0.252932     0.105890     0.567944
5     x5     0.298558     0.061783     0.614245
6     x6     0.000000     0.000100     0.316527
7     x7     0.325636     0.682144     0.010860
8     x8     0.333416     0.027074     0.647507
9     x9     0.306532     0.663842     0.010568
10   x10     0.317500     0.674718     0.002793

I would like to get, for every distances_ column, the index with the minimum value.

The requirement is that each distances_ column gets a different index. For instance, index == "x6" has the minimum value for both distances_0 and distances_1, but it should be chosen for only one of them (in this case distances_0, since 0.000000 < 0.000100).

How could I do that?
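
To make the problem concrete, here is a minimal sketch on a small made-up frame (the labels `a`, `b`, `c` and their values are hypothetical, not the data above). A plain per-column `idxmin` picks the same row label for both columns, which is exactly the duplicate selection to avoid:

```python
import pandas as pd

# Hypothetical toy data, chosen so that row "a" is the minimum of both columns
toy = pd.DataFrame({
    "index": ["a", "b", "c"],
    "distances_0": [0.0, 0.3, 0.5],
    "distances_1": [0.1, 0.2, 0.6],
})

# Label of the minimum in each column, computed independently per column
naive = toy.set_index("index").idxmin()
print(naive.to_dict())

# True here: the same label was selected for more than one column
print(naive.duplicated().any())
```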

This returns a vector that looks like this: 0 distances_1 1 distances_1 2 distances_1 3 distances_1 4 distances_1 5 distances_1 6 distances_0 7 distances_2 8 distances_1 9 distances_2 10 distances_2 How can I interpret this ? — quant, Jun 02 '23 at 09:49
In my case, I basically need 3 (unique) values of the `index` column in the `df` — quant, Jun 02 '23 at 09:50
This returns distances_0 x6 distances_1 x6 distances_2 x10, which is not the wanted answer, since `x6` is selected for both `distances_0` & `distances_1`. I would like 3 unique selections — quant, Jun 02 '23 at 09:53
that is why I believe my question is different than the "duplicated" one — quant, Jun 02 '23 at 09:54

jezrael · Accepted Answer · 2023-06-02T10:35:00.703

1

Use Series.idxmin, filtering out the values already in the output list:

df1 = df.set_index('index')

out = []
for c in df1.columns:
    # ignore labels already chosen for earlier columns, then take the label of the minimum
    out.append(df1.loc[~df1.index.isin(out), c].idxmin())
print(out)
['x6', 'x8', 'x10']
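
As a rough sketch, the same loop can be wrapped in a function that returns a column-to-label mapping instead of a bare list. The function name `pick_unique_minima`, its `label_col` parameter, and the toy data are hypothetical, introduced only for illustration. The toy frame also shows that the result depends on the order in which the columns are visited, because earlier columns claim their label first:

```python
import pandas as pd

def pick_unique_minima(frame, label_col="index"):
    """Column by column, take the smallest value's label that is not already taken."""
    values = frame.set_index(label_col)
    chosen = {}  # column name -> selected row label
    for col in values.columns:
        # keep only rows whose label has not been claimed by an earlier column
        remaining = values.loc[~values.index.isin(list(chosen.values())), col]
        chosen[col] = remaining.idxmin()
    return chosen

# Hypothetical toy data: row "a" is the minimum of both columns
toy = pd.DataFrame({
    "index": ["a", "b", "c"],
    "distances_0": [0.2, 0.3, 0.5],
    "distances_1": [0.1, 0.4, 0.6],
})

result = pick_unique_minima(toy)
print(result)

# Same data with the distance columns visited in the opposite order
swapped = pick_unique_minima(toy[["index", "distances_1", "distances_0"]])
print(swapped)
```

In the first call `distances_0` is visited first and takes `a`, even though `a` has an even smaller value in `distances_1`; in the second call the assignment flips.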

edited Jun 02 '23 at 10:35

answered Jun 02 '23 at 10:03

This solution works in this case. But if you swap the values of distances_0 and distances_1 for x6, it doesn't. Any ideas why? – quant Jun 02 '23 at 10:09
@quant - Can you check now? – jezrael Jun 02 '23 at 10:36

score 0 · Answer 2 · answered Jun 02 '23 at 12:45

You can melt and sort the values, then drop_duplicates on the index so that each index is kept only once (with priority given to its lowest value), and finally select the first value per original column:

out = (df.melt('index', var_name='col').sort_values(by='value')
         .drop_duplicates(subset='index')
         .drop_duplicates(subset='col')
         .set_index('col')['index']
      )

Or:

out = (df.melt('index').sort_values(by='value')
         .drop_duplicates(subset='index')
         .groupby('variable')['index'].first()
      )

Output:

col
distances_0     x6
distances_2    x10
distances_1     x8
Name: index, dtype: object
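
One caveat: dropping duplicates on the index first keeps only the single best column for each row, so a column that is never any row's best can end up with no candidate at all. A possible variant, sketched below under my own assumptions, walks the sorted cells and accepts a cell only if both its row label and its column are still free. The function name `assign_smallest_first` and the toy data are hypothetical; this is a greedy rule, not a minimum-total-cost assignment.

```python
import pandas as pd

def assign_smallest_first(frame, label_col="index"):
    """Visit cells from smallest to largest value; keep a cell only if
    neither its row label nor its column has been assigned yet."""
    cells = frame.melt(label_col, var_name="col").sort_values("value", kind="stable")
    n_cols = cells["col"].nunique()
    assigned = {}        # column name -> selected row label
    used_labels = set()  # row labels already given to some column
    for label, col in zip(cells[label_col], cells["col"]):
        if col in assigned or label in used_labels:
            continue
        assigned[col] = label
        used_labels.add(label)
        if len(assigned) == n_cols:
            break
    return assigned

# Hypothetical toy data: every row has its smallest value in distances_1
toy = pd.DataFrame({
    "index": ["a", "b", "c"],
    "distances_0": [0.2, 0.5, 0.7],
    "distances_1": [0.1, 0.4, 0.6],
})

result = assign_smallest_first(toy)
print(result)
```

On this toy frame the smallest cell overall gives `a` to `distances_1`, and `distances_0` then receives `b`, its smallest remaining row.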

This solution works in this case. But if you swap the values of distances_0 and distances_1 for x6, it doesn't. – quant Jun 02 '23 at 15:33
