
I have the following dataframe with repeating values in the 'nearest_beacon' column but different distances in the 'vms_distance' column:

nearest_beacon  vms_distance associated
2890231      0.421313        vms
2890231      0.215785        vms
2890231      0.104256        vms*
4548780      0.486456        vms
4548780      0.468065        vms
4548780      0.337609        vms
4548780      0.363601        vms
4548780      0.210566        vms
4548780      0.197327        vms*
4548780      0.285390        vms
4548780      0.216443        vms
1221421      0.441454        vms
1221421      0.337533        vms*

For each 'nearest_beacon' value, I want to find the one row (marked * above) with the minimum value in the 'vms_distance' column, set its 'associated' value to 'vms', and set 'associated' for the remaining rows to 'no_vms'.

Expected Result:

nearest_beacon  vms_distance associated
2890231      0.421313        no_vms
2890231      0.215785        no_vms
2890231      0.104256        vms
4548780      0.486456        no_vms
4548780      0.468065        no_vms
4548780      0.337609        no_vms
4548780      0.363601        no_vms
4548780      0.210566        no_vms
4548780      0.197327        vms
4548780      0.285390        no_vms
4548780      0.216443        no_vms
1221421      0.441454        no_vms
1221421      0.337533        vms
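
For reproducibility, a minimal snippet that rebuilds the sample DataFrame above (the * markers are not part of the data; 'associated' starts as 'vms' for every row):

import pandas as pd

df = pd.DataFrame({
    'nearest_beacon': [2890231, 2890231, 2890231,
                       4548780, 4548780, 4548780, 4548780, 4548780,
                       4548780, 4548780, 4548780,
                       1221421, 1221421],
    'vms_distance': [0.421313, 0.215785, 0.104256,
                     0.486456, 0.468065, 0.337609, 0.363601, 0.210566,
                     0.197327, 0.285390, 0.216443,
                     0.441454, 0.337533],
    'associated': ['vms'] * 13,
})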
smci
Suhendra
  • This is a duplicate of [Find row where values for column is maximal in a pandas DataFrame](https://stackoverflow.com/questions/10202570/find-row-where-values-for-column-is-maximal-in-a-pandas-dataframe), except using `idxmin` obviously, and a `groupby('nearest_beacon')`. Also, please state the question nearer the top, not scattered through the question body. – smci Aug 13 '19 at 04:14
  • Possible duplicate of [Find row where values for column is maximal in a pandas DataFrame](https://stackoverflow.com/questions/10202570/find-row-where-values-for-column-is-maximal-in-a-pandas-dataframe) – smci Aug 13 '19 at 04:15
  • I don't understand your request *"I hope it can be execute in a loop for other same issue."*. Functions like `idxmin()` and vectorization are more efficient in pandas; loops can and should generally be avoided. – smci Aug 13 '19 at 04:27
  • That answer is almost what I need, but it doesn't handle the other ids in the 'nearest_beacon' column. I mean the answer would have to be executed in a loop for each 'nearest_beacon' value. https://gis.stackexchange.com/a/331708/133557 – Suhendra Aug 13 '19 at 05:22
  • Suhendra: stop talking about loops please. Use `df.groupby('nearest_beacon')...` chained to some other function, e.g. `idxmin()`. – smci Aug 13 '19 at 05:47
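
To illustrate the approach suggested in the comments above, a minimal sketch (assuming the sample DataFrame with its default integer index), not part of the original post:

# idxmin returns, per group, the index label of the row with the smallest distance
min_idx = df.groupby('nearest_beacon')['vms_distance'].idxmin()
# For the sample data this should pick out the rows marked (*) above,
# i.e. index labels 2, 8 and 12.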

3 Answers


Use `groupby` with `idxmin`, then assign back via `loc`:

df.loc[df.groupby('nearest_beacon').vms_distance.idxmin(), 'associated'] = 'no vms'
BENY
  • With the code you posted I am getting 'no vms' for the rows with the lowest value and the rest are 'vms', the opposite of what is asked. – moys Aug 13 '19 at 03:44
  • Can it be reversed? I have tried it, but the problem is that 'no vms' is set only on the minimum value of the 'vms_distance' column. Same problem with @mohanys – Suhendra Aug 13 '19 at 03:49

Try this

# First we set the lowest item in each group to 'VMS'
df.loc[df.groupby('nearest_beacon').vms_distance.idxmin(), 'associated'] = 'VMS'

# Then we change the remaining items to 'No_vms'
df.loc[df['associated'] != 'VMS', 'associated'] = 'No_vms'
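
A quick sanity check, not part of the original answer: the sample data has 13 rows in three groups, so three rows should end up labelled 'VMS'.

df['associated'].value_counts()
# expected: No_vms    10
#           VMS        3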
moys

Do this:

df['associated'] = 'non_vms'
df.loc[df.groupby('nearest_beacon')['vms_distance'].idxmin(), 'associated'] = 'vms'
df

The output:

    nearest_beacon  vms_distance    associated
0   2890231 0.421313    non_vms
1   2890231 0.215785    non_vms
2   2890231 0.104256    vms
3   4548780 0.486456    non_vms
4   4548780 0.468065    non_vms
5   4548780 0.337609    non_vms
6   4548780 0.363601    non_vms
7   4548780 0.210566    non_vms
8   4548780 0.197327    vms
9   4548780 0.285390    non_vms
10  4548780 0.216443    non_vms
11  1221421 0.441454    non_vms
12  1221421 0.337533    vms
niuer
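
Not from the original answers, but an equivalent vectorized sketch: compare each distance to its group minimum via `transform`. Unlike `idxmin`, this labels every tied row as 'vms' if a group contains duplicates of its minimum distance.

import numpy as np

# Broadcast each group's minimum distance back onto its rows, then compare
group_min = df.groupby('nearest_beacon')['vms_distance'].transform('min')
df['associated'] = np.where(df['vms_distance'] == group_min, 'vms', 'no_vms')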