3

I am using this code to select the smallest row in a column of a given df (got this appraoch from here):

data = pd.DataFrame({'A': [1,1,1,2,2,2], 'B':[4,5,2,7,4,6], 'C':[3,4,10,2,4,6]})
min_value = data.groupby('A').B.min()
data = data.merge(min_value, on='A',suffixes=('', '_min'))
data = data[data.B==data.B_min].drop('B_min', axis=1)

I would like to modify this such that I get the 2nd (or nth) lowest value for that column.

Niccola Tartaglia
  • 1,537
  • 2
  • 26
  • 40

3 Answers3

1

You can find the nth lowest B per A and filter data.

data = pd.DataFrame({'A': [1,1,1,2,2,2], 'B':[4,5,2,7,4,6], 'C':[3,4,10,2,4,6]})
# sort data
data = data.sort_values(by=['A','B'])
# transform the 2nd lowest (n=1) for the row and filter
data = data[data['B'] == data.groupby('A')['B'].transform('nth', 1)]
print(data)
   A  B  C
0  1  4  3
5  2  6  6

You can select any nth by passing the rank to transform as arg.

cottontail
  • 10,268
  • 18
  • 50
  • 51
0

Try:

print(
    data.groupby("A", as_index=False).apply(
        lambda x: x.sort_values(by="B").iloc[1]
    )
)

Prints:

   A  B  C
0  1  4  3
1  2  6  6
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
0

If your data is large, you could avoid sorting the data(which can be expensive), and instead use a combination of idxmin ( as shown in the solution you referenced) and nsmallest:

grouper = data.groupby('A').B
# get the minimum
minimum = grouper.idxmin()
# get the nsmallest rows (2 in this case)
smallest_2 = grouper.nsmallest(2).index.droplevel(0)
# alternative is smallest_2.difference(minimum)
smallest_2 = smallest_2[~smallest_2.isin(minimum)]
data.loc[smallest_2]

   A  B  C
0  1  4  3
5  2  6  6
sammywemmy
  • 27,093
  • 4
  • 17
  • 31