import pandas as pd

df = pd.DataFrame([['SAM', 23, 1],
                   ['SAM', 23, 2],
                   ['SAM', 23, 1],
                   ['SAM', 23, 3],
                   ['BILL', 36, 1],
                   ['BILL', 36, 2],
                   ['BILL', 36, 3],
                   ['BILL', 36, 1],
                   ['JIMMY', 33, 4],
                   ['JIMMY', 33, 2],
                   ['JIMMY', 33, 2],
                   ['JIMMY', 33, 3],
                   ['CARTER', 25, 3],
                   ['CARTER', 25, 4],
                   ['CARTER', 25, 5],
                   ['CARTER', 25, 4],
                   ['GRACE', 27, 4],
                   ['GRACE', 27, 5],
                   ['GRACE', 27, 6],
                   ['TOMMY', 32, 7]])
df.columns = ['A', 'B', 'C']

I need to keep every row that holds the minimum value of column 'C' within its group (grouped by column 'A'), leaving column 'B' unchanged. There is an almost identical question here, but if I use

df.loc[df.groupby('A').C.idxmin()]

only one minimum row per group remains, and I need all of them. Expected result:

image of expected result
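To illustrate why the `idxmin` approach drops ties, here is a minimal sketch on a small made-up frame (`small` and its values are assumptions for illustration, not the data from the question). `idxmin` returns a single index label per group, the first occurrence of the minimum, so `.loc` can only ever pick one row per group:

```python
import pandas as pd

# Illustrative frame (hypothetical data): group 'x' has a tied minimum of 1.
small = pd.DataFrame({'A': ['x', 'x', 'x', 'y'],
                      'C': [1, 2, 1, 5]})

# One index label per group: the first row where C hits the group minimum.
first_min_labels = small.groupby('A')['C'].idxmin()

# Selecting by those labels keeps exactly one row per group,
# so the second row of 'x' with C == 1 is lost.
one_per_group = small.loc[first_min_labels]
print(one_per_group)
```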

smci
Johnny
  • We can write code that does this, but you're asking to intentionally keep duplicate rows, which has almost no value. Can you show any legitimate use-case where we would ever want to do this? – smci Jul 16 '21 at 00:50
  • Also, most solutions will not preserve the index, they'll overwrite it with default 0,1,2..., (but your original dataframe doesn't have a non-default index, so you won't notice). But if you do `df.index = list(string.ascii_lowercase)[:20]` you'll see. – smci Jul 16 '21 at 00:53
  • Yes. In the real case (this is just an example) I have a dataframe with 500k+ rows, and the column B values are not all the same, so the rows are not duplicates. I'm sorry for making you think they are the same in real life; I just copy-pasted them. – Johnny Jul 19 '21 at 08:51
  • I'm saying: if you want solutions that preserve the (non-default) index, edit your question to say so and provide a data example that has a (non-default) index. You've already accepted a solution that doesn't. – smci Jul 19 '21 at 11:35
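The index point raised in the comments can be checked with a short sketch. It assumes a small made-up frame (`demo`) relabelled with letters in the way the comment suggests. A boolean mask passed to `.loc` keeps the original labels; they are only replaced with 0, 1, 2, ... if `reset_index(drop=True)` is called afterwards:

```python
import string
import pandas as pd

# Hypothetical sample with a non-default index ('a'..'e').
demo = pd.DataFrame({'A': ['SAM', 'SAM', 'SAM', 'BILL', 'BILL'],
                     'B': [23, 23, 23, 36, 36],
                     'C': [1, 2, 1, 1, 2]})
demo.index = list(string.ascii_lowercase)[:len(demo)]

# True where a row's C equals the minimum C of its 'A' group.
mask = demo.groupby('A')['C'].transform('min').eq(demo['C'])

kept = demo.loc[mask]                      # original labels survive
renumbered = kept.reset_index(drop=True)   # labels replaced by 0, 1, 2, ...

print(kept.index.tolist())
print(renumbered.index.tolist())
```

Whether to call `reset_index(drop=True)` therefore depends on whether the original labels carry meaning.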

1 Answer


Let's try groupby.transform to get the minimum value of C per group, compare it with df['C'], and keep the rows whose C equals that minimum:

df.loc[df.groupby('A')['C'].transform('min').eq(df['C'])].reset_index(drop=True)
        A   B  C
0     SAM  23  1
1     SAM  23  1
2    BILL  36  1
3    BILL  36  1
4   JIMMY  33  2
5   JIMMY  33  2
6  CARTER  25  3
7   GRACE  27  4
8   TOMMY  32  7
Henry Ecker
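For readers who prefer the one-liner from the answer unpacked into steps, the following is a sketch only. The helper name `keep_group_min_rows` and the sample data are hypothetical; the B values differ on purpose, matching the asker's note that the real rows are not duplicates:

```python
import pandas as pd

def keep_group_min_rows(frame, group_col, value_col):
    """Return every row whose value_col equals its group's minimum (ties kept)."""
    # transform('min') broadcasts each group's minimum back to every row,
    # so the result is aligned with frame's index.
    group_min = frame.groupby(group_col)[value_col].transform('min')
    # Keep rows where the value matches its own group's minimum.
    return frame.loc[frame[value_col].eq(group_min)]

# Hypothetical sample: JIMMY has two rows tied at C == 2 with different B.
sample = pd.DataFrame({'A': ['JIMMY', 'JIMMY', 'JIMMY', 'TOMMY'],
                       'B': [33, 34, 35, 32],
                       'C': [4, 2, 2, 7]})

result = keep_group_min_rows(sample, 'A', 'C')
print(result)
```

Because the filter is a boolean mask rather than a lookup by label, every tied row is kept and the B column is carried along untouched.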