1

I have a dataframe as below, where I do a groupby on Itr and Type:

  • How do I get the value in Start for the row in each group where Values column is max?
  • I only want the first max; there are multiple rows where Values is max (20.402). My expected answer should be 101 (because this is the time when Values is the max) when I do this after groupby of Itr 2 & Type 1.
  • I am unable to use .loc since I have done groupby.
  • I am unable to use apply function for this since it involves two columns.

Itr Type Start Values
2 1 101 20.402
2 1 102 20.402
2 1 103 20.399
2 1 104 20.399
2 1 105 20.399
2 1 106 20.383
2 1 107 20.383
2 1 108 20.383
2 1 109 20.383
2 1 110 20.383
2 1 111 20.36
2 1 112 20.36
2 1 113 20.36
2 1 114 20.36
2 1 115 20.36
2 1 116 20.36
2 1 117 20.36
2 1 118 20.36
2 1 119 20.36
2 1 120 20.36
3 1 121 20.348
3 1 122 20.348
3 1 123 20.348
3 1 124 20.348
3 1 125 20.348
3 1 126 20.34
3 1 127 20.34
3 1 128 20.34
3 1 129 20.34
3 1 130 20.34
3 1 131 20.337
3 1 132 20.337
3 1 133 20.337
3 1 134 20.337
3 1 135 20.337
3 1 136 20.342
3 2 121 20.058
3 2 122 20.058
3 2 123 20.058
3 2 124 20.058
3 2 125 20.043
3 2 126 20.043
3 2 127 20.043
3 2 128 20.043
3 2 129 20.043
3 2 130 20.035
3 2 131 20.035
3 2 132 20.035
3 2 133 20.035
3 2 134 20.035
3 2 135 20.021

As suggested I have put a simpler df & tried to make my requirement a bit more clearer.
Itr Type Time Val
2 3 101 3
2 3 102 4
2 3 103 5
2 3 104 6
2 3 105 6
2 3 106 5
2 3 107 1
1 2 101 11
1 2 102 12
1 2 103 13
1 2 104 18
1 2 105 15
1 2 106 10
4 5 101 21
4 5 102 22
4 5 103 27
4 5 104 29
4 5 105 25
4 5 106 26

I want the 'Time' and Max & Min 'Value' AFTER group by (that is within each group) in separate columns (one column for Max & one column for min)

moys
  • 7,747
  • 2
  • 11
  • 42
  • can you please post a desired output(preferably shorten the input example for readability) – anky Jul 08 '19 at 05:06
  • Hello, are you able to see the answer you put? It is not showing for me anymore. I am only see your latest comment where you ask to post the desired out put – moys Jul 08 '19 at 05:07
  • Yes i deleted it. Wasn't sure exactly how you desire your output to look like. :) – anky Jul 08 '19 at 05:08
  • I need the value is a column when the value is max in another column. This too AFTER groupby. The link that you put in does not answer this. Please remove the duplicate flag. – moys Jul 08 '19 at 05:18
  • **But your title asks for "Value in one column..." not "Rows with max value in given column..."**. We have so many duplicates on this, no matter what you're looking for it's a duplicate of one of the existing questions here; but we do need you to restate your question more clearly so we don't waste time in searching through lots of duplicates; I already spent 20+ mins. Please look through them yourself also. And please restate your expected output clearly. (Don't say "max value" if you mean "row with max value in col1", for example) – smci Jul 08 '19 at 05:43
  • ...then at the bottom you say something else again: *"I want the 'Time' and Max & Min 'Value' after groupby"* – smci Jul 08 '19 at 05:45
  • 2
    Anyway in pandas you can use multiple aggregations, by passing a list/dict of functions info `df.groupby(...).agg(your_list_of_functions_goes_here)`. So yes this absolutely is a duplicate. Please tell us very precisely your desired output. – smci Jul 08 '19 at 05:58
  • mohanys, ok it is in fact a duplicate of [Find row where values for column is maximal in a pandas DataFrame](https://stackoverflow.com/questions/10202570/find-row-where-values-for-column-is-maximal-in-a-pandas-dataframe), [Find maximum value of a column and return the corresponding row values using Pandas](https://stackoverflow.com/questions/15741759/find-maximum-value-of-a-column-and-return-the-corresponding-row-values-using-pan); if you had stated more clearly "I only need the first max" that's better. I had to put in a lot of effort to find you the right dupe target so you get good answers – smci Jul 16 '19 at 23:27

1 Answers1

3

I believe you need GroupBy.transform if need new columns in original DataFrame:

g = df.groupby(['Itr','Type'])

df['max_val'] = g['Val'].transform('max')
df['min_val'] = g['Val'].transform('min')

df['time_by_first_max_val'] = (df.set_index('Time')
                                 .groupby(['Itr','Type'])['Val'].transform('idxmax').values)
print (df)
    Itr  Type  Time  Val  max_val  min_val  time_by_first_max_val
0     2     3   101    3        6        1                    104
1     2     3   102    4        6        1                    104
2     2     3   103    5        6        1                    104
3     2     3   104    6        6        1                    104
4     2     3   105    6        6        1                    104
5     2     3   106    5        6        1                    104
6     2     3   107    1        6        1                    104
7     1     2   101   11       18       10                    104
8     1     2   102   12       18       10                    104
9     1     2   103   13       18       10                    104
10    1     2   104   18       18       10                    104
11    1     2   105   15       18       10                    104
12    1     2   106   10       18       10                    104
13    4     5   101   21       29       21                    104
14    4     5   102   22       29       21                    104
15    4     5   103   27       29       21                    104
16    4     5   104   29       29       21                    104
17    4     5   105   25       29       21                    104
18    4     5   106   26       29       21                    104

Or GroupBy.agg if need aggregate values:

df2 = (df.set_index('Time')
         .groupby(['Itr','Type'], sort=False)['Val']
         .agg([('max_val', 'max'),('min_val', 'min'),('time_by_first_max_val', 'idxmax')])
         .reset_index())
print (df2)

   Itr  Type  max_val  min_val  time_by_first_max_val
0    2     3        6        1                    104
1    1     2       18       10                    104
2    4     5       29       21                    104
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I have just one question. When we do `df.set_index('Time')`. This does not actually change the index of the dataframe itself. This `set_index' is used only for computing the values in this particular line. Is my understanding correct? – moys Jul 08 '19 at 07:44
  • 1
    @mohanys - it is a bit trick - need values of `time` column per max of `val` - so created index by Time and get index value of maximal val. Last use `.values` for assign numpy array, not Series for prevent data alignmenet - original index values are different. – jezrael Jul 08 '19 at 07:59