0

This is my first time asking a question. I have a dataframe that looks like below:

import pandas as pd
data = [['AK', 'Co',2957],
    ['AK', 'Ot', 15],
    ['AK','Petr', 86848],
    ['AL', 'Co',167],
    ['AL', 'Ot', 10592],
    ['AL',  'Petr',1667]]
my_df = pd.DataFrame(data, columns = ['State', 'Energy', 'Elec']) 
print(my_df)

I need to find the maximum and minimum values of the third column based on the first two columns. I did browse through a few stackoverflow questions but couldn't find the right way to solve this. My output should look like below:

data = [['AK','Ot', 15],
['AK','Petr',86848],
['AL','Co',167],
['AL','Ot', 10592]]

my_df = pd.DataFrame(data, columns = ['State', 'Energy', 'Elec']) 
print(my_df)

Note: Please let me know where I am lagging before leaving a negative marking on the question

This link helped me: Python pandas dataframe: find max for each unique values of an another column

DeepMeh
  • 23
  • 3
  • Why is this missing: `['AL', 'Petr',1667]`? – Dani Mesejo Dec 03 '20 at 17:12
  • @DaniMesejo, thank you for teh reply. I am only trying to find the min value and max value of third column based on first and second column. For 'state' column 'AL', the min value is 167 and max value is 10592, which need to be displayed. – DeepMeh Dec 03 '20 at 17:13
  • 3
    `df.groupby(['State','Energy'])['Elec'].agg(['min','max'])`? – Quang Hoang Dec 03 '20 at 17:14
  • Does this answer your question? [Pandas dataframe max and min value](https://stackoverflow.com/questions/42612676/pandas-dataframe-max-and-min-value) – Joe Ferndz Dec 03 '20 at 17:14
  • So the second column does not matter – Dani Mesejo Dec 03 '20 at 17:16
  • It doesn't, but I need it to be printed because I need it for further plotting purpose – DeepMeh Dec 03 '20 at 17:19
  • @JoeFerndz , it sort of did, but instead of printing the 'max; and 'min' next to every row, I want it to be filtered within the dataframe itself. This way, my number of rows would be reduced too. – DeepMeh Dec 03 '20 at 17:20
  • @QuangHoang Thanks for the reply, but it isn't helping – DeepMeh Dec 03 '20 at 17:30
  • 1
    This link helped me: https://stackoverflow.com/questions/40568438/python-pandas-dataframe-find-max-for-each-unique-values-of-an-another-column – DeepMeh Dec 03 '20 at 17:41

1 Answers1

1

try idxmin and idxmax with .loc filter.

    new_df = my_df.loc[
        my_df.groupby(["State"])
        .agg(ElecMin=("Elec", "idxmin"), ElecMax=("Elec", "idxmax"))
        .stack()
    ]
)

print(new_df)

  State Energy   Elec
0    AK     Ot     15
1    AK   Petr  86848
2    AL     Co    167
3    AL     Ot  10592
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • 1
    you can alternatively do `my_df.loc[my_df.groupby("State")['Elec'].agg(['idxmin', 'idxmax']).stack()]` ;) – Shubham Sharma Dec 03 '20 at 17:44
  • 1
    @ShubhamSharma thanks :) yeah that makes more sense and looks cleaner but i want to spread more awareness of the `.agg` with `NamedTuples` they are pretty awesome so i try to use them more in examples – Umar.H Dec 03 '20 at 17:57