Problem with group by max period in dataframe pandas

Question

I'm still a novice with python and I'm having problems trying to group some data to show that record that has the highest (maximum) date, the dataframe is as follows:

...

I am trying the following:

df_2 = df.max(axis = 0) 
df_2 = df.periodo.max()
df_2 = df.loc[df.groupby('periodo').periodo.idxmax()]

And it gives me back:

Timestamp('2020-06-01 00:00:00')

periodo    2020-06-01 00:00:00
valor                  3.49136

Although the value for 'periodo' is correct, for 'valor' it is not, since I need to obtain the corresponding complete record ('period' and 'value'), and not the maximum of each one. I have tried other ways but I can't get to what I want ...

I need to do?

Thank you in advance, I will be attentive to your answers!

Regards!

zabop · Accepted Answer · 2020-08-01T15:15:48.623

# import packages we need, seed random number generator
import pandas as pd
import datetime
import random
random.seed(1)

Create example dataframe

dates = [single_date for single_date in (start_date + datetime.timedelta(n) for n in range(day_count))]
values = [random.randint(1,1000) for _ in dates]
df = pd.DataFrame(zip(dates,values),columns=['dates','values'])

ie df will be:

    dates    values
0   2020-01-01  389
1   2020-01-02  808
2   2020-01-03  215
3   2020-01-04  97
4   2020-01-05  500
5   2020-01-06  30
6   2020-01-07  915
7   2020-01-08  856
8   2020-01-09  400
9   2020-01-10  444

Select rows with highest entry in each column

You can do:

df[df['dates'] == df['dates'].max()]

(Or, if wanna use idxmax, can do: df.loc[[df['dates'].idxmax()]])

Returning:

    dates   values
9   2020-01-10  444

ie this is the row with the latest date

&

df[df['values'] == df['values'].max()]

(Or, if wanna use idxmax again, can do: df.loc[[df['values'].idxmax()]] - as in Scott Boston's answer.)

and

    dates   values
6   2020-01-07  915

ie this is the row with the highest value in the values column.

Reference.

hello !, thank you very much for your help and your time !!, indeed it is as you say :) well !!, I will keep it in mind for the future, thank you very much for your help !! Greetings ;) — Gonza, Aug 01 '20 at 15:25

score 2 · Answer 2 · answered Aug 01 '20 at 15:07

2

I think you need something like:

df.loc[[df['valor'].idxmax()]]

Where you use idxmax on the 'valor' column. Then use that index to select that row.

MVCE:

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'periodo':pd.date_range('2018-07-01', periods = 600, freq='d'), 
                  'valor':np.random.random(600)+3})

df.loc[[df['valor'].idxmax()]]

Output:

       periodo     valor
474 2019-10-18  3.998918

answered Aug 01 '20 at 15:07

Scott Boston

147,308
15
139
187

1

Super!! Thank you very much for your explanation, now I have it clearer !! thanks for your help and time I appreciate it :) regards !! – Gonza Aug 01 '20 at 15:27

Problem with group by max period in dataframe pandas

2 Answers2

Create example dataframe

Select rows with highest entry in each column