3

I'm still a novice with python and I'm having problems trying to group some data to show that record that has the highest (maximum) date, the dataframe is as follows:

enter image description here ... enter image description here

I am trying the following:

df_2 = df.max(axis = 0) 
df_2 = df.periodo.max()
df_2 = df.loc[df.groupby('periodo').periodo.idxmax()]

And it gives me back:

Timestamp('2020-06-01 00:00:00')

periodo    2020-06-01 00:00:00
valor                  3.49136 

Although the value for 'periodo' is correct, for 'valor' it is not, since I need to obtain the corresponding complete record ('period' and 'value'), and not the maximum of each one. I have tried other ways but I can't get to what I want ...

I need to do?

Thank you in advance, I will be attentive to your answers!

Regards!

Gonza
  • 155
  • 2
  • 10

2 Answers2

3
# import packages we need, seed random number generator
import pandas as pd
import datetime
import random
random.seed(1)

Create example dataframe

dates = [single_date for single_date in (start_date + datetime.timedelta(n) for n in range(day_count))]
values = [random.randint(1,1000) for _ in dates]
df = pd.DataFrame(zip(dates,values),columns=['dates','values'])

ie df will be:

    dates    values
0   2020-01-01  389
1   2020-01-02  808
2   2020-01-03  215
3   2020-01-04  97
4   2020-01-05  500
5   2020-01-06  30
6   2020-01-07  915
7   2020-01-08  856
8   2020-01-09  400
9   2020-01-10  444

Select rows with highest entry in each column

You can do:

df[df['dates'] == df['dates'].max()]

(Or, if wanna use idxmax, can do: df.loc[[df['dates'].idxmax()]])

Returning:

    dates   values
9   2020-01-10  444

ie this is the row with the latest date

&

df[df['values'] == df['values'].max()]

(Or, if wanna use idxmax again, can do: df.loc[[df['values'].idxmax()]] - as in Scott Boston's answer.)

and

    dates   values
6   2020-01-07  915

ie this is the row with the highest value in the values column.

Reference.

zabop
  • 6,750
  • 3
  • 39
  • 84
  • hello !, thank you very much for your help and your time !!, indeed it is as you say :) well !!, I will keep it in mind for the future, thank you very much for your help !! Greetings ;) – Gonza Aug 01 '20 at 15:25
2

I think you need something like:

df.loc[[df['valor'].idxmax()]]

Where you use idxmax on the 'valor' column. Then use that index to select that row.

MVCE:

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'periodo':pd.date_range('2018-07-01', periods = 600, freq='d'), 
                  'valor':np.random.random(600)+3})

df.loc[[df['valor'].idxmax()]]

Output:

       periodo     valor
474 2019-10-18  3.998918
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • 1
    Super!! Thank you very much for your explanation, now I have it clearer !! thanks for your help and time I appreciate it :) regards !! – Gonza Aug 01 '20 at 15:27