I have a data frame that covers every day of one month, with a temperature reading every 10 minutes on each day:

        Date     Time   Temp 
0   31/05/2006  09:00   9.3
1   31/05/2006  09:10   10.1
2   31/05/2006  09:20   10.7

I am trying to get the time (hh:mm) at which Temp reaches its maximum, so I used argmax to find the index of the maximum Temp:

maxTime = data.iloc[data[data['Date'] == '31/05/2006']['Outside Temperature'].argmax()]['Time']

That's fine, but now I need to calculate this for each day of the month, so I put this inside a loop. First I created the list MaxTempTime for saving the results of my loop:

MaxTempTime = []
for i in data['Date']:
    maxTime = data.iloc[data[data['Date'] == i ]['Outside Temperature'].argmax()]['Time']
    MaxTempTime.extend(maxTime)
    print maxTime

But I get the answer once for every row of that day. I need it only once per date before moving on to the next date.

(With 10-minute periods there are 144 rows in the 1440 minutes of a day, so I get the same answer 144 times for each day.)

Can anybody help me sort this out, please? Thanks!

Paul Rooney
Horacio Nesman

4 Answers

1

You could just add the following slight modification to your initial attempt:

MaxTempTime = []
for i in data['Date'].unique():
    # idxmax() returns the index label of the day's hottest row, which .loc can look up in data
    maxTime = data.loc[data[data['Date'] == i]['Outside Temperature'].idxmax(), 'Time']
    MaxTempTime.append(maxTime)

That way, you iterate over all days in your DataFrame, but each one only once. This gets the job done without much change needed in your code, although there might be faster ways using groupby(), which could be a concern if your DataFrame is big.

As a side note, you should use append() instead of extend() to add a single element to a list. Using extend() here splits the time string into single characters and adds each one as its own element. See here for an explanation of the difference between the two methods.
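
To make the difference concrete, here is a minimal sketch. The time string and the two list names are made up for illustration and are not taken from the question's data.

```python
# Compare list.append() and list.extend() when the argument is a string.
appended = []
extended = []

time_str = "09:40"  # hypothetical value standing in for one maxTime result

appended.append(time_str)  # adds the whole string as a single element
extended.extend(time_str)  # iterates over the string and adds each character

print(appended)  # ['09:40']
print(extended)  # ['0', '9', ':', '4', '0']
```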

Toterich
0

My guess is that you are taking the max across an entire array, so you get an array full of max values and then add all of them to your list. I would try append instead of extend, or, since the values are all the same, you could just set maxTime = maxTime[0].

Cary Shindell
0

You can group by month and day with groupby().

Assume your data is in df.

>>> df
         Date  Temp   Time
0  31/05/2006   9.3  09:00
1  31/05/2006  10.1  09:10
2  31/05/2006  10.7  09:20
3  31/05/2006  10.5  09:30
4  31/05/2006  10.9  09:40
5  01/06/2006   9.0  09:00
6  01/06/2006   9.3  09:10
7  01/06/2006   9.2  09:20
8  01/06/2006   9.7  09:30
9  01/06/2006   9.5  09:40
  1. Create month and day columns.

    >>> df2 = df.assign(Date = pd.to_datetime(df.Date, dayfirst=True))
    >>> df2 = df2.assign(mon = df2.Date.dt.month, day = df2.Date.dt.day)
    
  2. Group by month and day with groupby(), then get the index of the max Temp in each group.

    >>> df2.groupby(['mon', 'day'])['Temp'].idxmax()
    
    mon  day
    5    31     4
    6    1      8
    Name: Temp, dtype: int64
    
  3. Select the rows with those indices from df2.

    >>> df2.loc[df2.groupby(['mon', 'day'])['Temp'].idxmax()]
    
            Date  Temp   Time  day  mon
    4 2006-05-31  10.9  09:40   31    5
    8 2006-06-01   9.7  09:30    1    6
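
The three steps can also be put together to return only the times the question asks for. This is a sketch that rebuilds the sample df shown above; the names hottest and max_temp_time are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the df shown above.
df = pd.DataFrame({
    "Date": ["31/05/2006"] * 5 + ["01/06/2006"] * 5,
    "Temp": [9.3, 10.1, 10.7, 10.5, 10.9, 9.0, 9.3, 9.2, 9.7, 9.5],
    "Time": ["09:00", "09:10", "09:20", "09:30", "09:40"] * 2,
})

# Step 1: parse the dates (day first) and add month and day columns.
df2 = df.assign(Date=pd.to_datetime(df["Date"], dayfirst=True))
df2 = df2.assign(mon=df2["Date"].dt.month, day=df2["Date"].dt.day)

# Step 2: index label of the hottest row in each (month, day) group.
hottest = df2.groupby(["mon", "day"])["Temp"].idxmax()

# Step 3: look up only the Time column for those rows.
max_temp_time = df2.loc[hottest.tolist(), "Time"].tolist()
print(max_temp_time)  # ['09:40', '09:30']
```

Grouping on month and day alone would merge the same calendar day from different years, so for data spanning more than one year it is safer to group on the full Date column.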
    

See also: Keep other columns when using min() with groupby

what_alnk
0

I think you need groupby with idxmax to get the index of the max Temp in each group, and then loc to select those rows from the original df:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.loc[df.groupby('Date')['Temp'].idxmax()]
print (df)
        Date  Temp   Time
4 2006-05-31  10.9  09:40
8 2006-06-01   9.7  09:30

An alternative solution is to sort with sort_values, then use groupby and aggregate with last:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.sort_values('Temp').groupby('Date', as_index=False).last()
print (df)
        Date  Temp   Time
0 2006-05-31  10.9  09:40
1 2006-06-01   9.7  09:30
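
The two approaches can disagree when a day's maximum occurs more than once. The sketch below uses made-up data with a tie to show this. It passes kind="mergesort" to sort_values, which is an addition of mine to make the sort stable, so treat it as an illustration rather than part of the solutions above.

```python
import pandas as pd

# Hypothetical data: one day whose maximum of 10.9 occurs twice.
df = pd.DataFrame({
    "Date": ["31/05/2006"] * 3,
    "Temp": [10.9, 10.1, 10.9],
    "Time": ["09:00", "09:10", "09:20"],
})
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)

# idxmax() returns the label of the first row that holds the maximum.
first_idx = df.groupby("Date")["Temp"].idxmax().tolist()
first_max = df.loc[first_idx, "Time"].tolist()

# A stable sort keeps tied rows in their original order,
# so last() picks the latest of the tied rows.
last_max = (
    df.sort_values("Temp", kind="mergesort")
      .groupby("Date", as_index=False)
      .last()["Time"]
      .tolist()
)

print(first_max)  # ['09:00']
print(last_max)   # ['09:20']
```

If only the first time of day at which the maximum is reached matters, the idxmax version is the more direct choice.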
jezrael