0

data: https://github.com/zero-jack/data/blob/main/hy_data.csv#L7

Goal

  • get the idxmax from last n rows for each group.

Try

df=df.assign(
        l6d_highest_date=lambda x: x.groupby('hy_code')['high'].transform(lambda x: x.rolling(6).idxmax())


AttributeError: 'Rolling' object has no attribute 'idxmax'

notice: week_date is the index.

Jack
  • 1,724
  • 4
  • 18
  • 33

3 Answers3

3

My solution is based on the conversion of the argmax computed on each sliding-window. For each date, thanks to this information, you can infer the date the argmax refers to.

df = pd.read_csv(
    "https://raw.githubusercontent.com/zero-jack/data/main/hy_data.csv",
    sep=",", index_col="week_date"
)


def rolling_idmax(series, n):
    
    #fist compute the index in the sliding window
    ids = series.rolling(n).apply(np.argmax)
    #0 <= ids <= n-1
    
    #how many rows have past from the sliding window maximum?
    ids = n-1-ids
    #0 <= ids <= n-1
    
    #subtract `ids` from the actual positions
    ids = np.arange(len(series))-ids
    #0 <= ids <= len(series)-1
    
    #convert the positions stored in `ids` with the corrisponding dates (series.index)
    ids.loc[~ids.isna()] = series.index[ids.dropna().astype(int)]
    #"2005-06-10" <= ids <= "2022-03-04"

    return ids


df["l6d_highest_date"] = df.groupby("hy_code").high.apply(rolling_idmax, 6)
2

Based on this answer, I get the following workaround. Note that the linked answer can only handle series with the default index, I add x.index[global_index] to deal with non-default index.

window_size = 6

def get_idxmax_in_rolling(x: pd.Series):
    local_index = x.rolling(window_size).apply(np.argmax)[window_size-1:].astype(int)  # local index, removed nan before astype()
    global_index = local_index + np.arange(len(x)-window_size+1)
    # return list(x.index[global_index]) + [np.nan]*(window_size-1)
    return [np.nan]*(window_size-1) + list(x.index[global_index])  # add nan back

df = df.assign(l6d_highest_date=lambda x: x.groupby('hy_code')['high'].transform(get_idxmax_in_rolling))
hellohawaii
  • 3,074
  • 6
  • 21
2

You can apply idxmax (for older versions of pandas before 1.0.0 you need to pass raw=False). The only caveat is that rolling must return a float (see linked docs), not a Timestamp. That's why you need to temporaryly reset the index, get the idxmax values and the corresponding week_dates and reset the index:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/zero-jack/data/main/hy_data.csv', index_col='week_date', parse_dates=True)

df = df.reset_index()
df['l6d_highest_date'] = df.groupby('hy_code')['high'].transform(lambda x: x.rolling(6).apply(pd.Series.idxmax))
df.loc[df.l6d_highest_date.notna(), 'l6d_highest_date'] = df.loc[df.loc[df.l6d_highest_date.notna(), 'l6d_highest_date'].values, 'week_date'].values
df = df.set_index('week_date')
Stef
  • 28,728
  • 2
  • 24
  • 52