I have 2 dataframes - "df_rollmax" is a derivative of "df_data" with the same shape. I am attempting to map the values of df_rollmax back to df_data and create a third df (df_maxdates) which contains the dates at which each value in df_rollmax originally showed up in df_data.
list1 = [[21,101],[22,110],[25,113],[24,112],[21,109],[28,108],[30,102],[26,106],[25,111],[24,110]]
df_data = pd.DataFrame(list1,index=pd.date_range('2000-1-1',periods=10, freq='D'), columns=list('AB'))
df_rollmax = pd.DataFrame(df_data.rolling(center=False,window=5).max())
mapA = pd.Series(df_data.index, index=df_data['A'])
From a previous question, I see that a single date can be found with:
mapA[rollmax.ix['j','A']]
returns Timestamp('2000-01-07 00:00:00')
But my real dataset is much larger and I would like to fill the third dataframe with dates without looping over every row and column.
Mapping back to the indices is a problem due to: ValueError: cannot reindex from a duplicate axis
so this isn't working...
df_maxdates = pd.DataFrame(index=df_data.index, columns=df_data.columns)
for s in df_data.columns:
df_maxdates[s] = mapA.loc[df_rollmax[s]]
Using the last instance of the duplicate value would be fine, but df.duplicated(keep='last')
isn't cooperating.
Greatly appreciate any and all wisdom.
Update - this is what df_maxdates would look like: