0

I have a dataframe (lets call it "best_trades") that is such as follow:

                              exit_time  netgain_by_MAE
2000-02-01 07:00:00 2000-02-01 11:30:00        2.083333
2000-02-01 08:30:00 2000-02-01 11:30:00        2.625000
2000-02-01 09:00:00 2000-02-01 11:30:00        3.312500
2000-02-01 09:30:00 2000-02-01 11:30:00        2.000000
2000-02-01 10:30:00 2000-02-01 11:30:00        2.312500
2000-02-02 13:30:00 2000-02-02 15:00:00        2.000000
2000-02-02 14:00:00 2000-02-02 15:00:00        2.428571
2000-02-02 15:00:00 2000-02-03 00:00:00        3.000000
2000-02-03 02:30:00 2000-02-03 07:30:00        2.000000
2000-02-03 10:00:00 2000-02-03 15:00:00        2.222222
2000-02-03 10:30:00 2000-02-04 00:00:00        3.259259
2000-02-03 11:00:00 2000-02-04 00:00:00        4.666667
2000-02-03 11:00:00 2000-02-04 01:00:00        2.629630
2000-02-03 11:00:00 2000-02-04 01:30:00       14.500000
2000-02-03 12:30:00 2000-02-04 01:30:00       11.600000
2000-02-03 13:00:00 2000-02-04 01:30:00        5.300000
2000-02-03 13:30:00 2000-02-04 01:30:00        2.700000
2000-02-03 14:00:00 2000-02-04 03:30:00        4.100000
2000-02-03 14:30:00 2000-02-04 03:30:00        2.600000

The dataframe represent a list of trades. The first column represent the entry time of the trade, the second the exit time, the 3rd column , a custom metric i created.

I would like to compute the highest value of netgain_by_MAE for trades that are overlapping in time, for instance those ones:

2000-02-01 07:00:00 2000-02-01 11:30:00        2.083333
2000-02-01 08:30:00 2000-02-01 11:30:00        2.625000
2000-02-01 09:00:00 2000-02-01 11:30:00        3.312500
2000-02-01 09:30:00 2000-02-01 11:30:00        2.000000
2000-02-01 10:30:00 2000-02-01 11:30:00        2.312500

(here the result would be 2000-02-01 09:00:00 2000-02-01 11:30:00 3.312500)

Those ones:

2000-02-03 11:00:00 2000-02-04 00:00:00        4.666667
2000-02-03 11:00:00 2000-02-04 01:00:00        2.629630
2000-02-03 11:00:00 2000-02-04 01:30:00       14.500000

or those ones:

2000-02-03 10:00:00 2000-02-03 15:00:00        2.222222
2000-02-03 10:30:00 2000-02-04 00:00:00        3.259259

Any idea which pandas tool i could use to solve that problem? Thank you.

Edit: I managed to progress with the groupby function: best_trades = best_trades.groupby(['MAE_ts'])['netgain_by_MAE'].transform('max')

However this returns a df such as follow:

2000-02-01 07:00:00     2.083333
2000-02-01 08:30:00     3.312500
2000-02-01 09:00:00     3.312500
2000-02-01 09:30:00     3.312500
2000-02-01 10:30:00     3.312500
2000-02-02 13:30:00     2.428571
...

How to keep the same columns as the original dataframe, only keeping the rows that have the max netgain_by_mae value for each groupby ?

jim jarnac
  • 4,804
  • 11
  • 51
  • 88
  • In SQL, you can use analytic functions where you group by data for a specific column and rank them according to another column. This link should get you started if you are interested in SQL : https://oracle-base.com/articles/misc/analytic-functions. In Pandas, check this link: http://stackoverflow.com/questions/17775935/sql-like-window-functions-in-pandas-row-numbering-in-python-pandas-dataframe – Abhi Dec 12 '16 at 00:53
  • @Joe R: can you repost your answer? It was useful. Thx! – jim jarnac Dec 12 '16 at 21:15

0 Answers0