Pandas select rows after first non NaN value for each groupby group

Question

For each ["etf_ticker", "ticker"] group in my dataset, I want to select the rows starting from the first non-NaN buy_sell value (that row included):

trade_date,etf_ticker,ticker,buy_sell,quantity,weighting,open,low,high,close,volume,spy_close,qqq_close,10YUSD_close
2021-02-05,ARKF,1833.HK,,,,112.5,102.5,112.5,104.900002,9714617.0,386.444305,330.942017,1.17
2021-02-09,ARKF,1833.HK,,,,103.900002,101.900002,105.900002,104.5,5574818.0,388.976013,333.089325,1.157
2021-02-10,ARKF,1833.HK,Buy,497800.0,0.1854,107.800003,104.0,108.0,105.699997,3336335.0,388.806549,332.330261,1.133
2021-02-11,ARKF,1833.HK,Buy,2098800.0,0.7583,127.0,127.0,127.0,127.0,0.0,389.434509,334.157928,1.158
2021-02-18,ARKF,1833.HK,,,,139.0,130.600006,140.600006,134.600006,9742566.0,389.444489,332.050629,1.287
2021-03-11,ARKF,1833.HK,Buy,965000.0,0.2978,95.949997,95.0,98.849998,98.400002,6335100.0,392.2453,317.638824,1.527
2021-03-29,ARKF,1833.HK,,,,94.099998,93.900002,100.599998,99.050003,5850980.0,394.730011,314.320007,1.726
2021-03-31,ARKF,1833.HK,,,,97.800003,97.599998,103.800003,103.300003,6723838.0,400.609985,324.570007,1.679
2021-03-26,ARKF,AAPL,,,,120.349998,118.919998,121.480003,121.209999,93958900.0,395.980011,316.0,1.66
2021-03-28,ARKF,AAPL,,,,121.650002,120.730003,122.580002,121.389999,80819200.0,395.779999,315.910004,1.721
2021-03-29,ARKF,AAPL,Sell,77439.0,0.245,120.110001,118.860001,120.400002,119.900002,85671900.0,394.730011,314.320007,1.726
2021-03-30,ARKF,AAPL,,,,121.650002,121.150002,123.519997,122.150002,118323800.0,396.329987,319.130005,1.746

The expected result is:

trade_date,etf_ticker,ticker,buy_sell,quantity,weighting,open,low,high,close,volume,spy_close,qqq_close,10YUSD_close
2021-02-10,ARKF,1833.HK,Buy,497800.0,0.1854,107.800003,104.0,108.0,105.699997,3336335.0,388.806549,332.330261,1.133
2021-02-11,ARKF,1833.HK,Buy,2098800.0,0.7583,127.0,127.0,127.0,127.0,0.0,389.434509,334.157928,1.158
2021-02-18,ARKF,1833.HK,,,,139.0,130.600006,140.600006,134.600006,9742566.0,389.444489,332.050629,1.287
2021-03-11,ARKF,1833.HK,Buy,965000.0,0.2978,95.949997,95.0,98.849998,98.400002,6335100.0,392.2453,317.638824,1.527
2021-03-29,ARKF,1833.HK,,,,94.099998,93.900002,100.599998,99.050003,5850980.0,394.730011,314.320007,1.726
2021-03-31,ARKF,1833.HK,,,,97.800003,97.599998,103.800003,103.300003,6723838.0,400.609985,324.570007,1.679
2021-03-29,ARKF,AAPL,Sell,77439.0,0.245,120.110001,118.860001,120.400002,119.900002,85671900.0,394.730011,314.320007,1.726
2021-03-30,ARKF,AAPL,,,,121.650002,121.150002,123.519997,122.150002,118323800.0,396.329987,319.130005,1.746
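To try the answers, the input sample can be loaded straight from the text with pandas.read_csv and io.StringIO. This is a minimal sketch. The name SAMPLE_CSV is only illustrative, and empty fields are read as NaN:

```python
import io

import pandas as pd

# The question's input sample, pasted verbatim (empty fields become NaN).
SAMPLE_CSV = """trade_date,etf_ticker,ticker,buy_sell,quantity,weighting,open,low,high,close,volume,spy_close,qqq_close,10YUSD_close
2021-02-05,ARKF,1833.HK,,,,112.5,102.5,112.5,104.900002,9714617.0,386.444305,330.942017,1.17
2021-02-09,ARKF,1833.HK,,,,103.900002,101.900002,105.900002,104.5,5574818.0,388.976013,333.089325,1.157
2021-02-10,ARKF,1833.HK,Buy,497800.0,0.1854,107.800003,104.0,108.0,105.699997,3336335.0,388.806549,332.330261,1.133
2021-02-11,ARKF,1833.HK,Buy,2098800.0,0.7583,127.0,127.0,127.0,127.0,0.0,389.434509,334.157928,1.158
2021-02-18,ARKF,1833.HK,,,,139.0,130.600006,140.600006,134.600006,9742566.0,389.444489,332.050629,1.287
2021-03-11,ARKF,1833.HK,Buy,965000.0,0.2978,95.949997,95.0,98.849998,98.400002,6335100.0,392.2453,317.638824,1.527
2021-03-29,ARKF,1833.HK,,,,94.099998,93.900002,100.599998,99.050003,5850980.0,394.730011,314.320007,1.726
2021-03-31,ARKF,1833.HK,,,,97.800003,97.599998,103.800003,103.300003,6723838.0,400.609985,324.570007,1.679
2021-03-26,ARKF,AAPL,,,,120.349998,118.919998,121.480003,121.209999,93958900.0,395.980011,316.0,1.66
2021-03-28,ARKF,AAPL,,,,121.650002,120.730003,122.580002,121.389999,80819200.0,395.779999,315.910004,1.721
2021-03-29,ARKF,AAPL,Sell,77439.0,0.245,120.110001,118.860001,120.400002,119.900002,85671900.0,394.730011,314.320007,1.726
2021-03-30,ARKF,AAPL,,,,121.650002,121.150002,123.519997,122.150002,118323800.0,396.329987,319.130005,1.746
"""

# parse_dates turns trade_date into datetime64 values; the index is a default RangeIndex.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["trade_date"])
print(df[["trade_date", "etf_ticker", "ticker", "buy_sell"]])
```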

I have checked a few examples using last_valid_index, but I'm not sure how to apply them within a groupby.

I could loop through the dataset group by group and build a new DataFrame, but I was wondering if there is a better way. Thanks.
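For comparison, the per-group loop mentioned in the question could look like the sketch below. It uses Series.first_valid_index, the "first" counterpart of last_valid_index, on each group. This assumes buy_sell is the column whose first value marks the start. The DataFrame is a hand-typed, trimmed copy of the sample with only the columns the filter needs:

```python
import pandas as pd

# Trimmed copy of the question's sample (default RangeIndex 0..11).
df = pd.DataFrame({
    "trade_date": ["2021-02-05", "2021-02-09", "2021-02-10", "2021-02-11",
                   "2021-02-18", "2021-03-11", "2021-03-29", "2021-03-31",
                   "2021-03-26", "2021-03-28", "2021-03-29", "2021-03-30"],
    "etf_ticker": ["ARKF"] * 12,
    "ticker": ["1833.HK"] * 8 + ["AAPL"] * 4,
    "buy_sell": [None, None, "Buy", "Buy", None, "Buy", None, None,
                 None, None, "Sell", None],
})

parts = []
# sort=False keeps the groups in the order they first appear in df.
for _, group in df.groupby(["etf_ticker", "ticker"], sort=False):
    # Index label of the first row with a non-null buy_sell, or None if there is none.
    start = group["buy_sell"].first_valid_index()
    if start is not None:
        # Label-based slice: that row plus every later row of the group.
        parts.append(group.loc[start:])

# Groups with no buy_sell value are dropped; the result is empty if every group is.
out = pd.concat(parts) if parts else df.iloc[0:0]
print(out)
```

This works, but it runs Python code once per group, which is the overhead a vectorised groupby solution avoids.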

Does this answer your question? [pandas group by and find first non null value for all columns](https://stackoverflow.com/questions/59048308/pandas-group-by-and-find-first-non-null-value-for-all-columns) — Joe Ferndz, Apr 09 '21 at 00:11
Thanks Joe, but I haven't managed to make it work using first(): yahoo_df = yahoo_df.iloc[yahoo_df.groupby(["etf_ticker", "ticker", ], as_index=False)["buy_sell"].first().index]. — Je Je, Apr 09 '21 at 00:30

score 1 · Accepted Answer · answered Apr 09 '21 at 00:16

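Try groupby + transform with idxmax: flag the rows where buy_sell is not null, use transform('idxmax') to give every row the index label of the first flagged row in its (etf_ticker, ticker) group, then keep the rows whose index is at or after that label.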

out = df[df.index>=df['buy_sell'].notnull().groupby([df['etf_ticker'],df['ticker']]).transform('idxmax')]

BENY
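A self-contained walk-through of the accepted answer, split into named steps and run on a hand-typed, trimmed copy of the sample (the variable names are illustrative). The index comparison only works if index labels increase down each group, as with the default RangeIndex. A label-independent variant using a per-group running count (groupby(...).cumsum()) is included for comparison. It is a swapped-in technique, not part of the original answer:

```python
import pandas as pd

# Trimmed copy of the question's sample (default RangeIndex 0..11).
df = pd.DataFrame({
    "trade_date": ["2021-02-05", "2021-02-09", "2021-02-10", "2021-02-11",
                   "2021-02-18", "2021-03-11", "2021-03-29", "2021-03-31",
                   "2021-03-26", "2021-03-28", "2021-03-29", "2021-03-30"],
    "etf_ticker": ["ARKF"] * 12,
    "ticker": ["1833.HK"] * 8 + ["AAPL"] * 4,
    "buy_sell": [None, None, "Buy", "Buy", None, "Buy", None, None,
                 None, None, "Sell", None],
})
keys = [df["etf_ticker"], df["ticker"]]

# 1. True on rows where buy_sell holds a value.
has_value = df["buy_sell"].notnull()

# 2. For every row, the index label of the first True in its group
#    (idxmax returns the label of the first maximum, and True > False).
first_label = has_value.groupby(keys).transform("idxmax")

# 3. Keep rows at or after that label, the same test as the answer's
#    df.index >= ... comparison, done here on plain NumPy arrays.
out = df[df.index.to_numpy() >= first_label.to_numpy()]

# Alternative that ignores index labels: the running count of non-null
# buy_sell values in each group is > 0 from the first one onward.
started = has_value.astype("int64").groupby(keys).cumsum() > 0
out_alt = df[started]

print(out)
```

The two results differ only for a group that never has a buy_sell value. idxmax on an all-False group returns that group's first label, so the answer's version keeps the whole group, whereas the running-count version drops it.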
