Drop interval of rows in a dataframe based on the value of an object

Question

I am trying to drop intervals of rows in my DataFrames, from the maximal value (exclusive) to the end of the column. Here is an example of one of the columns of my df (dflist['time']):

0     0.000000
1     0.021528
2     0.042135
3     0.062925
4     0.083498
        ...   
88    1.796302
89    1.816918
90    1.837118
91    1.857405
92    1.878976
Name: time, Length: 93, dtype: float64

I have tried to use .iloc and .drop in conjunction with .index to achieve this result, but without any success so far:

for nested_dict in dict_all_raw.values():
    for dflist in nested_dict.values():
        v_max = dflist['velocity'].max()
        v_max_idx = dflist['velocity'].index[dflist['velocity'] == v_max]
        dflist['time'] = dflist['time'].iloc[0:[v_max_idx]]

I have also tried several variations, like converting 'v_max_idx' to a list or an int to change the type passed to .iloc, as that seems to be the problem:

TypeError: cannot do positional indexing on RangeIndex with these indexers [[Int64Index([15], dtype='int64')]] of type list

I don't know why I am not able to do this, and it is quite frustrating, as it seems to be a pretty basic operation.

Any help would therefore be greatly appreciated!

##EDIT REGARDING THE dropna() PROBLEM

I tried with .notna():

for nested_dict in dict_all_raw.values():
    for dflist in nested_dict.values():
        v_max = dflist['velocity'].max()
        v_max_idx = dflist['velocity'].index[dflist['velocity'] == v_max]
        dflist['velocity'] = dflist['velocity'].iloc[0:list(v_max_idx)[0]]
        dflist['velocity'] = dflist['velocity'][dflist['velocity'].notna()]
        dflist['time'] = dflist['time'].iloc[0:list(v_max_idx)[0]]
        dflist['time'] = dflist['time'][dflist['time'].notna()]

and tried with dropna():

for nested_dict in dict_all_raw.values():
    for dflist in nested_dict.values():
        v_max = dflist['velocity'].max()
        v_max_idx = dflist['velocity'].index[dflist['velocity'] == v_max]
        dflist['velocity'] = dflist['velocity'].iloc[0:list(v_max_idx)[0]].dropna()
        dflist['time'] = dflist['time'].iloc[0:list(v_max_idx)[0]].dropna()

No error messages, it just doesn't do anything:

19  0.385243  1.272031
20  0.405416  1.329072
21  0.425477  1.352059
22  0.445642  1.349657
23  0.465755  1.378407
24       NaN       NaN
25       NaN       NaN
26       NaN       NaN
27       NaN       NaN
28       NaN       NaN
29       NaN       NaN
30       NaN       NaN
31       NaN       NaN
32       NaN       NaN
33       NaN       NaN
34       NaN       NaN
35       NaN       NaN
36       NaN       NaN

score 0 · Accepted Answer · answered Apr 07 '22 at 13:36

In your example, indexing dflist['velocity'].index with a boolean mask returns a pandas Index object (an Int64Index), not a single integer.

pandas.DataFrame.iloc accepts, among other inputs, a slice object with ints, e.g. 1:7.

In your code, neither v_max_idx, which is a pandas Index object, nor [v_max_idx], which is a list containing that Index, is a valid slice bound for iloc.

You can use list(v_max_idx) to convert the Index object to a list, then use [0] to access the first value, like

dflist['time'] = dflist['time'].iloc[0:list(v_max_idx)[0]]
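As a minimal sketch of that conversion, assuming a small made-up frame with a default RangeIndex (the data and the names `df`, `stop`, `time_head` are hypothetical, not from the question), the slice bound can be pulled out of the Index as a plain int before it is handed to iloc:

```python
import pandas as pd

# Hypothetical data standing in for one of the frames in dict_all_raw.
df = pd.DataFrame({
    "time": [0.00, 0.02, 0.04, 0.06, 0.08],
    "velocity": [1.0, 1.3, 1.9, 1.5, 1.2],
})

v_max = df["velocity"].max()
# Indexing the index with a boolean mask returns an Index object, not an int.
v_max_idx = df["velocity"].index[df["velocity"] == v_max]

# Take the first matching label as a plain int. With a default RangeIndex
# the label equals the position, so it is a valid iloc slice bound.
stop = int(v_max_idx[0])

time_head = df["time"].iloc[0:stop]            # rows before the maximum
time_head_incl = df["time"].iloc[0:stop + 1]   # rows up to and including it
```

Note that an iloc slice excludes its end position, so `0:stop` drops the row holding the maximum as well; `0:stop + 1` keeps it, which seems closer to the "exclusive" wording in the question.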

answered Apr 07 '22 at 13:36

Ynjxsjmh


Ok, I understand why then, thank you for your answer, it worked! Any idea on how to drop the resulting NaN values in my df? I can't get rid of them with .dropna(). – Clément Chéry Apr 07 '22 at 13:59
@ClémentChéry Check https://stackoverflow.com/questions/13413590/how-to-drop-rows-of-pandas-dataframe-whose-value-in-a-certain-column-is-nan, notice that dropna doesn't happen inplace by default. – Ynjxsjmh Apr 07 '22 at 14:08
Hmm, whether I use .notna() or .dropna(), it doesn't work, that's strange. – Clément Chéry Apr 07 '22 at 14:29
@ClémentChéry Can you include a picture of how you do that? – Ynjxsjmh Apr 07 '22 at 14:35
sure, just did an edit to the post for you to see the code – Clément Chéry Apr 07 '22 at 14:50
@ClémentChéry Are you sure there is nan value in your `time` column? – Ynjxsjmh Apr 07 '22 at 15:00
Yes, unfortunately; I just copy-pasted the last rows. – Clément Chéry Apr 07 '22 at 15:04
@ClémentChéry How about dropna before the loop? – Ynjxsjmh Apr 07 '22 at 15:11
There shouldn't be any NaN before the loop, as they result from selecting the rows with iloc – Clément Chéry Apr 07 '22 at 15:15
@ClémentChéry Then how about replacing `iloc` with `truncate`? – Ynjxsjmh Apr 07 '22 at 15:16
Got an error from python "TypeError: 'method' object is not subscriptable". But I am not sure I called the function right .. : dflist['time'] = dflist['time'].truncate[0:list(v_max_idx)[0]] – Clément Chéry Apr 07 '22 at 15:21
@ClémentChéry Read the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.truncate.html – Ynjxsjmh Apr 07 '22 at 15:24
I still have the NaN with: dflist['velocity'] = dflist['velocity'].truncate(after=list(v_max_idx)[0], copy=True). I think I called it right this time – Clément Chéry Apr 07 '22 at 15:37
@ClémentChéry Try `dflist = dflist[~dflist['time'].isna()]` – Ynjxsjmh Apr 07 '22 at 15:40
Didn't work, but I think I found where the problem comes from. Initially the DataFrames are chunks (subsets of a bigger df), obtained as follows: df_list_20 = [ chunk[~chunk.separators][['time', 'velocity']].reset_index(drop=True) for _, chunk in df_20_initial.groupby(df_20_initial.separators.cumsum()) ]. That might block subsequent attempts to slice the dfs. – Clément Chéry Apr 07 '22 at 15:45
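
One possible explanation for the NaN values that survive every attempt above, offered as a sketch rather than a confirmed diagnosis: assigning a shortened Series back to a column aligns it on the frame's index, so the frame keeps all its rows and the ones missing from the Series are filled with NaN again, whatever dropna() did beforehand. Slicing the whole DataFrame and storing the result back in the dict avoids this. The data and the names `padded`, `trimmed` and `frame` below are hypothetical; `dict_all_raw` here is only a stand-in with the same nesting as in the question.

```python
import pandas as pd

# Hypothetical data standing in for one chunk.
df = pd.DataFrame({
    "time": [0.00, 0.02, 0.04, 0.06, 0.08],
    "velocity": [1.0, 1.3, 1.9, 1.5, 1.2],
})

# Column assignment aligns on the index: the frame keeps all 5 rows and
# the rows missing from the shortened Series are filled with NaN.
padded = df.copy()
padded["time"] = padded["time"].iloc[0:2].dropna()
nan_count = int(padded["time"].isna().sum())

# Slicing the whole frame actually removes the rows.
# idxmax() returns the label of the first maximum; .loc slices include
# their end label, so the row holding the maximum is kept.
trimmed = df.loc[:df["velocity"].idxmax()]

# Inside the nested loops, write the result back by key; rebinding the
# loop variable alone would leave the dict unchanged.
dict_all_raw = {"run": {"a": df}}  # hypothetical stand-in structure
for nested_dict in dict_all_raw.values():
    for key, frame in nested_dict.items():
        nested_dict[key] = frame.loc[:frame["velocity"].idxmax()]
```

The same reasoning would apply to `dflist = dflist[~dflist['time'].isna()]` inside the loop: it rebinds the local name `dflist` to a new frame, while the dict still holds the original one.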
