How to delete specific rows in pandas dataframe if a condition is met

Question

I have a pandas dataframe with a few thousand rows and only one column. The structure of the content is as follows:

  |   0
0 | Score 1
1 | Date 1
2 | Group 1
3 | Score 1
4 | Score 2
5 | Date 2
6 | Group 2
7 | Score 2
8 | Score 3
9 | Date 3
10| Group 3
11| ...
12| ...
13| Score (n-1)
14| Score n
15| Date n
16| Group n

I need to delete all rows with index i if "Score" in row(i) and "Score" in row(i+1). Any suggestion on how to achieve this?

The expected output is as follows:

  |   0
0 | Score 1
1 | Date 1
2 | Group 1
3 | Score 2
4 | Date 2
5 | Group 2
6 | Score 3
7 | Date 3
8 | Group 3
9 | ...
10| ...
11| Score n
12| Date n
13| Group n

Please provide a small set of sample data as text that we can copy and paste. Include the corresponding desired result. Check out the guide on [how to make good reproducible pandas examples](https://stackoverflow.com/a/20159305/3620003). — timgeb, Jun 06 '20 at 18:10
Why do you have this in a 1-dimensional vector? It seems it would be a natural to have this in 3 column data frame with an index... — AirSquid, Jun 06 '20 at 18:13
From what I understood, you are dropping duplicates, try [`pd.Series.drop_duplicates`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.drop_duplicates.html) and `reset_index(drop=True)` — Ch3steR, Jun 06 '20 at 18:17
@Ch3steR this will drop all duplicates, not just those occurring next to each other. — mac13k, Jun 06 '20 at 18:29
@Ch3ster thanks for your input, my explanation was not that clear, sorry about it. I was not intending to delete duplicates, because some scores could be the same even if they are not subsequent. — Juanito_86Alg, Jun 06 '20 at 18:42

score 1 · Accepted Answer · answered Jun 06 '20 at 18:22

> I need to delete all rows with index i if "Score" in row(i) and "Score" in row(i+1). Any suggestion on how to achieve this?

Given

>>> df
         0
0  Score 1
1   Date 1
2  Group 1
3  Score 1
4  Score 2
5   Date 2
6  Group 2
7  Score 2
8  Score 3
9   Date 3

you can use

>>> mask = df.assign(shift=df[0].shift(-1)).apply(lambda s: s.str.contains('Score', na=False)).all(axis=1)
>>> df[~mask].reset_index(drop=True)
         0
0  Score 1
1   Date 1
2  Group 1
3  Score 2
4   Date 2
5  Group 2
6  Score 3
7   Date 3
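
The same mask can also be built on the Series directly, without the temporary column. This is only a sketch on the sample data above; the names `is_score`, `next_is_score` and `result` are mine, and it assumes every row is a string containing "Score", "Date" or "Group".

```python
import pandas as pd

# Sample data copied from the example above.
df = pd.DataFrame({0: ["Score 1", "Date 1", "Group 1", "Score 1",
                       "Score 2", "Date 2", "Group 2", "Score 2",
                       "Score 3", "Date 3"]})

# True where the row itself contains "Score".
is_score = df[0].str.contains("Score", na=False)

# True where the *next* row contains "Score"; the last row has no
# successor, so it is filled with False instead of NaN.
next_is_score = is_score.shift(-1, fill_value=False)

# Drop row i when both row i and row i+1 are "Score" rows.
result = df[~(is_score & next_is_score)].reset_index(drop=True)
print(result)
```

Filling the shifted value with `False` keeps the mask strictly boolean, so a trailing "Score" row is kept rather than depending on how NaN is treated.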

Although if I were you, I would fix the format of the data first, as the commenters already pointed out.
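
One possible way to do that reshaping, sketched under the assumption that the cleaned column consists of complete Score/Date/Group triplets in that order. The sample values and the column names `score`, `date` and `group` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical cleaned data: complete (Score, Date, Group) triplets.
cleaned = pd.Series(["Score 1", "Date 1", "Group 1",
                     "Score 2", "Date 2", "Group 2",
                     "Score 3", "Date 3", "Group 3"])

# Every three consecutive rows become one record with three columns.
# reshape raises a ValueError if the length is not a multiple of 3.
wide = pd.DataFrame(cleaned.to_numpy().reshape(-1, 3),
                    columns=["score", "date", "group"])
print(wide)
```

If the length is not a multiple of three, the `reshape` call fails, which is a useful signal that stray rows are still present.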
