
I have a pandas DataFrame with a few thousand rows and only one column. The structure of the content is as follows:

  |   0
0 | Score 1
1 | Date 1
2 | Group 1
3 | Score 1
4 | Score 2
5 | Date 2
6 | Group 2
7 | Score 2
8 | Score 3
9 | Date 3
10| Group 3
11| ...
12| ...
13| Score (n-1)
14| Score n
15| Date n
16| Group n

I need to delete all rows with index i if "Score" in row(i) and "Score" in row(i+1). Any suggestion on how to achieve this?

The expected output is as follows:

  |   0
0 | Score 1
1 | Date 1
2 | Group 1
3 | Score 2
4 | Date 2
5 | Group 2
6 | Score 3
7 | Date 3
8 | Group 3
9 | ...
10| ...
11| Score n
12| Date n
13| Group n
Ch3steR
  • Please provide a small set of sample data as text that we can copy and paste. Include the corresponding desired result. Check out the guide on [how to make good reproducible pandas examples](https://stackoverflow.com/a/20159305/3620003). – timgeb Jun 06 '20 at 18:10
  • What's the expected output? – Ch3steR Jun 06 '20 at 18:10
  • Why do you have this in a 1-dimensional vector? It seems it would be natural to have this in a 3-column data frame with an index... – AirSquid Jun 06 '20 at 18:13
  • From what I understood, you are dropping duplicates, try [`pd.Series.drop_duplicates`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.drop_duplicates.html) and `reset_index(drop=True)` – Ch3steR Jun 06 '20 at 18:17
  • @Ch3steR this will drop all duplicates, not just those occurring next to each other. – mac13k Jun 06 '20 at 18:29
  • What specifically is the issue? – AMC Jun 06 '20 at 18:38
  • @Ch3ster thanks for your input, my explanation was not that clear, sorry about it. I was not intending to delete duplicates, because some scores could be the same even if they are not subsequent. – Juanito_86Alg Jun 06 '20 at 18:42

1 Answer


I need to delete all rows with index i if "Score" in row(i) and "Score" in row(i+1). Any suggestion on how to achieve this?

Given

>>> df
         0
0  Score 1
1   Date 1
2  Group 1
3  Score 1
4  Score 2
5   Date 2
6  Group 2
7  Score 2
8  Score 3
9   Date 3

you can use

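>>> # na=False treats the NaN introduced by the shift as "no match", so a trailing "Score" row is kept
>>> mask = df.assign(shift=df[0].shift(-1)).apply(lambda s: s.str.contains('Score', na=False)).all(axis=1)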
>>> df[~mask].reset_index(drop=True)
         0
0  Score 1
1   Date 1
2  Group 1
3  Score 2
4   Date 2
5  Group 2
6  Score 3
7   Date 3
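
An equivalent way to build the mask, shown here only as a minimal sketch, is to compute the "contains Score" flag once and compare it with its own shifted copy. The names `is_score`, `next_is_score` and `result` are illustrative and not part of the original answer; the sample frame is the one from the example above.

```python
import pandas as pd

# Sample data from the answer above (single column named 0).
df = pd.DataFrame({0: [
    "Score 1", "Date 1", "Group 1",
    "Score 1", "Score 2", "Date 2", "Group 2",
    "Score 2", "Score 3", "Date 3",
]})

# True where the row itself contains "Score".
is_score = df[0].str.contains("Score")

# True where the *next* row contains "Score"; the last row has no
# successor, so fill_value=False keeps it from being dropped.
next_is_score = is_score.shift(-1, fill_value=False)

# Drop a row only when both it and its successor contain "Score".
mask = is_score & next_is_score
result = df[~mask].reset_index(drop=True)
print(result)
```

This avoids the intermediate two-column frame and the row-wise `all`, and makes the handling of the last row explicit.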

That said, if I were you I would fix the format of the data first, as the commenters already pointed out.
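
As a rough sketch of what a fixed format could look like, the cleaned column can be reshaped into three columns. This assumes that, once the extra "Score" rows are gone, the data consists of complete Score/Date/Group triples in that order; the column names and the variable names `cleaned` and `wide` are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned column: complete Score/Date/Group triples only.
cleaned = pd.Series([
    "Score 1", "Date 1", "Group 1",
    "Score 2", "Date 2", "Group 2",
    "Score 3", "Date 3", "Group 3",
])

# Assumption: the length is a multiple of 3; reshape raises a
# ValueError otherwise, which flags an incomplete record.
wide = pd.DataFrame(
    cleaned.to_numpy().reshape(-1, 3),
    columns=["Score", "Date", "Group"],  # illustrative column names
)
print(wide)
```

Each record then occupies one row, so later filtering or deduplication can work on whole records instead of neighbouring rows.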

timgeb