With the following DataFrame:
import pandas as pd
import random
random.seed(3)
df = pd.DataFrame(
    data=[random.sample(["A", "B"], 1) for i in range(6)],
    columns=["category"],
)
We get:
How do I get only the first row for each consecutive category group?
Note: the data can contain an arbitrary number of repeats; I only want the first row of each consecutive group.
The expected result is:
category
0 A
2 B
4 A
I hoped that the sort flag of groupby() would solve this, but sort=False only controls the order of the group keys. groupby still treats all occurrences of a category as one group, not just consecutive ones:
df.groupby("category").head(1)
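One commonly used way to make groupby operate on consecutive runs instead of on all occurrences is to group by a run id rather than by the category itself. The sketch below assumes the sample data is A, A, B, B, A, A (inferred from the expected output above, since the random output is not shown); run_id and starts are illustrative names, not pandas API.

```python
import pandas as pd

# Assumed sample data, inferred from the expected output: A, A, B, B, A, A
df = pd.DataFrame({"category": ["A", "A", "B", "B", "A", "A"]})

# True wherever a row differs from the row above it, i.e. where a new run starts.
# shift() puts NaN in row 0, so the first row always counts as a run start.
starts = df["category"].ne(df["category"].shift())

# The cumulative sum of the run-start flags gives each consecutive run its own id:
# 1, 1, 2, 2, 3, 3
run_id = starts.cumsum()

# Grouping by the run id (not by category) makes groupby work on consecutive runs,
# so head(1) returns the first row of each run, keeping the original index.
first_rows = df.groupby(run_id).head(1)
print(first_rows)
```

Because grouping is by run id, the second run of A (rows 4 and 5) forms its own group instead of being merged with rows 0 and 1.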
As I am learning pandas and my DataFrame can become very large, I'm looking for a pandas-native (vectorized) solution rather than iterating over the array or the DataFrame.
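If only the first row of each run is needed, the groupby step can be skipped entirely: a single vectorized comparison against the shifted column yields a boolean mask. This is a minimal sketch, again assuming the inferred sample data A, A, B, B, A, A; mask and result are illustrative names.

```python
import pandas as pd

# Assumed sample data, inferred from the expected output: A, A, B, B, A, A
df = pd.DataFrame({"category": ["A", "A", "B", "B", "A", "A"]})

# Keep a row only if its category differs from the row directly above it.
# This is one vectorized comparison; there is no Python-level loop over rows.
mask = df["category"] != df["category"].shift()
result = df[mask]
print(result)
```

One caveat worth checking against real data: NaN never compares equal to NaN, so consecutive missing values would each be treated as the start of a new run and all of them would be kept.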
While the answers to "Make Pandas groupby act similarly to itertools groupby" can be applied here, the question posed is different, so I would leave this question open to make the answer easier to find.