Finding the first appearance of a row with respect to a feature in a pandas DataFrame

Question

The title may not be super clear. What I want to do is the following.

I have the following dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "id": ["1", "2", "3", "1", "4", "5", "2", "6", "3", "1", "4"],
        "value": ["A", "A", "B", "B", "B", "C", "C", "A", "A", "D", "A"],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
)

From this data frame, I'd like to create a new data frame containing only the rows in which each "id" value appears for the first time. That means the rows with the indices 0, 1, 2, 4, 5 and 7.

I hope the problem is expressed clearly enough. Thanks.

Similar: [pandas group by and find first non null value for all columns](https://stackoverflow.com/q/59048308/15497888) — Henry Ecker, Jul 05 '21 at 18:35

score 3 · Accepted Answer · answered Jul 05 '21 at 18:32

If you want to retain the original indices, as you mention, you can negate Series.duplicated on the id column with ~ and use the result as a boolean mask:

df[~df['id'].duplicated()]
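As a self-contained sketch of the mask approach, using the data frame from the question: the names is_repeat, first_rows and first_rows_alt are only illustrative, and the drop_duplicates line is an assumed equivalent that is not part of the answer itself.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["1", "2", "3", "1", "4", "5", "2", "6", "3", "1", "4"],
        "value": ["A", "A", "B", "B", "B", "C", "C", "A", "A", "D", "A"],
    }
)

# duplicated() marks every repeat of an id as True; the first occurrence stays False
is_repeat = df["id"].duplicated()

# Negating the mask keeps only first occurrences, with their original index labels
first_rows = df[~is_repeat]

# Assumed equivalent (not from the answer): drop_duplicates keeps the first row per id by default
first_rows_alt = df.drop_duplicates(subset="id")

print(first_rows.index.tolist())
```

Both variants keep the original index labels, which is what the question asks for.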

anky


score 2 · Answer 2 · answered Jul 05 '21 at 18:30

Try:

print(df.groupby("id", as_index=False).first())

Prints:

  id value
0  1     A
1  2     A
2  3     B
3  4     B
4  5     C
5  6     A
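Two caveats may be worth keeping in mind with this approach: with as_index=False the result gets a fresh 0..n-1 index rather than the original labels, and first() returns the first non-null value per column, which is not necessarily the first row when the data contains missing values. A minimal sketch of a variant that keeps the original labels, assuming that is wanted, uses groupby(...).head(1); this is an alternative and not what the answer above proposes.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["1", "2", "3", "1", "4", "5", "2", "6", "3", "1", "4"],
        "value": ["A", "A", "B", "B", "B", "C", "C", "A", "A", "D", "A"],
    }
)

# first() with as_index=False: one row per id, but the index is renumbered from 0
renumbered = df.groupby("id", as_index=False).first()

# head(1): the first row of each id group, in original order with original index labels
first_rows = df.groupby("id").head(1)

print(renumbered.index.tolist())
print(first_rows.index.tolist())
```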

Andrej Kesely

