11

If I have the following dataframe

| id | timestamp           | code | id2 |
|----|---------------------|------|-----|
| 10 | 2017-07-12 13:37:00 | 206  | a1  |
| 10 | 2017-07-12 13:40:00 | 206  | a1  |
| 10 | 2017-07-12 13:55:00 | 206  | a1  |
| 10 | 2017-07-12 19:00:00 | 206  | a2  |
| 11 | 2017-07-12 13:37:00 | 206  | a1  |

...

I need to group by the id and id2 columns and get the first occurrence of the timestamp value; e.g. for id=10, id2=a1, that is timestamp=2017-07-12 13:37:00.

I googled it and found some possible solutions, but I can't figure out how to apply them properly. It should probably be something like:

df.groupby(["id", "id2"])["timestamp"].apply(lambda x: ....)
JJJ
Novitoll

2 Answers

12

I think you need GroupBy.first:

df.groupby(["id", "id2"])["timestamp"].first()

Or drop_duplicates:

df.drop_duplicates(subset=['id','id2'])

For the same output:

df1 = df.groupby(["id", "id2"], as_index=False)["timestamp"].first()
print (df1)
   id id2            timestamp
0  10  a1  2017-07-12 13:37:00
1  10  a2  2017-07-12 19:00:00
2  11  a1  2017-07-12 13:37:00

df1 = df.drop_duplicates(subset=['id','id2'])[['id','id2','timestamp']]
print (df1)
   id id2            timestamp
0  10  a1  2017-07-12 13:37:00
1  10  a2  2017-07-12 19:00:00
2  11  a1  2017-07-12 13:37:00
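Note that `first` returns the first row in the existing row order of each group, not necessarily the earliest timestamp. If the rows are not guaranteed to be sorted by timestamp, `min` is safer. A minimal sketch (the sample frame below is an assumption, with one group deliberately unsorted):

```python
import pandas as pd

# Sample frame (assumed); the (10, a1) group is deliberately out of order.
df = pd.DataFrame({
    "id": [10, 10, 10, 10, 11],
    "timestamp": pd.to_datetime([
        "2017-07-12 13:55:00",
        "2017-07-12 13:37:00",
        "2017-07-12 13:40:00",
        "2017-07-12 19:00:00",
        "2017-07-12 13:37:00",
    ]),
    "code": [206] * 5,
    "id2": ["a1", "a1", "a1", "a2", "a1"],
})

# min() yields the earliest timestamp per group regardless of row order;
# first() would return 13:55:00 for the (10, a1) group here.
earliest = df.groupby(["id", "id2"], as_index=False)["timestamp"].min()
print(earliest)
```

Alternatively, `df.sort_values("timestamp")` before `drop_duplicates` gives the same guarantee.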
jezrael
0

One can create a new column by concatenating the id and id2 strings, then remove the rows where it is duplicated:

df['newcol'] = df.apply(lambda x: str(x.id) + str(x.id2), axis=1)
df = df[~df.newcol.duplicated()].iloc[:, :4]   # iloc drops the helper column
print(df)

Output:

   id              timestamp  code  id2
0  10   2017-07-12 13:37:00    206   a1
3  10   2017-07-12 19:00:00    206   a2
4  11   2017-07-12 13:37:00    206   a1
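As a side note, the helper column is not strictly needed: `DataFrame.duplicated` accepts a `subset` of columns directly, and its default `keep='first'` already retains the first row of each pair. A minimal sketch, assuming a sample frame matching the question's data:

```python
import pandas as pd

# Sample frame (assumed, matching the question's data).
df = pd.DataFrame({
    "id": [10, 10, 10, 10, 11],
    "timestamp": ["2017-07-12 13:37:00", "2017-07-12 13:40:00",
                  "2017-07-12 13:55:00", "2017-07-12 19:00:00",
                  "2017-07-12 13:37:00"],
    "code": [206] * 5,
    "id2": ["a1", "a1", "a1", "a2", "a1"],
})

# duplicated() can check a column subset directly, so no concatenated
# helper column is required; keep='first' (the default) keeps the
# first row of each (id, id2) pair.
result = df[~df.duplicated(subset=["id", "id2"])]
print(result)
```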
rnso