Say I have the following DataFrame:
import pandas as pd

d = {'name_col': ['name1', 'name2', 'name1'], 'tag_col': ['tag1', 'tag2', 'tag3'], 'time': ['08:17', '9:20', '08:18']}
df = pd.DataFrame(data=d)
+----------+---------+-------+
| name_col | tag_col | time |
+----------+---------+-------+
| name1 | tag1 | 08:17 |
+----------+---------+-------+
| name2 | tag2 | 9:20 |
+----------+---------+-------+
| name1 | tag3 | 08:18 |
+----------+---------+-------+
I want to group by name_col and join the tag_col values of each group into a single string. With the code below I get the following output:
df.groupby('name_col')['tag_col'].agg(';'.join).reset_index(name='tag_col')
+----------+------------+
| name_col | tag_col |
+----------+------------+
| name1 | tag1;tag3 |
+----------+------------+
| name2 | tag2 |
+----------+------------+
However, I also need to keep the time column. I can't aggregate it the same way, because its value can differ between rows that share the same name_col. In that case I would like to just take the first value of time for each group, and get output like:
+----------+----------+------------+
| name_col | tag_col | time |
+----------+----------+------------+
| name1 | tag1;tag3| 08:17 |
+----------+----------+------------+
| name2 | tag2 | 9:20 |
+----------+----------+------------+
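
For reference, here is a minimal sketch of one way the desired result might be produced. It assumes that a different aggregation per column is acceptable, passed to agg as a dict: ';'.join for tag_col and the built-in 'first' aggregation for time. The name result is only illustrative, and "first" here means the first row of each group in the DataFrame's current row order, so the frame would need sorting beforehand if the earliest time is what is wanted.

```python
import pandas as pd

d = {'name_col': ['name1', 'name2', 'name1'],
     'tag_col': ['tag1', 'tag2', 'tag3'],
     'time': ['08:17', '9:20', '08:18']}
df = pd.DataFrame(data=d)

# One aggregation per column: join the tags, keep the first time seen.
# as_index=False keeps name_col as a regular column instead of the index.
result = (
    df.groupby('name_col', as_index=False)
      .agg({'tag_col': ';'.join, 'time': 'first'})
)
print(result)
```

Note that 'first' returns the first non-null value in each group, so a missing time in the first row of a group would be skipped rather than kept.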