Say I have the following DataFrame:

import pandas as pd

d = {'name_col': ['name1', 'name2', 'name1'], 'tag_col': ['tag1', 'tag2', 'tag3'], 'time': ['08:17', '9:20', '08:18']}
df = pd.DataFrame(data=d)

+----------+---------+-------+
| name_col | tag_col | time  |
+----------+---------+-------+
| name1    | tag1    | 08:17 |
+----------+---------+-------+
| name2    | tag2    | 9:20  |
+----------+---------+-------+
| name1    | tag3    | 08:18 |
+----------+---------+-------+

I want to aggregate tag_col by name_col. Using the following, I get the output below:

df.groupby('name_col')['tag_col'].agg(';'.join).reset_index(name='tag_col')

+----------+------------+
| name_col | tag_col    |
+----------+------------+
| name1    | tag1;tag3  |
+----------+------------+
| name2    | tag2       |
+----------+------------+

However, I also need to keep the time column. I can't group on it, because its value can differ between rows with the same name_col. In that case, I would like to just take the first value of time for each name_col and get output like this:

+----------+-----------+-------+
| name_col | tag_col   | time  |
+----------+-----------+-------+
| name1    | tag1;tag3 | 08:17 |
+----------+-----------+-------+
| name2    | tag2      | 9:20  |
+----------+-----------+-------+
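
For reference, here is a minimal sketch of one way this output could be produced, by passing a per-column mapping to `agg` so that tag_col is joined while time takes the `'first'` aggregation. The variable name `result` is only illustrative. Note that `'first'` here means the first non-null value in row order within each group, not the earliest time; since the times are strings (and `9:20` is not zero-padded), picking the earliest would need parsing and sorting first.

```python
import pandas as pd

df = pd.DataFrame({
    'name_col': ['name1', 'name2', 'name1'],
    'tag_col': ['tag1', 'tag2', 'tag3'],
    'time': ['08:17', '9:20', '08:18'],
})

# Aggregate each column differently within every name_col group:
# join the tags with ';' and keep the first time value in row order.
result = (
    df.groupby('name_col')
      .agg({'tag_col': ';'.join, 'time': 'first'})
      .reset_index()
)

print(result)
```

With the sample data above, name1 should end up with `tag1;tag3` and `08:17`, and name2 with `tag2` and `9:20`.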
  • 1
    You can try `df.groupby('name_col').agg({'tag_col': ';'.join, 'time': 'first'})` – Shubham Sharma Oct 01 '20 at 13:01
  • 1
    The suggested post doesn't help me at all... Like I said, I can't group everything using `df.groupby(['name_col','time'])['tag_col'].apply(', '.join).reset_index()`, since time is not the same in both cases. Please re-open –  Oct 01 '20 at 13:07
