Assuming we have this table:

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 1, 4, 2, 6, 1],
                   'Name': ['led', 'peter', 'james', 'ellie', 'mako', 'levi', 'kent'],
                   'food': ['apples', 'oranges', 'banana', 'carrots', 'carrots', 'mango', 'banana'],
                   'color': ['red', 'blue', 'pink', 'red', 'red', 'purple', 'orange']})
+----+-------+---------+--------+
| id | name | food | color |
+----+-------+---------+--------+
| 1 | led | apples | red |
| 2 | peter | oranges | blue |
| 1 | james | banana | pink |
| 4 | ellie | carrots | red |
| 2 | mako | carrots | red |
| 6 | levi | mango | purple |
| 1 | kent | banana | orange |
+----+-------+---------+--------+
The goal is to group by id and append each duplicate row as a new set of columns on the first row for that id. The expected output looks like this:
+----+-------+---------+--------+-------+---------+--------+-------+--------+--------+
| id | name | food | color | name2 | food2 | color2 | name3 | food3 | color3 |
+----+-------+---------+--------+-------+---------+--------+-------+--------+--------+
| 1 | led | apples | red | james | banana | pink | kent | banana | orange |
| 2 | peter | oranges | blue | mako | carrots | red | | | |
| 4 | ellie | carrots | red | | | | | | |
| 6 | levi | mango | purple | | | | | | |
+----+-------+---------+--------+-------+---------+--------+-------+--------+--------+
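One way to produce exactly this layout is to number each row within its ID group with cumcount and unstack on that counter; a minimal sketch (the occurrence/wide names and the explicit column reordering are mine, not from the original code):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 1, 4, 2, 6, 1],
                   'Name': ['led', 'peter', 'james', 'ellie', 'mako', 'levi', 'kent'],
                   'food': ['apples', 'oranges', 'banana', 'carrots', 'carrots', 'mango', 'banana'],
                   'color': ['red', 'blue', 'pink', 'red', 'red', 'purple', 'orange']})

# Number each row within its ID group: 0 for the first occurrence, then 1, 2, ...
occurrence = df.groupby('ID').cumcount()

# Pivot so each occurrence becomes its own block of columns
wide = df.set_index(['ID', occurrence]).unstack()

# Reorder occurrence-major (Name, food, color, Name2, food2, color2, ...)
cols = ['Name', 'food', 'color']
order = [(c, i) for i in range(occurrence.max() + 1) for c in cols]
wide = wide[order]

# Flatten the MultiIndex columns: ('Name', 0) -> 'Name', ('Name', 1) -> 'Name2'
wide.columns = [c if i == 0 else f'{c}{i + 1}' for c, i in wide.columns]
wide = wide.reset_index()
```

Because this never stacks the values, missing cells simply stay NaN in their own column instead of shifting neighbouring values around.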
There is existing logic for this, but it gets messed up when some of the columns in a duplicate row are missing:
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 1, 4, 2, 6, 1],
                   'Name': ['led', 'peter', np.nan, np.nan, 'mako', 'levi', 'kent'],
                   'food': [np.nan, 'oranges', 'banana', 'carrots', 'carrots', 'mango', 'banana'],
                   'color': ['red', 'blue', 'pink', 'red', np.nan, 'purple', 'orange']})

# long format: one row per (ID, value); stack() silently drops NaN cells
transformed_df = df.set_index('ID').stack().droplevel(1)
# positional counter within each ID group
counter = transformed_df.groupby('ID').cumcount().to_numpy()
transformed_df.index = [transformed_df.index, counter]
# back to wide format: one generically named column per position
transformed_df = transformed_df.unstack().add_prefix('Col').reset_index()
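The breakage comes from stack() dropping the NaN cells, so a row with a missing value contributes fewer entries and every later value shifts into the wrong Col position. A version-independent way to keep the existing logic intact is to fill NaN with a sentinel before stacking and restore it afterwards; a sketch (the '__missing__' sentinel is my own placeholder, any string not present in the data works):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 1, 4, 2, 6, 1],
                   'Name': ['led', 'peter', np.nan, np.nan, 'mako', 'levi', 'kent'],
                   'food': [np.nan, 'oranges', 'banana', 'carrots', 'carrots', 'mango', 'banana'],
                   'color': ['red', 'blue', 'pink', 'red', np.nan, 'purple', 'orange']})

# Fill NaN with a sentinel so stack() has nothing to drop and
# every row contributes exactly three values, keeping positions aligned
transformed = df.set_index('ID').fillna('__missing__').stack().droplevel(1)
counter = transformed.groupby('ID').cumcount().to_numpy()
transformed.index = [transformed.index, counter]
out = transformed.unstack().add_prefix('Col').reset_index()

# Restore the real NaN values
out = out.replace('__missing__', np.nan)
```

With alignment preserved, each ID's values stay in row-major order (Name, food, color of the first occurrence, then the second, and so on), with NaN sitting in its own Col slot.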