python: combine rows in df based on id column

Question

i want to merge rows together based on the ID because i have duplicate rows, but there are some columns with missing data please see my examples below for clarity.

below is what my current df looks like:

id|name|snack|treat|age|year|
---|---|---|---|---|---|
ab1|maria|pretzel| |20| |
cd2|jordan| | | |1998
ef3|canela| |cookie|||
ab1|maria||brownie|20|2012
ef3|canela|chips| ||
cd2|jordan||||1998
cd2|jordan|goldfish||25|

this is what i want the combined data to look like:

id|name|snack|treat|age|year|
---|---|---|---|---|---|
ab1|maria|pretzel| brownie|20|2012 |
cd2|jordan|goldfish||25|1998
ef3|canela|chips|cookie|||

how can i get these to merge based on their id?

thanks in advance!!

Does this answer your question? [Pandas | merge rows with same id](https://stackoverflow.com/questions/58218387/pandas-merge-rows-with-same-id) — , Aug 08 '23 at 03:55

tax evader · Answer 1 · 2023-08-08T04:52:31.233

Not the cleanest approach but if you know that the column value is either string or float, you can write a customized aggregate function to pick the last non-nan value between the merged rows

import pandas as pd
import math


def agg_f(x):
    ret = []
    for i in x:
        if (isinstance(i, str) and (i.strip() != '')) or (isinstance(i, float) and not math.isnan(i)):
            ret.append(i)
    if len(ret) != 0:
        # return last merged value
        return ret[-1]
    else:
        return float('nan')
    return ret

df = df.groupby('id').agg({'name': agg_f,
                           'snack': agg_f,
                           'treat': agg_f,
                           'age': agg_f,
                           'year': agg_f})

python: combine rows in df based on id column

1 Answers1