How do I merge duplicate rows into one on a DataFrame when they have different values

Question

I have a DataFrame like the following:

ID  NAME    TEL_1   TEL_2   TEL_3
1   John    123456  754987  465317
1   John    465987          465987
1   John            546783
2   Robert  264687  
2   Robert          462531  
3   William 432645  765346  875137

I need to merge the rows that have the same ID, saving the phone values, like this:

ID  NAME    TEL_1   TEL_2   TEL_3   TEL_4   TEL_5   TEL_6
1   John    123456  754987  465317  465987  465987  546783
2   Robert  264687  462531  
3   William 432645  765346  875137

Why do you have three different columns in the first place ? Seing the result you want, I feel like your first table should only have one 'TEL' column. — Borbag, Jan 05 '16 at 15:26

Stefan · Accepted Answer · 2016-01-06T14:56:17.393

You can set your ID and NAME columns as index, use groupby on these and then concat the respective rows horizontally to get your desired output:

persons = df.set_index(['ID', 'NAME']).groupby(level=['ID', 'NAME'])
new_df =pd.DataFrame()
for details, phones in persons:
    person_phones = pd.concat([row for i, row in phones.iterrows()]).to_frame()
    person_phones.index = ['TEL_{}'.format(i) for i in range(len(person_phones))]
    new_df = pd.concat([new_df, person_phones], axis=1)

new_df.transpose().reset_index().rename(columns={'level_0': 'ID', 'level_1': 'NAME'})

to get:

   ID     NAME   TEL_0   TEL_1   TEL_2   TEL_3   TEL_4   TEL_5  TEL_6   TEL_7  \
0   1     John  123456  754987  465317  465987     NaN  465987    NaN  546783   
1   2   Robert  264687     NaN     NaN     NaN  462531     NaN    NaN     NaN   
2   3  William  432645  765346  875137     NaN     NaN     NaN    NaN     NaN   

   TEL_8  
0    NaN  
1    NaN  
2    NaN

paulo.filip3 · Answer 2 · 2016-01-05T21:36:59.520

You can try:

import pandas as pd
import numpy as np

data = {'ID': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
        'NAME': {0: 'John', 1: 'John', 2: 'John', 3: 'Robert',
                 4: 'Robert', 5: 'William'},
       'TEL_1': {0: 123456, 1: 465987, 2: None, 3: 264687, 4: None,
                 5: 432645},
       'TEL_2': {0: 754987, 1: None, 2: 546783, 3: None, 4: 462531,
                 5: 765346},
       'TEL_3': {0: 465317, 1: 465987, 2: None, 3: None, 4: None,
                 5: 875137}}

df = pd.DataFrame(data)
grouped = df.groupby(['ID', 'NAME'])

def merger(group):
    nr_cols = [col for col in group.columns if 'TEL_' in col]
    values = [group[col].values for col in nr_cols]
    new_row = pd.Series()
    i = 1
    for row in values:
        for nr in row:
            if not np.isnan(nr):
                new_row['TEL_{}'.format(i)] = nr
                i += 1
    return new_row

merged = grouped.apply(merger).unstack().reset_index()

The merged data frame will look like this:

ID NAME     TEL_1   TEL_2   TEL_3   TEL_4   TEL_5   TEL_6
1  John     123456  465987  754987  546783  465317  465987
2  Robert   264687  462531     NaN     NaN     NaN     NaN
3  William  432645  765346  875137     NaN     NaN     NaN

How do I merge duplicate rows into one on a DataFrame when they have different values

2 Answers2

Linked