1

I have the following dataframe df (just a sample with 6 rows):

df = pd.DataFrame({'#': [1, 1, 2, 3, 3, 3],
                   'Name': ['john', 'john', 'max', 'tim', 'tim', 'tim'],
                   'Phone': ['01234', '98765', '', '', '33445566', ''],
                   'ID': ['', '', '11111', '2222222', '', ''],
                   'User': ['', '', '', '', '', 'tim123']})

print(df)

      #  Name     Phone       ID       User
0     1  john     01234                 
1     1  john     98765                 
2     2   max                 11111        
3     3   tim                 2222222        
4     3   tim     33445566                 
5     3   tim                          tim123

Now I want to merge all entries like # 3 (tim). My desired dataframe should look like this:

   #  Name     Phone       ID        User
0  1  john     01234              
1  1  john     98765              
2  2   max                 11111     
3  3   tim     33445566    2222222   tim123  

So I want to combine the columns 'Phone', 'ID' and 'User' for the rows with the same #- and name-column in a dataframe. Do you have any useful suggestions, how I could do this?

Thanks a lot in advance!

hiasomat
  • 13
  • 2

1 Answers1

0

Use justify function per group and then remove only empty strings rows:

#https://stackoverflow.com/a/44559180/2901002
def justify(a, invalid_val=0, axis=1, side='left'):    
    """
    Justifies a 2D array

    Parameters
    ----------
    A : ndarray
        Input array to be justified
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if invalid_val is np.nan:
        mask = pd.notna(a)
    else:
        mask = a!=invalid_val
    justified_mask = np.sort(mask,axis=axis)
    if (side=='up') | (side=='left'):
        justified_mask = np.flip(justified_mask,axis=axis)
    out = np.full(a.shape, invalid_val, dtype=object) 
    if axis==1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

f = lambda x: pd.DataFrame(justify(x.values,invalid_val='', side='up', axis=0),
                           columns=df.columns[2:])
df = df.set_index(['#','Name']).groupby(level=[0,1]).apply(f)
df = df[df.ne('').any(axis=1)].reset_index(level=2, drop=True).reset_index()
print (df)
   #  Name     Phone       ID    User
0  1  john     01234                 
1  1  john     98765                 
2  2   max              11111        
3  3   tim  33445566  2222222  tim123
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252