1

I have a Dataframe that looks as follows:


Serial : Status   : Date     : Comment
---------------------------------------
306    :  in      : 4/25/22  : Reported 
306    :  out     : 3/15/22  : Reported
306    :  damaged : 2/1/22   : Reported
307    :  in      : 7/6/22   : 
309    :          :          : 
310    :  out     : 6/5/22   : Not Reported
311    :  out     : 5/5/22   : Not Reported
311    :  in      : 4/5/22   : 

I would like to reshape it to look as follows:

Serial : Status   : Date     : Comment        : Status_1   : Date_1     : Comment_1. : Status_2   : Date_2     : Comment_2
-----------------------------------------------------------------------------------------------------------------------
306    :  in      : 4/25/22  : Reported       :  out       : 3/15/22    : Reported   :  damaged   : 2/1/22     : Reported
307    :  in      : 7/6/22   : 
309    :          :          : 
310    :  out     : 6/5/22   : Not Reported
311    :  out     : 5/5/22   : Not Reported   :  in        : 4/5/22     : 

I have tried using Transpose and Pivot tables, but am unable to get the desired output. I have also tried iterating over the Dataframe by row and column and assigning values to new columns but that hasn't worked as expected either. How can I achieve the desired output ?

zeroz
  • 121
  • 10

1 Answers1

1
import pandas as pd
import numpy as np
import itertools
df = pd.DataFrame({
    'Serial':[306,306,306,307,309,310,311,311],
    'Status':['in','out','damage','in',np.nan,'out','out','in'],
    'Date':['4/25/22','3/15/22','2/1/22','7/6/22',np.nan,'6/5/22','5/5/22','4/5/22'],
    'Comment':['Reported','Reported','Reported',np.nan,np.nan,'Not Reported','Not Reported',np.nan]
})
df
###
   Serial  Status     Date       Comment
0     306      in  4/25/22      Reported
1     306     out  3/15/22      Reported
2     306  damage   2/1/22      Reported
3     307      in   7/6/22           NaN
4     309     NaN      NaN           NaN
5     310     out   6/5/22  Not Reported
6     311     out   5/5/22  Not Reported
7     311      in   4/5/22           NaN
df['group_count'] = df.groupby('Serial').cumcount()
temp = df.pivot(index=['Serial'],  columns=['group_count'])
temp.columns = temp.columns.droplevel(0)
max_count = df.groupby('Serial').count().values.max()
output = temp.loc[:,[i for i in range(max_count)]]

# Rename columns
name_list = [[f"Status_{i}",f"Date_{i}",f"Comment_{i}"] for i in range(3)]
name_list = list(itertools.chain.from_iterable(name_list))

output.columns = name_list
output

enter image description here




If you wanna fill NA

output.fillna('', inplace=True)
output

enter image description here

Baron Legendre
  • 2,053
  • 3
  • 5
  • 22