0

This is a followup from the following question.

This is my dataframe:

nan = ""
d = {'NAME': ['a','a','b','b','c','c','c','c','c','d','d','d','d','d','d'],
 'col1': ['P100','P100','P100','P100','MS','MS','MS','MS','MS','MS','MS','MS','MS','MS','MS'],
 'col2': ['CNMZ',
  'CNMZ',
  'COMX',
  'COMX',
  '_NCTE',
  '_NCTE',
  '_NCTE',
  '_NCTE',
  '_NCTE',
  'T1MF',
  'T1MF',
  'T1MF',
  'T1MF',
  'T1MF',
  'T1MF'],
 'stepNo': [1, 2, 1, 2, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6],
 'col4': ['xyz',
  'abc',
  'pqr',
  'gvt',
  'mno',
  'tru',
  'ercm',
  'lotr',
  'ddlj',
  'refv',
  'ecv',
  'ecv',
  'ecv',
  'ecv',
  'ecv'],
 'col5': ['PHL',
  'PHL',
  'BHL',
  'ALT',
  'MRS',
  'MRS',
  'TUL',
  'MRS',
  'FAT',
  'PHL',
  'PHL',
  'JEN',
  'FTW',
  'AMB',
  'KGP'],
 'col6': ['CP',
  'CO',
  'CP',
  'CO',
  'CP',
  'CO',
  'CO',
  'CO',
  'RT',
  'CO',
  'CO',
  'CO',
  'CP',
  'CO',
  'CO'],
 'col7': ['PHL',
  'PHL',
  'ALT',
  'ALT',
  'MRS',
  'TUL',
  'MRS',
  'FAT',
  'FAH',
  'PHL',
  'JEN',
  'FTW',
  'AMB',
  'KGP',
  'KGP'],
 'col8': ['CO',
  'CO',
  'CO',
  'CO',
  'CO',
  'CO',
  'CO',
  'RT',
  'CP',
  'CO',
  'CO',
  'CP',
  'CO',
  'CO',
  'CO'],
 'col9': ['SID',
  'M/M',
  'SID',
  'U/D',
  'AL LO',
  'AL LO',
  'AL LO',
  'AL LO',
  'AL LO',
  'M/M',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS'],
 'col10': ['SID',
  'M/M',
  'SID',
  'U/D',
  'AL LO',
  '3 M',
  '3 M',
  'M/M',
  'AL LO',
  'M/M',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS'],
 'col11': [nan,
  'ATM',
  nan,
  'PACK',
  'AL LP',
  'DCS',
  'DCS',
  'DAM',
  'DAM',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'M/M'],
 'col12': [nan,
  'SID',
  nan,
  'PACK',
  'CAL LO',
  'DCS',
  'DCS',
  'M/M',
  'CAL LO',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'AL LO'],
 'col13': ['abc',
  '-02-1_',
  '-1',
  '-13_',
  nan,
  nan,
  nan,
  'T1_VT1.',
  nan,
  '-06',
  nan,
  nan,
  nan,
  nan,
  '-03_02-03'],
 'col14': [nan,
  nan,
  nan,
  nan,
  '102/',
  '102/',
  '102/',
  nan,
  '101/',
  nan,
  '3405',
  '3102/',
  '3111/',
  '3102/',
  nan]}
df = pd.DataFrame(d)

I want to print row with same value in NAME in a single row with all columns using the stepNo. For eg.

Output:

NAME   col1    col2   stepNo  col4...........col14    NAME col1   col2   stepNo   col4...........col14   NAME   col1    col2   stepNo  col4...........col14    NAME col1   col2   stepNo   col4...........col14
 a     P100    CNMZ     1      xyz            nan       a  P100   CNMZ     2       abc            nan
 b     P100    COMX     1      pqr            nan       b  P100   COMX     2       gvt            nan
 c      MS    _NCTE     1      mno            102/      c   MS   _NCTE     2       tru            102/   c      MS    _NCTE     3      ercm           102/     c    MS   _NCTE     4       lotr           nan   

Since NAME=3 has 5 row there will be 1 more sets of columns in the output.

For NAME=4 there will be 6 sets of columns in the output.

I hope the example output is descriptive enough. If not you can refer the linked question at the beginning for better understanding.

This was the suggested solution but it fails for the above example:

dfs = []
for i in range(min(df['stepNo']), max(df['stepNo'])+1):
    dfs.append(df[df['stepNo']==i].reset_index())
dfx = pd.concat(dfs, axis=1)
dfx.drop(inplace=True, columns=['index'])

Is it possible to get the expected output?

Murtaza Haji
  • 1,093
  • 1
  • 13
  • 32

1 Answers1

1
map_name = dict( enumerate(df['NAME'].factorize()[1] ) )
map_name = dict(map(reversed, map_name.items()))
df2 = np.zeros((df.NAME.nunique(), df.stepNo.max() * len(df.columns))).astype(object)

for i,g in df.groupby(['NAME','stepNo']):

    df2[map_name[i[0]], (i[1]-1)*len(df.columns):(i[1])*len(df.columns)] = g.values

df2 = pd.DataFrame(df2)
df2.columns = np.tile(df.columns, df.stepNo.max())
df2['stepNo'] = df2['stepNo'].astype(int)
Marco Cerliani
  • 21,233
  • 3
  • 49
  • 54