Appending multiple rows into a single row in pandas

Question

This is a followup from the following question.

This is my dataframe:

nan = ""
d = {'NAME': ['a','a','b','b','c','c','c','c','c','d','d','d','d','d','d'],
 'col1': ['P100','P100','P100','P100','MS','MS','MS','MS','MS','MS','MS','MS','MS','MS','MS'],
 'col2': ['CNMZ',
  'CNMZ',
  'COMX',
  'COMX',
  '_NCTE',
  '_NCTE',
  '_NCTE',
  '_NCTE',
  '_NCTE',
  'T1MF',
  'T1MF',
  'T1MF',
  'T1MF',
  'T1MF',
  'T1MF'],
 'stepNo': [1, 2, 1, 2, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6],
 'col4': ['xyz',
  'abc',
  'pqr',
  'gvt',
  'mno',
  'tru',
  'ercm',
  'lotr',
  'ddlj',
  'refv',
  'ecv',
  'ecv',
  'ecv',
  'ecv',
  'ecv'],
 'col5': ['PHL',
  'PHL',
  'BHL',
  'ALT',
  'MRS',
  'MRS',
  'TUL',
  'MRS',
  'FAT',
  'PHL',
  'PHL',
  'JEN',
  'FTW',
  'AMB',
  'KGP'],
 'col6': ['CP',
  'CO',
  'CP',
  'CO',
  'CP',
  'CO',
  'CO',
  'CO',
  'RT',
  'CO',
  'CO',
  'CO',
  'CP',
  'CO',
  'CO'],
 'col7': ['PHL',
  'PHL',
  'ALT',
  'ALT',
  'MRS',
  'TUL',
  'MRS',
  'FAT',
  'FAH',
  'PHL',
  'JEN',
  'FTW',
  'AMB',
  'KGP',
  'KGP'],
 'col8': ['CO',
  'CO',
  'CO',
  'CO',
  'CO',
  'CO',
  'CO',
  'RT',
  'CP',
  'CO',
  'CO',
  'CP',
  'CO',
  'CO',
  'CO'],
 'col9': ['SID',
  'M/M',
  'SID',
  'U/D',
  'AL LO',
  'AL LO',
  'AL LO',
  'AL LO',
  'AL LO',
  'M/M',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS'],
 'col10': ['SID',
  'M/M',
  'SID',
  'U/D',
  'AL LO',
  '3 M',
  '3 M',
  'M/M',
  'AL LO',
  'M/M',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS'],
 'col11': [nan,
  'ATM',
  nan,
  'PACK',
  'AL LP',
  'DCS',
  'DCS',
  'DAM',
  'DAM',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'M/M'],
 'col12': [nan,
  'SID',
  nan,
  'PACK',
  'CAL LO',
  'DCS',
  'DCS',
  'M/M',
  'CAL LO',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'AL LO'],
 'col13': ['abc',
  '-02-1_',
  '-1',
  '-13_',
  nan,
  nan,
  nan,
  'T1_VT1.',
  nan,
  '-06',
  nan,
  nan,
  nan,
  nan,
  '-03_02-03'],
 'col14': [nan,
  nan,
  nan,
  nan,
  '102/',
  '102/',
  '102/',
  nan,
  '101/',
  nan,
  '3405',
  '3102/',
  '3111/',
  '3102/',
  nan]}
df = pd.DataFrame(d)

I want to print row with same value in NAME in a single row with all columns using the stepNo. For eg.

Output:

NAME   col1    col2   stepNo  col4...........col14    NAME col1   col2   stepNo   col4...........col14   NAME   col1    col2   stepNo  col4...........col14    NAME col1   col2   stepNo   col4...........col14
 a     P100    CNMZ     1      xyz            nan       a  P100   CNMZ     2       abc            nan
 b     P100    COMX     1      pqr            nan       b  P100   COMX     2       gvt            nan
 c      MS    _NCTE     1      mno            102/      c   MS   _NCTE     2       tru            102/   c      MS    _NCTE     3      ercm           102/     c    MS   _NCTE     4       lotr           nan

Since NAME=3 has 5 row there will be 1 more sets of columns in the output.

For NAME=4 there will be 6 sets of columns in the output.

I hope the example output is descriptive enough. If not you can refer the linked question at the beginning for better understanding.

This was the suggested solution but it fails for the above example:

dfs = []
for i in range(min(df['stepNo']), max(df['stepNo'])+1):
    dfs.append(df[df['stepNo']==i].reset_index())
dfx = pd.concat(dfs, axis=1)
dfx.drop(inplace=True, columns=['index'])

Is it possible to get the expected output?

Marco Cerliani · Accepted Answer · 2020-05-08T09:39:16.843

1

map_name = dict( enumerate(df['NAME'].factorize()[1] ) )
map_name = dict(map(reversed, map_name.items()))
df2 = np.zeros((df.NAME.nunique(), df.stepNo.max() * len(df.columns))).astype(object)

for i,g in df.groupby(['NAME','stepNo']):

    df2[map_name[i[0]], (i[1]-1)*len(df.columns):(i[1])*len(df.columns)] = g.values

df2 = pd.DataFrame(df2)
df2.columns = np.tile(df.columns, df.stepNo.max())
df2['stepNo'] = df2['stepNo'].astype(int)

edited May 08 '20 at 09:39

answered May 08 '20 at 08:56

Marco Cerliani

21,233
3
49
54

`NAME` column is a string value, I have updated dataframe to reflect that . I am getting type error for str cant be converted to int – Murtaza Haji May 08 '20 at 09:07
Still getting the same error. its on line 3 of your code. `TypeError: 'str' object cannot be interpreted as an integer` – Murtaza Haji May 08 '20 at 13:06
stepNo is int or str? There are difference among the df you report here and your real df maybe – Marco Cerliani May 08 '20 at 13:08
stepNo is int in the dataframe. – Murtaza Haji May 08 '20 at 13:10
ok perfect... have you tried to run the code only on the df you reported here? – Marco Cerliani May 08 '20 at 13:12
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/213417/discussion-between-murtaza-haji-and-marco-cerliani). – Murtaza Haji May 08 '20 at 13:22

Appending multiple rows into a single row in pandas

1 Answers1

Linked