I have a dataframe with repetitive names of people
Input:
{'name': {0: 'John Smith', 1: 'John Smith', 2: 'John Smith', 3: 'John Doo', 4: 'John Doo', 5: 'John Doo'}, 'journal': {0: 'Journal1', 1: 'Journal2', 2: 'Journal2', 3: 'Journal1', 4: 'Journal2', 5: 'Journal2'}, 'is_editor_in_chief_2019': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2020': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2021': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2022': {0: 1.0, 1: nan, 2: nan, 3: 1.0, 4: nan, 5: nan}, 'is_editorial_board_member_2019': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editorial_board_member_2020': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editorial_board_member_2021': {0: nan, 1: nan, 2: 1.0, 3: nan, 4: nan, 5: 1.0}, 'is_editorial_board_member_2022': {0: nan, 1: 1.0, 2: nan, 3: nan, 4: 1.0, 5: nan}}
I want to collapse these rows and convert data from long to wide.
Expected output:
{'name': {0: 'John Smith', 1: 'John Doo'}, 'Journal1_is_editor_in_chief_2019': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2020': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2021': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2022': {0: 1, 1: 1}, 'Journal1_is_editorial_board_member_2019': {0: nan, 1: nan}, 'Journal1_is_editorial_board_member_2020': {0: nan, 1: nan}, 'Journal1_is_editorial_board_member_2021': {0: 1, 1: 1}, 'Journal1_is_editorial_board_member_2022': {0: 1, 1: 1}, 'Journal2_is_editor_in_chief_2019': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2020': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2021': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2022': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2019': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2020': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2021': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2022': {0: nan, 1: nan}}
The closest solution I found is here:
df_out = df.set_index(['name', df.groupby(['name']).cumcount()+1]).unstack().sort_index(level=1, axis=1)
df_out.columns = df_out.columns.map('{0[0]}_{0[1]}'.format)
df_out.reset_index()
But it does not give me the ideal output.