How to merge all rows in a pandas data frame with the same value for a specific column?

Question

I have a data frame like:

person_ID, first_name, last_name, feature_1, feature_2, feature_3
1, John, Talbert, 1,2,3
2, Ted, Thompson, 4,5,6
1, John, Talbert, 7,8,9
2, Ted, Thompson, 13,14,15

And I'd like to re-format it like:

person_ID, first_name, last_name, feature_1_A, feature_2_A, feature_3_A, feature_1_B, feature_2_B, feature_3_B
1, John, Talbert, 1,2,3, 7,8,9
2, Ted, Thompson, 4,5,6,13,14,15

Any idea what would be an efficient way to do this? The dataset is small so it doesn't really have to be super efficient.

Thank you in advance for your help.

Quang Hoang · Accepted Answer · 2021-02-18T02:23:07.407

This is essentially pivot and rename:

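# Number each person's rows 0, 1, ... so repeated rows can become separate columns
df['col'] = df.groupby('person_ID').cumcount()

# One row per person; the columns become a (feature, col) MultiIndex
out = df.pivot_table(index=['person_ID', 'first_name', 'last_name'],
                     columns='col', aggfunc='first')
# Flatten the MultiIndex into names like feature_1_0
out.columns = [f'{x}_{y}' for x, y in out.columns]
out = out.reset_index()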

Output:

   person_ID first_name last_name  feature_1_0  feature_1_1  feature_2_0  feature_2_1  feature_3_0  feature_3_1
0          1       John   Talbert            1            7            2            8            3            9
1          2        Ted  Thompson            4           13            5           14            6           15
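
To run this end to end, here is a minimal self-contained sketch. The `pd.DataFrame` construction is an assumption about how the question's data is loaded; the remaining steps follow the approach above.

```python
import pandas as pd

# Sample data from the question (how it is built here is assumed for illustration)
df = pd.DataFrame({
    'person_ID': [1, 2, 1, 2],
    'first_name': ['John', 'Ted', 'John', 'Ted'],
    'last_name': ['Talbert', 'Thompson', 'Talbert', 'Thompson'],
    'feature_1': [1, 4, 7, 13],
    'feature_2': [2, 5, 8, 14],
    'feature_3': [3, 6, 9, 15],
})

# Number each person's rows 0, 1, ... in order of appearance
df['col'] = df.groupby('person_ID').cumcount()

# One row per person, one column per (feature, occurrence) pair
out = df.pivot_table(index=['person_ID', 'first_name', 'last_name'],
                     columns='col', aggfunc='first')

# Flatten the two-level column index into names like feature_1_0
out.columns = [f'{x}_{y}' for x, y in out.columns]
out = out.reset_index()
print(out)
```

The helper column `col` is what makes the pivot possible: without it, the two rows for each person would collide on the same index key.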

edited Feb 18 '21 at 02:23

answered Feb 18 '21 at 02:08


Hi, what's your pandas version? For some reason, I am unable to run the pivot method on multiple index values. pandas = '1.0.5' – Akshay Sehgal Feb 18 '21 at 02:22
@AkshaySehgal From `1.1.0`, list of index names is accepted by pivot. Updated the answer with `pivot_table`. Also one can use `set_index().unstack()` instead of `pivot`. – Quang Hoang Feb 18 '21 at 02:23
Also, there seems to be some issue with `[f'{x}_{y}' for x,y in out.columns]`. – Akshay Sehgal Feb 18 '21 at 02:25
@AkshaySehgal That's f-string, available from Python 3.6+. Otherwise, use `.format` :-) – Quang Hoang Feb 18 '21 at 02:27
Yeah, I understand f-strings and I am running 3.7. However, `out.columns` is not unpacking to x, y. Maybe this changed with updated pandas? – Akshay Sehgal Feb 18 '21 at 02:29
Upgrading pandas, will check it out – Akshay Sehgal Feb 18 '21 at 02:29
Yeah, it works with the new pandas. With `pivot_table` in pandas versions before `1.1.0`, the resulting table doesn't have multi-index columns. – Akshay Sehgal Feb 18 '21 at 02:32
So, the second line of code won't work, even after changing to `pivot_table` – Akshay Sehgal Feb 18 '21 at 02:32
@AkshaySehgal I don't know, this runs fine on my system. This is based on Q/A #11 of [this guide](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) which predates `1.1.0`. – Quang Hoang Feb 18 '21 at 02:37
Strange, I'll try downgrading and sharing the trace. – Akshay Sehgal Feb 18 '21 at 02:39

score 2 · Answer 2 · answered Feb 18 '21 at 03:00

You can also do it like this:

idx = df.groupby(['person_ID','first_name', 'last_name']).cumcount().map({0:'A',1:'B'})

df_out = df.set_index(['person_ID', 'first_name', 'last_name', idx]).unstack().sort_index(level=1, axis=1)

df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
df_out.reset_index()

Output:

   person_ID first_name last_name  feature_1_A  feature_2_A  feature_3_A  feature_1_B  feature_2_B  feature_3_B
0          1       John   Talbert            1            2            3            7            8            9
1          2        Ted  Thompson            4            5            6           13           14           15
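
Note that the `{0:'A',1:'B'}` mapping only covers two rows per person; `Series.map` returns NaN for any key missing from the dict, so a third row would get no label. A hedged sketch that extends the same idea to up to 26 rows per person by building the mapping from `string.ascii_uppercase` (the extra third row for John and the `letters`/`keys` names are made up for illustration):

```python
import string
import pandas as pd

# Question's data plus one hypothetical third row for person 1
df = pd.DataFrame({
    'person_ID': [1, 2, 1, 2, 1],
    'first_name': ['John', 'Ted', 'John', 'Ted', 'John'],
    'last_name': ['Talbert', 'Thompson', 'Talbert', 'Thompson', 'Talbert'],
    'feature_1': [1, 4, 7, 13, 10],
    'feature_2': [2, 5, 8, 14, 11],
    'feature_3': [3, 6, 9, 15, 12],
})

keys = ['person_ID', 'first_name', 'last_name']

# 0 -> 'A', 1 -> 'B', ..., 25 -> 'Z' (assumes at most 26 rows per person)
letters = dict(enumerate(string.ascii_uppercase))
idx = df.groupby(keys).cumcount().map(letters)

# Same set_index/unstack steps as above, with the letter as the last index level
df_out = df.set_index(keys + [idx]).unstack().sort_index(level=1, axis=1)
df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
df_out = df_out.reset_index()
print(df_out)
```

People with fewer rows than the maximum get NaN in the unused columns, which also turns those columns into floats.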

Akshay Sehgal · Answer 3 · 2021-02-18T03:14:04.357

As suggested by @Quang Hoang in comments, here is the set_index().unstack() approach.

df['col'] = df.groupby('person_ID').cumcount()
out = df.set_index(['person_ID', 'first_name', 'last_name','col']).unstack()
out.columns = [f'{x}_{y}' for x,y in out.columns]
out = out.reset_index()
print(out)

   person_ID  first_name  last_name   feature_1_0   feature_1_1   feature_2_0  \
0          1        John    Talbert             1             7             2   
1          2         Ted   Thompson             4            13             5   

    feature_2_1   feature_3_0   feature_3_1  
0             8             3             9  
1            14             6            15

yea just fixed that. thanks a ton. already getting a hang of the `set_index().unstack()`, alternative to pivots (which i dearly hate). — Akshay Sehgal, Feb 18 '21 at 03:14
