How to merge similar rows and split column into rows by values?

Question

I have this data set for example:

   Name  Number Is true
0  Dani       2     yes
1  Dani       2      no
2  Jack       5      no
3  Jack       5   maybe
4  Dani       2   maybe

I want to create a new data set that combines similar rows and adds columns by column different values. This is the output I'm trying to get:

   Name  Number Is true1 Is true2 Is true3
0  Dani       2      yes       no    maybe
1  Jack       5       no    maybe

I couldn't get it working from example 10 here: How to pivot a dataframe

Would you be able to provide a specific example for this use case please?

Thanks.

Edit for respond:

   Name      yes     no     maybe
0  Dani        2      2         2
1  Jack      NaN      5         5

use : `final=(df.assign(k=df.groupby(['Name','Number']).cumcount()) .set_index(['Name','Number','k']).unstack())` and then `final.columns=[f'{a}_{b}' for a,b in final.columns]` , finally print `final.reset_index()` — anky, Dec 25 '19 at 08:45

score 0 · Answer 1 · answered Dec 25 '19 at 08:55

You can try this:

df2 = df.drop_duplicates(subset=['Name', 'Number Is'])
df2 = df2.reset_index(drop=True).assign(true= df.groupby('Number Is')['true'].agg(list).reset_index(drop=True) )
temp = df2['true'].apply(pd.Series).T 
temp.index = temp.index+1
temp = temp.T
df2 = df2.assign(**temp.add_prefix('true').add_suffix(' Is')).drop(columns='true').fillna('')

output:

   Name  Number Is true1 Is true2 Is true3 Is
0  Dani          2      yes       no    maybe
1  Jack          5       no    maybe

Grzegorz Skibinski · Accepted Answer · 2019-12-26T22:56:02.977

0

With combination of pivot_table(...) and apply(...):

df.pivot_table(index=["Name", "Number"], values="Is true", aggfunc=list).apply(lambda x: pd.Series({f"Is true{id+1}": el for id, el in enumerate(x[0])}), axis=1).reset_index()

Output:

   Name  Number Is true1 Is true2 Is true3
0  Dani       2      yes       no    maybe
1  Jack       5       no    maybe      NaN

Edit

For your follow up. This might be something along the lines, what you're looking for:

df.pivot_table(index=["Name"], columns="Is true", values="Number", aggfunc=list).fillna('').apply(lambda x: pd.Series({f"{col}{id+1}": el for col in x.keys() for id, el in enumerate(x[col])}), axis=1).reset_index()

Output:

   Name  maybe1  no1  yes1
0  Dani     2.0  2.0   2.0
1  Jack     5.0  5.0   NaN

edited Dec 26 '19 at 22:56

answered Dec 25 '19 at 10:10

Grzegorz Skibinski

12,624
2
11
34

1

Thank you it works! If possible I have a follow up question please. If I would like to set new column names by specific column, but to take the values from other column, how would I do that? Please see an example under "Edit for respond:" in my original question. Thanks again :) – Guy Dec 25 '19 at 21:04
So add columns, which will be dependent on both ```is true``` and ```number``` ? It depends on what kind of dependency exactly you want to have? – Grzegorz Skibinski Dec 25 '19 at 21:04
Hm gotcha, so you want max ```number``` for every ```name``` (rows), for every ```is true``` column? Let me think about that. – Grzegorz Skibinski Dec 25 '19 at 21:08
Hm, but how is ```Jack``` ```yes```=2? – Grzegorz Skibinski Dec 25 '19 at 21:13
Sorry, added it by mistake, fixed in the original question under "Edit for respond:" – Guy Dec 26 '19 at 18:09
If to be more clear, I have specific column with few specific values that I want those values to become new column names ("Is True"). Under this new columns, the values should be taken from another column with many possible values ("Number"). And I want to group by all the rest of the columns I have ("Name" in the example above). I appreciate you help! – Guy Dec 26 '19 at 18:13
Hm, understood, then it's either as per my edit, if you want to take"max" from number for values, or ```aggfunc=list``` (instead of ```aggfunc="max"``` in the paragraph after edit) if you want to get all possible ```number``` – Grzegorz Skibinski Dec 26 '19 at 18:37
I think this example almost did the job: df.pivot_table(index=["Name", "Number"], values="Is true", aggfunc=list).apply(lambda x: pd.Series({f"Is true{id+1}": el for id, el in enumerate(x[0])}), axis=1).reset_index() I will change it from" values="Is true" to " values="Number" and I just want to understand how to take the new column names from "Is True" column, specifically I mean how to change this part to support it: .apply(lambda x: pd.Series({f"Is true{id+1}": el for id, el in enumerate(x[0])}), axis=1).reset_index() – Guy Dec 26 '19 at 20:50
I edited my edit - I think now I understood what you wanted to do there. Let me know, if now it's correct. – Grzegorz Skibinski Dec 26 '19 at 22:57

How to merge similar rows and split column into rows by values?

2 Answers2