2

I have this data set for example:

   Name  Number Is true
0  Dani       2     yes
1  Dani       2      no
2  Jack       5      no
3  Jack       5   maybe
4  Dani       2   maybe

I want to create a new data set that combines similar rows and adds columns by column different values. This is the output I'm trying to get:

   Name  Number Is true1 Is true2 Is true3
0  Dani       2      yes       no    maybe
1  Jack       5       no    maybe  

I couldn't get it working from example 10 here: How to pivot a dataframe

Would you be able to provide a specific example for this use case please?

Thanks.

Edit for respond:

   Name      yes     no     maybe
0  Dani        2      2         2
1  Jack      NaN      5         5
Guy
  • 23
  • 3
  • use : `final=(df.assign(k=df.groupby(['Name','Number']).cumcount()) .set_index(['Name','Number','k']).unstack())` and then `final.columns=[f'{a}_{b}' for a,b in final.columns]` , finally print `final.reset_index()` – anky Dec 25 '19 at 08:45

2 Answers2

0

You can try this:

df2 = df.drop_duplicates(subset=['Name', 'Number Is'])
df2 = df2.reset_index(drop=True).assign(true= df.groupby('Number Is')['true'].agg(list).reset_index(drop=True) )
temp = df2['true'].apply(pd.Series).T 
temp.index = temp.index+1
temp = temp.T
df2 = df2.assign(**temp.add_prefix('true').add_suffix(' Is')).drop(columns='true').fillna('')  

output:

   Name  Number Is true1 Is true2 Is true3 Is
0  Dani          2      yes       no    maybe
1  Jack          5       no    maybe      
oppressionslayer
  • 6,942
  • 2
  • 7
  • 24
0

With combination of pivot_table(...) and apply(...):

df.pivot_table(index=["Name", "Number"], values="Is true", aggfunc=list).apply(lambda x: pd.Series({f"Is true{id+1}": el for id, el in enumerate(x[0])}), axis=1).reset_index()

Output:

   Name  Number Is true1 Is true2 Is true3
0  Dani       2      yes       no    maybe
1  Jack       5       no    maybe      NaN

Edit

For your follow up. This might be something along the lines, what you're looking for:

df.pivot_table(index=["Name"], columns="Is true", values="Number", aggfunc=list).fillna('').apply(lambda x: pd.Series({f"{col}{id+1}": el for col in x.keys() for id, el in enumerate(x[col])}), axis=1).reset_index()

Output:

   Name  maybe1  no1  yes1
0  Dani     2.0  2.0   2.0
1  Jack     5.0  5.0   NaN
Grzegorz Skibinski
  • 12,624
  • 2
  • 11
  • 34
  • 1
    Thank you it works! If possible I have a follow up question please. If I would like to set new column names by specific column, but to take the values from other column, how would I do that? Please see an example under "Edit for respond:" in my original question. Thanks again :) – Guy Dec 25 '19 at 21:04
  • So add columns, which will be dependent on both ```is true``` and ```number``` ? It depends on what kind of dependency exactly you want to have? – Grzegorz Skibinski Dec 25 '19 at 21:04
  • Hm gotcha, so you want max ```number``` for every ```name``` (rows), for every ```is true``` column? Let me think about that. – Grzegorz Skibinski Dec 25 '19 at 21:08
  • Hm, but how is ```Jack``` ```yes```=2? – Grzegorz Skibinski Dec 25 '19 at 21:13
  • Sorry, added it by mistake, fixed in the original question under "Edit for respond:" – Guy Dec 26 '19 at 18:09
  • If to be more clear, I have specific column with few specific values that I want those values to become new column names ("Is True"). Under this new columns, the values should be taken from another column with many possible values ("Number"). And I want to group by all the rest of the columns I have ("Name" in the example above). I appreciate you help! – Guy Dec 26 '19 at 18:13
  • Hm, understood, then it's either as per my edit, if you want to take"max" from number for values, or ```aggfunc=list``` (instead of ```aggfunc="max"``` in the paragraph after edit) if you want to get all possible ```number``` – Grzegorz Skibinski Dec 26 '19 at 18:37
  • I think this example almost did the job: df.pivot_table(index=["Name", "Number"], values="Is true", aggfunc=list).apply(lambda x: pd.Series({f"Is true{id+1}": el for id, el in enumerate(x[0])}), axis=1).reset_index() I will change it from" values="Is true" to " values="Number" and I just want to understand how to take the new column names from "Is True" column, specifically I mean how to change this part to support it: .apply(lambda x: pd.Series({f"Is true{id+1}": el for id, el in enumerate(x[0])}), axis=1).reset_index() – Guy Dec 26 '19 at 20:50
  • I edited my edit - I think now I understood what you wanted to do there. Let me know, if now it's correct. – Grzegorz Skibinski Dec 26 '19 at 22:57