1

This is how my df looks like.

    A      B     C      D      E      F
    xyz   abc    aa    100    qq     brc,pqr,lmn
    xyz   abc    bb    150    qq     lmn,brc,ppq
    xyz   abc    cc    80     qq     lmn,pqr
    abc   pqr    cc    99     qq     pqr,brc,lmn
    abc   pqr    aa    180    qq     brc,lmn,pqr
    abc   pqr    bb    200    qq     lmn,pqr,brc

this is what i want. how can i achieve the same?

A      B     aa     bb    cc      E      F
xyz   abc    100   150    80      qq     brc,pqr,lmn,ppq
abc   pqr    180   200    99      qq     brc,lmn,pqr
Prachi
  • 75
  • 3

1 Answers1

1

Use DataFrame.pivot_table with DataFrame.join for another DataFrame created by GroupBy.agg:

f = lambda x: ','.join(set(z for y in x for z in y.split(',')))
df1 = df.groupby(['A','B']).agg({'E':'first', 'F': f})

df = df.pivot_table(index=['A','B'],
                    columns='C',
                    values='D',
                    aggfunc='mean').join(df1).reset_index()
print (df)
     A    B   aa   bb  cc   E                F
0  abc  pqr  180  200  99  qq      lmn,brc,pqr
1  xyz  abc  100  150  80  qq  lmn,ppq,brc,pqr
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you for responding but I am getting the error as 'No numeric types to aggregate' as the data type is string for values. what can i do about it? – Prachi Aug 31 '20 at 11:26
  • 1
    @Prachi - You can use `df.D = df.D.astype('int')` or some another solumn from [this](https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas) like `pd.to_numeric` – jezrael Aug 31 '20 at 11:27
  • ok, i did that. but my output doesnt look like how i want it. i actually got the column 'cc' twice and the values seperated too. i tried df.groupby(df.columns,axis=1).first() but it didnt work too. – Prachi Sep 01 '20 at 13:29
  • 1
    @Prachi - There are not some space like columns are `'cc'` and `' cc'` ? What is `print (df.columns.tolist())` after my solution? – jezrael Sep 02 '20 at 05:05
  • After your solution these are the columns i get `['A', 'B', ' cc', 'aa', 'bb', 'cc', 'E', 'F']` – Prachi Sep 02 '20 at 07:19
  • 1
    @Prachi - Thank you, so problem is `cc` has space, so need before my solution `df.C = df.C.str.strip()` for convert `' cc'` to `'cc'` – jezrael Sep 02 '20 at 07:20