I have a 4x3 dataframe and want to pivot it, combining the values wherever the same row/column intersection occurs more than once.

Column A  Column B  Column C
boo       ptype     123
boo       tecnh     34e
boo       ptype     34w
boo       staaa     45r

I have tried, but could neither pivot nor combine the values. This is what I attempted (adapted from another Stack Overflow answer):

combined = line.apply(lambda row: ','.join(row.values.astype(str)), axis=1)

Is there a way to pivot and combine to get the result below?

Column A  ptype    tecnh  staaa
boo       123,34w  34e    45r
Venkat

2 Answers

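Use pivot_table with an aggregation function that joins the duplicate values. The code assumes the columns are named Column_A, Column_B and Column_C: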

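res = df.pivot_table(index='Column_A', columns='Column_B',
                     values='Column_C', aggfunc=lambda x: ','.join(x))
# drop the 'Column_B' name from the column axis
res.columns = res.columns.values
# reset_index() returns a new frame, so assign it back
res = res.reset_index()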
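
For anyone who wants to run this end to end, here is a minimal sketch that builds the question's four rows first. It assumes the columns are named with underscores (Column_A, Column_B, Column_C), as in the code above, rather than with the spaces shown in the question. It also checks that the pivoted column axis is a single-level index that merely carries a name.

```python
import pandas as pd

# The four rows from the question; underscore column names are an assumption
# made to match the answer's code.
df = pd.DataFrame({
    "Column_A": ["boo", "boo", "boo", "boo"],
    "Column_B": ["ptype", "tecnh", "ptype", "staaa"],
    "Column_C": ["123", "34e", "34w", "45r"],
})

# Join all values that land on the same (Column_A, Column_B) intersection.
res = df.pivot_table(index="Column_A", columns="Column_B",
                     values="Column_C", aggfunc=lambda x: ",".join(x))

# The column axis has one level; 'Column_B' is only its name.
n_levels = res.columns.nlevels
axis_name = res.columns.name

# Replace the named column index with plain labels, then restore Column_A
# as an ordinary column.
res.columns = res.columns.values
res = res.reset_index()

print(res)
```

Note that pivot_table sorts the new column labels, so the columns come out as ptype, staaa, tecnh rather than in order of first appearance.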


sayan dasgupta
  • How would OP get rid of the `Column_B` index level? – AKX Nov 07 '22 at 10:11
  • Your question is not clear. `Column_B` is just the name of the column index. If you want to get rid of it, assign `res.columns = res.columns.values`. You can also inspect `res.columns` to see this. – sayan dasgupta Nov 07 '22 at 10:16
  • The point is your output has a multi-level index; OP's desired output doesn't. – AKX Nov 07 '22 at 10:57
  • If you check, it is not a multi-level index. The column index just had a name attribute. Anyway, updated the answer to reflect the exact output. – sayan dasgupta Nov 07 '22 at 11:03
  • Thanks @sayan dasgupta. Your answer has solved my problem. This is exactly what I want. However, I can do groupby and unstack. – Venkat Nov 07 '22 at 18:01
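
The groupby and unstack route mentioned in the last comment can be sketched as follows. This is only an illustration under stated assumptions: it uses the column names with spaces as shown in the question, and the variable name `wide` is made up for the example.

```python
import pandas as pd

# The four rows from the question, with the column names as written there.
df = pd.DataFrame({
    "Column A": ["boo", "boo", "boo", "boo"],
    "Column B": ["ptype", "tecnh", "ptype", "staaa"],
    "Column C": ["123", "34e", "34w", "45r"],
})

# Join the values of each (Column A, Column B) pair into one string,
# then move the Column B labels from the row index into the columns.
wide = (
    df.groupby(["Column A", "Column B"])["Column C"]
      .agg(",".join)
      .unstack("Column B")
)

# Clear the leftover axis name and turn Column A back into a regular column.
wide.columns.name = None
wide = wide.reset_index()

print(wide)
```

The aggregation has to happen before the reshape here, because unstack, like pivot, cannot place two values into the same cell.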

I think you need to group by before applying pivot, because pivot raises an error when the same index/column pair occurs more than once. This solves your problem:

import pandas as pd

# your dataframe
df = pd.DataFrame({'Column A': ['boo', 'boo', 'boo', 'boo'], 'Column B': ['ptype', 'tecnh', 'ptype', 'staaa'], 'Column C': ['123', '34e', '34w', '45r']})
# group by the columns that contain duplicated pairs and join their values into one comma-separated string
df = df.groupby(['Column A', 'Column B'])['Column C'].agg(','.join).reset_index()
# after the groupby every (Column A, Column B) pair is unique, so pivot works;
# reset_index() without drop=True keeps Column A as a regular column
df = df.pivot(index='Column A', columns='Column B', values='Column C').reset_index()
koding_buse