pandas pivot table with unique values of columns

Question

I have df with some string values.

so = pd.DataFrame({
"col1": ["row0", "row1", "row2"],
"col2": ["A", "B", "C"],
"col3": ["A", "A", "B"],
"col4":  ["B", "A", "B"],
})

I need to create pivot table where:

For my example, the answer should be:

Please help... thank you in advance

score 1 · Accepted Answer · answered Aug 01 '22 at 16:12

1

melt and crosstab:

df2 = so.melt('col1')
pd.crosstab(df2['col1'], df2['value'])

or melt and groupby.count:

so.melt('col1').groupby(['col1', 'value']).size().unstack(fill_value=0)

output:

value  A  B  C
col1          
row0   2  1  0
row1   2  1  0
row2   0  2  1

NB. for the exact output, use .reset_index().rename_axis(columns=None)

answered Aug 01 '22 at 16:12

mozway

melt and groupby seems good for me. But what if i want to use as `index` first two columns? `melt('col1', 'col2')` works as expected, but how i need to write `groupby`? `groupby(['col1', 'col2', 'value'])` doesn't work. I think I'm missing something. – exceev Aug 01 '22 at 16:34
There is no more `col2` after `melt`, check the output ;) If you want to use `col2` both as grouper and value you need to duplicate it. – mozway Aug 01 '22 at 16:46
I mean, what of I need to save `'col1', 'col2'` as index and count unique values only for columns `'col3','col4'`? Maybe its easer for me to understand with melt&crosstab: `df2 = so.melt(['col1','col2']) ###works as i expect### pd.crosstab(df2[?], df2['value']) ###can't understand###` – exceev Aug 01 '22 at 17:07
Maybe you should open a follow up question with all details? – mozway Aug 01 '22 at 17:09
Hooraaaay, I did it, thank you very much! :) – exceev Aug 01 '22 at 17:32

score 1 · Answer 2 · answered Aug 01 '22 at 16:16

1

here is one way to do it

df.melt('col1').pivot_table(index='col1', columns='value', aggfunc=(lambda x: int(x.size))  ).fillna(0).reset_index()

    col1    variable
value       A         B     C
0   row0    2.0     1.0     0.0
1   row1    2.0     1.0     0.0
2   row2    0.0     2.0     1.0

answered Aug 01 '22 at 16:16

Naveed

Thanks a lot for the solution. It works for me too! Can you please explain to me the mechanics of the lambda in this case? I can't figure out how the calculations happen :( – exceev Aug 01 '22 at 18:08
Pivot is resulting in multiple values for row/column (cell) as a list, aggfunc here is a lambda function that takes the size of this list and return the count. Hope it helps – Naveed Aug 01 '22 at 18:42
Yep, I get it. Thank you for the explanation! – exceev Aug 02 '22 at 04:41

2 Answers2