0

I have df with some string values.

so = pd.DataFrame({
"col1": ["row0", "row1", "row2"],
"col2": ["A", "B", "C"],
"col3": ["A", "A", "B"],
"col4":  ["B", "A", "B"],
})

I need to create pivot table where:

  1. index is values from column "col1"
  2. columns are unique values from columns ['col2':'col4']
  3. values at the intersection are count of column name matches for every row

For my example, the answer should be:

enter image description here

Please help... thank you in advance

exceev
  • 15
  • 3

2 Answers2

1

melt and crosstab:

df2 = so.melt('col1')
pd.crosstab(df2['col1'], df2['value'])

or melt and groupby.count:

so.melt('col1').groupby(['col1', 'value']).size().unstack(fill_value=0)

output:

value  A  B  C
col1          
row0   2  1  0
row1   2  1  0
row2   0  2  1

NB. for the exact output, use .reset_index().rename_axis(columns=None)

mozway
  • 194,879
  • 13
  • 39
  • 75
  • melt and groupby seems good for me. But what if i want to use as `index` first two columns? `melt('col1', 'col2')` works as expected, but how i need to write `groupby`? `groupby(['col1', 'col2', 'value'])` doesn't work. I think I'm missing something. – exceev Aug 01 '22 at 16:34
  • There is no more `col2` after `melt`, check the output ;) If you want to use `col2` both as grouper and value you need to duplicate it. – mozway Aug 01 '22 at 16:46
  • I mean, what of I need to save `'col1', 'col2'` as index and count unique values only for columns `'col3','col4'`? Maybe its easer for me to understand with melt&crosstab: `df2 = so.melt(['col1','col2']) ###works as i expect### pd.crosstab(df2[?], df2['value']) ###can't understand###` – exceev Aug 01 '22 at 17:07
  • Maybe you should open a follow up question with all details? – mozway Aug 01 '22 at 17:09
  • Hooraaaay, I did it, thank you very much! :) – exceev Aug 01 '22 at 17:32
1

here is one way to do it

df.melt('col1').pivot_table(index='col1', columns='value', aggfunc=(lambda x: int(x.size))  ).fillna(0).reset_index()
    col1    variable
value       A         B     C
0   row0    2.0     1.0     0.0
1   row1    2.0     1.0     0.0
2   row2    0.0     2.0     1.0
Naveed
  • 11,495
  • 2
  • 14
  • 21
  • Thanks a lot for the solution. It works for me too! Can you please explain to me the mechanics of the lambda in this case? I can't figure out how the calculations happen :( – exceev Aug 01 '22 at 18:08
  • Pivot is resulting in multiple values for row/column (cell) as a list, aggfunc here is a lambda function that takes the size of this list and return the count. Hope it helps – Naveed Aug 01 '22 at 18:42
  • Yep, I get it. Thank you for the explanation! – exceev Aug 02 '22 at 04:41