Suppose I have the following data:
import pandas as pd

df = pd.DataFrame([
    ['01', 'A'],
    ['01', 'B'],
    ['01', 'C'],
    ['02', 'A'],
    ['02', 'B'],
    ['03', 'B'],
    ['03', 'C']
], columns=['id', 'category'])
How do I create a frequency matrix like this?
   A  B  C
A  2  2  1
B  2  3  2
C  1  2  2
One way to do it is through a self join:
result = df.merge(df, on='id')
pd.pivot_table(
    result,
    index='category_x',
    columns='category_y',
    values='id',
    aggfunc='count'
)
But the self join makes the intermediate data very large. Is there a more efficient way to do this without a self join?
Edit
My original post was closed as a duplicate of a pivot_table question. But pivot_table only accepts different columns and index arguments. In my case, I have only one category column. So
# Does not work
pd.pivot_table(df, columns='category', index='category', ...)
does not work.
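For reference, here is a minimal sketch of one possible way to get the same matrix without the self join. It assumes the id-by-category count table fits in memory: build that table with pd.crosstab, then multiply its transpose by itself. The names indicator and freq are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    ['01', 'A'], ['01', 'B'], ['01', 'C'],
    ['02', 'A'], ['02', 'B'],
    ['03', 'B'], ['03', 'C'],
], columns=['id', 'category'])

# One row per id, one column per category; each entry counts how many
# times that (id, category) pair occurs in df.
indicator = pd.crosstab(df['id'], df['category'])

# (categories x ids) dot (ids x categories) sums, over all ids, the
# product of the two category counts, i.e. the co-occurrence frequency.
freq = indicator.T.dot(indicator)
print(freq)
```

On the sample data this gives the 3x3 matrix shown at the top. The crosstab is dense, so with many ids and many categories a sparse matrix might be a better fit, but I have not measured that.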