I have a pandas DataFrame like this:
df
document term
X a
X b
X a
X c
Y a
Y c
Y d
I want to create a sparse matrix like the one below, with the unique documents as rows and the unique terms as columns. A cell should be 1 if the document and term co-occur in the original DataFrame, regardless of how many times they co-occur, and 0 otherwise:
a b c d
X 1 1 1 0
Y 1 0 1 1
I have tried a for loop, but it is very slow with a million rows.
My answer, after a suggestion from piRsquared:
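# Drop duplicate (document, term) pairs so that each count is at most 1
df = df.drop_duplicates()
# Count the remaining pairs per cell; combinations that never occur are filled with 0
df.pivot_table(index='document', columns='term', fill_value=0, aggfunc='size')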
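Note that the pivot above returns a dense DataFrame. If an actual sparse matrix is needed, one possible sketch is to factorize both columns into integer codes and build a SciPy CSR matrix from them. This assumes SciPy is installed; the names `pairs`, `row_codes`, `col_codes`, `documents`, `terms` and `matrix` are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Sample data from the question.
df = pd.DataFrame({
    "document": ["X", "X", "X", "X", "Y", "Y", "Y"],
    "term": ["a", "b", "a", "c", "a", "c", "d"],
})

# Keep each (document, term) pair once so every stored value is 1.
pairs = df.drop_duplicates()

# factorize maps each label to an integer code; sort=True orders the labels.
row_codes, documents = pd.factorize(pairs["document"], sort=True)
col_codes, terms = pd.factorize(pairs["term"], sort=True)

# Build the sparse matrix from (value, (row, column)) triples.
matrix = csr_matrix(
    (np.ones(len(pairs), dtype=np.int8), (row_codes, col_codes)),
    shape=(len(documents), len(terms)),
)

# Dense view for checking small inputs only; avoid this on large data.
print(pd.DataFrame(matrix.toarray(), index=documents, columns=terms))
```

Here `documents` and `terms` hold the row and column labels, so they should be kept alongside `matrix` if the labels are needed later.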