Let's say my input is:
  doc_id  label
0      a  Apple
1      b   Book
2      c   Book
3      a   Book
4      b    Cat
5      a  Apple
6      c   Book
The data above, as a DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({"doc_id": ["a", "b", "c", "a", "b", "a", "c"],
                   "label": ["Apple", "Book", "Book", "Book", "Cat", "Apple", "Book"]})
And my desired output is:
label   Apple  Book  Cat
doc_id
a         2.0   1.0  NaN
b         NaN   1.0  1.0
c         NaN   2.0  NaN
Which I can get with:
df["count"] = np.ones(len(df))
new_df = df.pivot_table(index="doc_id", columns="label", values="count", aggfunc="sum")
But creating a temporary column of counts that are all ones feels redundant. What is the proper way to do this?
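For reference, here is a minimal sketch of two candidates that seem to avoid the helper column, rebuilt on the same sample data. The variable names `counts` and `counts_zero` are only illustrative, and I'm not sure whether either is considered the idiomatic approach. Note that `pd.crosstab` fills missing combinations with 0 rather than NaN, so it does not match the desired output exactly.

```python
import pandas as pd

df = pd.DataFrame({"doc_id": ["a", "b", "c", "a", "b", "a", "c"],
                   "label": ["Apple", "Book", "Book", "Book", "Cat", "Apple", "Book"]})

# Count rows per (doc_id, label) pair, then move the labels into columns.
# Pairs that never occur become NaN, as in the desired output.
counts = df.groupby(["doc_id", "label"]).size().unstack()

# Cross-tabulate the two columns directly.
# Pairs that never occur become 0 instead of NaN.
counts_zero = pd.crosstab(df["doc_id"], df["label"])

print(counts)
print(counts_zero)
```

If NaN is required, the `groupby` version looks closer to the original `pivot_table` result; if zeros are acceptable, `crosstab` is shorter.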