
Let's say my input is:

  doc_id  label
0      a  Apple
1      b   Book
2      c   Book
3      a   Book
4      b    Cat
5      a  Apple
6      c   Book

The data above can be built with:

import numpy as np
import pandas as pd

df = pd.DataFrame({"doc_id": ["a", "b", "c", "a", "b", "a", "c"],
                   "label": ["Apple", "Book", "Book", "Book", "Cat", "Apple", "Book"]
                   })

And my desired output is:

label   Apple  Book  Cat
doc_id
a         2.0   1.0  NaN
b         NaN   1.0  1.0
c         NaN   2.0  NaN

Which I can get with:

df["count"] = np.ones(len(df))

new_df = df.pivot_table(index="doc_id", columns="label", values="count", aggfunc="sum")

But creating a temporary column of counts that are all ones feels redundant. What is the proper way to do this?

Akavall
  • I see, so all I had to do is `df.pivot_table(index="doc_id", columns="label", aggfunc="sum")`, and not even worry about counts... certainly overthought this one. – Akavall Sep 26 '17 at 00:08
  • And searching is challenging too sometimes because often you're thinking of it differently than how the tool to solve it is phrased. However, what you have won't work. Try `pd.crosstab(df.doc_id, df.label)` – piRSquared Sep 26 '17 at 00:10
  • ... Or any of the many other options that are laid out. Some are faster than others. Choose according to your performance needs. – piRSquared Sep 26 '17 at 00:15
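
For reference, here is a minimal sketch of two of the options mentioned in the comments, run against the sample data from the question. The variable names `counts`, `counts_nan`, and `by_group` are made up for illustration. Note that `pd.crosstab` fills missing combinations with 0 rather than NaN, so an extra masking step is assumed here to match the desired output exactly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "doc_id": ["a", "b", "c", "a", "b", "a", "c"],
    "label": ["Apple", "Book", "Book", "Book", "Cat", "Apple", "Book"],
})

# Option 1: crosstab counts co-occurrences directly, no helper column needed.
# Combinations that never occur show up as 0.
counts = pd.crosstab(df["doc_id"], df["label"])

# Optional: mask the zeros to get NaN, as in the desired output.
counts_nan = counts.where(counts > 0)

# Option 2: group on both columns, count rows per group, then move the
# labels into columns. Missing combinations become NaN automatically.
by_group = df.groupby(["doc_id", "label"]).size().unstack()

print(counts)
print(by_group)
```

Which one to pick probably depends on whether zeros or NaN are preferred for missing combinations, and on the performance needs mentioned above.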
