0

Here's the input data

df1 = pd.DataFrame( { 
        "author" : ["A","B","A","A","C","B"] , 
        "topic" : ["cat", "dog", "dog", "cat", "dog", "dog"] } )
df1
    author  topic
0   A   cat
1   B   dog
2   A   dog
3   A   cat
4   C   dog
5   B   dog

I'm using group by as follows

g1 = df1.groupby('author')['topic'].value_counts()
author  topic
A       cat      2
        dog      1
B       dog      2
C       dog      1

What I'm looking to achieve is this

author  cat   dog 
A       2     1
B       0     2
C       0     1

Basically, need to convert the second-order of index in hierarchical indexing to columns. How can I do that?

Vishakha Lall
  • 1,178
  • 12
  • 33
  • 1
    `pd.crosstab(df1['author'],df1['topic'])` - you can also just use crosstab or pivot_table here – anky Apr 22 '20 at 14:23
  • 1
    See section on `cross tabulation`: https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe/47152692#47152692 – piRSquared Apr 22 '20 at 14:24

1 Answers1

2

Use Series.unstack here:

df = df1.groupby('author')['topic'].value_counts().unstack(fill_value=0)

Another solution with crosstab:

df = pd.crosstab(df1['author'], df1['topic'])
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252