How to .value_counts() rows while taking into account other columns?

Question

I have a dataframe like this:

    date post
     da1   a
     da1   b
     da2   a
     da3   c
     da1   d
     da1   a

What I want to do is this:

    date post total
     da1   a     2
     da1   b     1
     da2   a     1
     da3   c     1
     da1   d     1

I've tried:

    df.groupby(["date","post"]).count().sort_values(['index'], ascending=0)

This sorts the rows in that order, but afterwards I can no longer access the date/post values via `df.date` or `df.post`, because the dates and posts have become index keys for the values in `total` rather than ordinary columns.
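To illustrate why the columns disappear: after a `groupby` on both columns, the keys move into a MultiIndex on the result. This minimal sketch rebuilds the question's sample frame (values copied from the table above) and uses `size()` rather than `count()` so the result is not empty; `reset_index` then turns the index levels back into ordinary columns.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["da1", "da1", "da2", "da3", "da1", "da1"],
    "post": ["a", "b", "a", "c", "d", "a"],
})

# Grouping moves the key columns into a MultiIndex on the result
counts = df.groupby(["date", "post"]).size()
print(list(counts.index.names))   # ['date', 'post']
print(type(counts).__name__)      # Series

# reset_index() turns the index levels back into normal columns,
# so attribute access such as flat.date works again
flat = counts.reset_index(name="total")
print(flat.columns.tolist())      # ['date', 'post', 'total']
print(flat.date.tolist())         # ['da1', 'da1', 'da1', 'da2', 'da3']
```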

It is essential that I can access the values in the columns via their headers. How should I go about doing this?

Call `reset_index()` on the result: `df.groupby(["date","post"]).count().sort_values(['index'], ascending=0).reset_index()` – EdChum, Dec 16 '16 at 13:46
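A sketch of the comment's idea adapted to the two-column sample from the question. Assumption: the frame has only `date` and `post`, so `count()` would have no remaining columns to count (the `'index'` column sorted on in the question is not in the sample); `size()` is used instead, the counts are sorted in descending order, and `reset_index` restores the columns. The order of rows with equal counts is not asserted here.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["da1", "da1", "da2", "da3", "da1", "da1"],
    "post": ["a", "b", "a", "c", "d", "a"],
})

result = (
    df.groupby(["date", "post"])
      .size()                          # rows per (date, post) pair
      .sort_values(ascending=False)    # most frequent pairs first
      .reset_index(name="total")       # date/post become columns again
)
print(result)
print(result.post.tolist())           # columns are accessible by header
```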

score 3 · Accepted Answer · edited Sep 27 '17 at 16:55


I think you need:

    print (df.groupby(["date","post"]).size().reset_index(name='total'))
      date post  total
    0  da1    a      2
    1  da1    b      1
    2  da1    d      1
    3  da2    a      1
    4  da3    c      1
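A self-contained reproduction of the answer, building the sample frame from the question. As an aside tied to the title, it also shows `DataFrame.value_counts`, which assumes pandas 1.1 or newer; it yields the same (date, post, total) rows, but ordered by frequency rather than by key.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["da1", "da1", "da2", "da3", "da1", "da1"],
    "post": ["a", "b", "a", "c", "d", "a"],
})

# The answer: count rows per (date, post) group, then move the
# group keys back into columns and name the count column "total"
out = df.groupby(["date", "post"]).size().reset_index(name="total")
print(out)

# Alternative (assumes pandas >= 1.1): value_counts on a column subset
alt = df.value_counts(["date", "post"]).reset_index(name="total")

# Compare the two results as sets of rows, ignoring order
rows_out = set(map(tuple, out.itertuples(index=False)))
rows_alt = set(map(tuple, alt.itertuples(index=False)))
print(rows_out == rows_alt)  # True
```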

For the difference between `size` and `count`, see: What is the difference between size and count in pandas?
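A small illustration of that difference, using a hypothetical frame with an extra `views` column and one missing value (these values are invented for the example): `size()` counts every row in each group, while `count()` counts non-missing values separately for each remaining column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one NaN in "views" makes the difference visible
d = pd.DataFrame({
    "date": ["da1", "da1", "da2"],
    "post": ["a", "b", "a"],
    "views": [10, np.nan, 5],
})

g = d.groupby("date")
sizes = g.size()     # rows per group, NaN rows included
counts = g.count()   # non-NaN values per column, per group
print(sizes)
print(counts)
```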

answered Dec 16 '16 at 13:47 by jezrael (edited by Graham)


do you even need `sort_index()`? – Roman Pekar Dec 16 '16 at 14:18
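On the comment's point: `groupby` sorts the group keys by default, so the result is already in key order without an extra `sort_index()`. This sketch, rebuilding the question's sample frame, also shows `sort=False`, which keeps groups in order of first appearance; that happens to reproduce the row order of the desired output in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["da1", "da1", "da2", "da3", "da1", "da1"],
    "post": ["a", "b", "a", "c", "d", "a"],
})

# Default sort=True: keys come back sorted, no sort_index() needed
by_key = df.groupby(["date", "post"]).size().reset_index(name="total")

# sort=False: groups appear in the order they first occur in df
by_appearance = (
    df.groupby(["date", "post"], sort=False)
      .size()
      .reset_index(name="total")
)
print(by_key)
print(by_appearance)
```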
