I have a dataframe like this:

    date post
     da1   a
     da1   b
     da2   a
     da3   c
     da1   d
     da1   a

What I want to do is this:

    date post total
     da1   a     2
     da1   b     1
     da2   a     1
     da3   c     1
     da1   d     1

I've tried:

    df.groupby(["date","post"]).count().sort_values(['index'], ascending=0)

It sorts in that order, but I can then no longer access the date/post values via `df.date` or `df.post`, because the dates and posts become the index (their own "keys") for the values in total.

I need to be able to access the column values via their headers. How should I go about doing this?

raph

Call `reset_index()` on the result: `df.groupby(["date","post"]).count().sort_values(['index'], ascending=0).reset_index()` – EdChum Dec 16 '16 at 13:46

1 Answer

I think you need:

    print(df.groupby(["date","post"]).size().reset_index(name='total'))
      date post  total
    0  da1    a      2
    1  da1    b      1
    2  da1    d      1
    3  da2    a      1
    4  da3    c      1
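
If the row order from the question matters (groups in order of first appearance rather than sorted by key), one option is to pass `sort=False` to `groupby`. This is only a sketch: the frame below is rebuilt by hand from the sample in the question and is assumed to match the real data.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real frame).
df = pd.DataFrame({
    "date": ["da1", "da1", "da2", "da3", "da1", "da1"],
    "post": ["a", "b", "a", "c", "d", "a"],
})

# sort=False keeps the groups in order of first appearance
# instead of sorting them by the group keys.
out = df.groupby(["date", "post"], sort=False).size().reset_index(name="total")

print(out)
print(out["date"].tolist())  # date and post are ordinary columns again
```

After `reset_index`, `out.date` and `out.post` work as regular column access, which is what the question asks for.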

See also: What is the difference between size and count in pandas?
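
As a rough illustration of that difference: `size` counts every row in a group, while `count` counts only non-null values per column. The `views` column below is hypothetical and is added only so that there is a missing value to count.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": ["da1", "da1", "da2"],
    "post": ["a", "a", "a"],
    "views": [10.0, np.nan, 5.0],  # hypothetical column with one missing value
})

grouped = df.groupby(["date", "post"])

sizes = grouped.size()              # rows per group, NaN rows included
counts = grouped["views"].count()   # non-null 'views' values per group

print(sizes)
print(counts)
```

For the (da1, a) group, `size` reports 2 rows while `count` reports 1, because one `views` value is missing. With only the `date` and `post` columns, as in the question, `size` is the safer choice because it does not depend on any other column.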

Graham
jezrael