Pandas groupby count and fill none count as 0

Question

Here is an MRE:

import pandas as pd
df = pd.DataFrame({"hour":[1,2,2,3,3,6,6,6], "location":["a","a", "b","b","c","c","c","c"]})

which looks like this:

    hour    location
0   1         a
1   2         a
2   2         b
3   3         b
4   3         c
5   6         c
6   6         c
7   6         c

When I group by hour and count the number of times each hour occurred, I get

df.groupby(["hour"]).count()

>>>  location
hour    
1        1
2        2
3        2
6        3

I want to fill in hours 4 and 5 and set their counts to 0.

Here is what I desire:

Previously I've used

df.groupby(["hour", "location"]).count().unstack(fill_value=0).stack()

That worked without problems before, but it is not working now either.

I thought this was because I am grouping by only one column this time; however, when I group by two columns it still doesn't work. I'm not sure why.

jezrael · Answer 1 · 2020-04-16T05:56:59.450

GroupBy.count returns counts that exclude missing values, so you need to select a column after groupby to specify which column(s) are checked for missing values. Here, for example, hour is checked:

df = df.groupby(["hour", "location"])['hour'].count().unstack(fill_value=0).stack()

If you omit the column after groupby, count uses all the other columns. So if you use:

print (df.groupby(["hour"]).count())
      location
hour          
1            1
2            2
3            2
6            3

the remaining column is location, so it is used for the count.

If you use:

print (df.groupby(["location"]).count())
          hour
location      
a            2
b            2
c            4

the remaining column is hour, so it is used for the count.
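
To make the role of missing values concrete, here is a minimal sketch. The frame `demo` and its missing entry are made up for illustration and are not part of the question's data. It shows that count skips missing values in the remaining column, whereas GroupBy.size simply counts rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one location value is missing on purpose
demo = pd.DataFrame({"hour": [1, 2, 2, 3],
                     "location": ["a", np.nan, "b", "b"]})

# count() reports the non-missing values per remaining column (here: location)
counts = demo.groupby("hour").count()

# size() reports the number of rows per group, missing values included
sizes = demo.groupby("hour").size()

print(counts)
print(sizes)
```

For hour 2 the two results differ, because one of its two rows has no location.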

But if the DataFrame has only 2 columns and you group by both, you need to select a column to avoid an empty DataFrame (this also depends on the pandas version):

print (df.groupby(["hour", "location"]).count())
Empty DataFrame
Columns: []
Index: [(1, a), (2, a), (2, b), (3, b), (3, c), (6, c)]

print (df.groupby(["hour", "location"])['hour'].count())
hour  location
1     a           1
2     a           1
      b           1
3     b           1
      c           1
6     c           3
Name: hour, dtype: int64

If you don't care about missing values, use GroupBy.size. It does not check for missing values, so no column needs to be selected after groupby:

df = df.groupby(["hour", "location"]).size().unstack(fill_value=0).stack()

print (df)
hour  location
1     a           1
      b           0
      c           0
2     a           1
      b           1
      c           0
3     a           0
      b           1
      c           1
6     a           0
      b           0
      c           3
dtype: int64
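
For the single-column case in the question, hours 4 and 5 never occur, so no group exists for them and unstack has nothing to fill. One possible approach is to reindex the counts over the full range of hours. This is a sketch rather than part of the answer above; the name `all_hours` and the choice of a range from the smallest to the largest observed hour are assumptions:

```python
import pandas as pd

df = pd.DataFrame({"hour": [1, 2, 2, 3, 3, 6, 6, 6],
                   "location": ["a", "a", "b", "b", "c", "c", "c", "c"]})

# Assumption: every integer hour between the observed min and max should appear
first_hour = int(df["hour"].min())
last_hour = int(df["hour"].max())
all_hours = pd.RangeIndex(first_hour, last_hour + 1, name="hour")

# size() counts rows per hour; reindex adds the absent hours with a count of 0
counts = df.groupby("hour").size().reindex(all_hours, fill_value=0)

print(counts)
```

If the hours should cover a fixed range instead (for example a whole day), the bounds of `all_hours` can be set by hand.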

@Ambleu - I hope all the information is in the answer; if something should be added, let me know. — jezrael, Apr 16 '20 at 05:29
Sorry, I'm not fully understanding your explanation. How does it work for df.groupby(["hour"])? — haneulkim, Apr 16 '20 at 05:53
@Ambleu - It uses all the other columns, excluding `hour`, for the count, so here `location` — jezrael, Apr 16 '20 at 05:55
