
Here is an MRE:

import pandas as pd

df = pd.DataFrame({"hour":[1,2,2,3,3,6,6,6], "location":["a","a", "b","b","c","c","c","c"]})

which looks like this:

    hour    location
0   1         a
1   2         a
2   2         b
3   3         b
4   3         c
5   6         c
6   6         c
7   6         c

When I group by hour and count the number of times each hour occurs, I get:

df.groupby(["hour"]).count()

>>>  location
hour    
1        1
2        2
3        2
6        3    

I want to fill in hours 4 and 5 and set their counts to 0.

Here is what I desire:

    location
hour    
1       1
2       2
3       2
4       0
5       0
6       3

Previously I've used

df.groupby(["hour", "location"]).count().unstack(fill_value=0).stack()

This used to work without problems, but it no longer works either.

I thought this was because I am grouping by only one column this time; however, when I group by two columns it still doesn't work. I'm not sure why.

haneulkim

1 Answer


GroupBy.count returns counts that exclude missing values, so you need to specify a column after groupby to say which column is checked for missing values. For example, here hour is checked:

df = df.groupby(["hour", "location"])['hour'].count().unstack(fill_value=0).stack()

But if you omit the column after groupby, the method counts all of the remaining columns. So if you use:

print (df.groupby(["hour"]).count())
      location
hour          
1            1
2            2
3            2
6            3

the remaining column is location, so it is used for the count.

If you use:

print (df.groupby(["location"]).count())
          hour
location      
a            2
b            2
c            4

the remaining column is hour, so it is used for the count.


But if the DataFrame has only 2 columns and you group by both of them, no column is left to count, so you need to specify a column to avoid an empty DataFrame. This also depends on the pandas version:

print (df.groupby(["hour", "location"]).count())
Empty DataFrame
Columns: []
Index: [(1, a), (2, a), (2, b), (3, b), (3, c), (6, c)]

print (df.groupby(["hour", "location"])['hour'].count())
hour  location
1     a           1
2     a           1
      b           1
3     b           1
      c           1
6     c           3
Name: hour, dtype: int64
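
The difference between counting non-missing values and counting rows can be seen in a small sketch. The frame `demo` and its values are hypothetical, chosen only so that one `location` entry is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: hour 1 has two rows, one with a missing location.
demo = pd.DataFrame({
    "hour": [1, 1, 2],
    "location": ["a", np.nan, "b"],
})

# count() looks at the selected column and skips missing values.
non_missing = demo.groupby("hour")["location"].count()

# size() counts rows per group, whether or not values are missing.
rows = demo.groupby("hour").size()

print(non_missing)
print(rows)
```

For hour 1 the two results should differ, because the row with the missing location is skipped by count but still counted by size.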

If you don't care about missing values, use GroupBy.size. It does not check for missing values, so no column needs to be specified after groupby:

df = df.groupby(["hour", "location"]).size().unstack(fill_value=0).stack()

print (df)
hour  location
1     a           1
      b           0
      c           0
2     a           1
      b           1
      c           0
3     a           0
      b           1
      c           1
6     a           0
      b           0
      c           3
dtype: int64
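
For the single-column case in the question (hours 4 and 5 with a count of 0), one possible approach is to reindex the grouped result over the full range of hours. This is only a sketch, assuming the hours are integers and that every hour between the smallest and largest observed hour should appear; the names `counts`, `full_range` and `filled` are made up for illustration:

```python
import pandas as pd

# Rebuild the original frame, since df was reassigned above.
df = pd.DataFrame({
    "hour": [1, 2, 2, 3, 3, 6, 6, 6],
    "location": ["a", "a", "b", "b", "c", "c", "c", "c"],
})

# Number of rows per observed hour.
counts = df.groupby("hour").size()

# Assumption: every integer hour between the min and max should be present.
full_range = range(int(counts.index.min()), int(counts.index.max()) + 1)

# Hours absent from the data get a count of 0.
filled = counts.reindex(full_range, fill_value=0).rename_axis("hour")

# Optional: turn the Series into a one-column frame named like the desired output.
result = filled.to_frame("location")
print(result)
```

If a fixed range is wanted instead (for example all 24 hours of a day), `full_range` can be replaced by that range.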
jezrael