17

I have a table containing dates and the various cars sold on each dates in the following format (These are only 2 of many columns):

DATE       CAR
2012/01/01 BMW
2012/01/01 Mercedes Benz
2012/01/01 BMW
2012/01/02 Volvo
2012/01/02 BMW
2012/01/03 Mercedes Benz
...
2012/09/01 BMW
2012/09/02 Volvo

I perform the following operation to find the number of BMW cars sold everyday

df[df.CAR=='BMW']['DATE'].value_counts()

The result is something like this :

2012/07/04 15
2012/07/08 8
...
2012/01/02 1

But there are some days when no BMW car was sold. In the result, along with the above I want the days where there are zero occurrences of BMW. Therefore, the desired result is :

2012/07/04 15
2012/07/08 8
...
2012/01/02 1
2012/01/09 0
2012/08/11 0

What can I do to attain such a result?

dfrankow
  • 20,191
  • 41
  • 152
  • 214
Babaji
  • 398
  • 1
  • 4
  • 18

2 Answers2

19

You can reindex the result after value_counts and fill the missing values with 0.

df.loc[df.CAR == 'BMW', 'DATE'].value_counts().reindex(
    df.DATE.unique(), fill_value=0)

Output:

2012/01/01    2
2012/01/02    1
2012/01/03    0
2012/09/01    1
2012/09/02    0
Name: DATE, dtype: int64

Instead of value_counts you could also consider checking the equality and summing, grouped by the dates, which will include all of them.

df['CAR'].eq('BMW').astype(int).groupby(df['DATE']).sum()

Output:

DATE
2012/01/01    2
2012/01/02    1
2012/01/03    0
2012/09/01    1
2012/09/02    0
Name: CAR, dtype: int32
ALollz
  • 57,915
  • 7
  • 66
  • 89
  • Similar situation. Used `.reindex(range(10), fill_value=0)` to force to 10 rows. Filled with zero where needed. – BSalita Dec 30 '21 at 15:28
1

The default behavior of type category is exactly what you want. The non present categories will display with a value of zero. You just need to do:

df.astype({'CAR': 'category'})[df.CAR=='BMW']['DATE'].value_counts()

or better yet, make it definitively a category in your dataframe:

df.CAR = df.CAR.astype('category')
df[df.CAR=='BMW'].DATE.value_counts()

The category type is a better representation of your data and more space-efficient.

neves
  • 33,186
  • 27
  • 159
  • 192
  • Hi @neves, I couldn't reproduce your code, it is wrong. Neither of the two examples give a correct result, it didn't add the rows containing zeros. The correct way should be to set the `DATE` as `category`, then it should work. – My Work Jan 24 '22 at 12:23
  • Related https://stackoverflow.com/questions/46752347/why-does-value-count-method-in-pandas-returns-a-zero-count and https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html. – My Work Jan 24 '22 at 12:26