0

I have a time series dataset which is basically consumption data of materials over the past 5 years

Material No Consumption Date    Consumption
A           2019-06-01          1
A           2019-07-01          2
A           2019-08-01          3
A           2019-09-01          4
A           2019-10-01          0
A           2019-11-01          0
A           2019-12-01          0
A           2020-01-01          1
A           2020-02-01          2
A           2020-03-01          3
A           2020-04-01          0
A           2020-05-01          0
B           2019-06-01          0
B           2019-07-01          0
B           2019-08-01          0
B           2019-09-01          4
B           2019-10-01          0
B           2019-11-01          0
B           2019-12-01          0
B           2020-01-01          4
B           2020-02-01          2
B           2020-03-01          8
B           2020-04-01          0
B           2020-05-01          0

From the above dataframe, I want to see the number of months in which the material had at least 1 unit of consumption. The output dataframe should look something like this.

Material no_of_months(Jan2020-May2020) no_of_months(Jun2019-May2020)
A        3                             7
B        3                             4

Currently I'm sub-setting the data frame and using a group by to count the unique entries with non-zero consumption. However, this needs creating multiple data frames for different periods and then merging them. Was wondering if this could be done in a better way using dictionaries.

consumption_jan20_may20 = consumption.loc[consumption['Consumption Date']>='2020-01-01',['Material No','Consumption Date','Consumption']]
consumption_jan20_may20 = consumption_jan20_may20.groupby([pd.Grouper(key='Material No'),grouper])['Consumption'].count().reset_index()
consumption_jan20_may20 = consumption_jan20_may20.groupby('Material No').count().reset_index()
consumption_jan20_may20.columns = ['Material No','no_of_months(Jan2020-May2020)','dummy']
consumption_jan20_may20 = consumption_jan20_may20[['MATNR','no_of_months(Jan2020-May2020)']]

1 Answers1

0

You can firstly limit the data that you are investigating (limit it to a range of months). Let's say you want to check the data for the first 5 months:

df = df[:6]

Then you can use the below code to find the months that the material usage is not zero:

df_nonezero = df[df['Consumption']!=0]

if you want to see how many months the consumption is not zero, you can simply determine the length of new data frame:

len(df_nonezero)
Cicilio
  • 413
  • 6
  • 12