Creating a new dataframe from a multi index dataframe using some conditions

Question

I have a time series dataset of material consumption over the past 5 years:

Material No Consumption Date    Consumption
A           2019-06-01          1
A           2019-07-01          2
A           2019-08-01          3
A           2019-09-01          4
A           2019-10-01          0
A           2019-11-01          0
A           2019-12-01          0
A           2020-01-01          1
A           2020-02-01          2
A           2020-03-01          3
A           2020-04-01          0
A           2020-05-01          0
B           2019-06-01          0
B           2019-07-01          0
B           2019-08-01          0
B           2019-09-01          4
B           2019-10-01          0
B           2019-11-01          0
B           2019-12-01          0
B           2020-01-01          4
B           2020-02-01          2
B           2020-03-01          8
B           2020-04-01          0
B           2020-05-01          0

From the above dataframe, I want to see the number of months in which the material had at least 1 unit of consumption. The output dataframe should look something like this.

Material no_of_months(Jan2020-May2020) no_of_months(Jun2019-May2020)
A        3                             7
B        3                             4

Currently I'm subsetting the dataframe and using a groupby to count the unique entries with non-zero consumption. However, this requires creating a separate dataframe for each period and then merging them. I was wondering whether this could be done more cleanly, for example with dictionaries.

# Keep only the rows from January 2020 onwards
consumption_jan20_may20 = consumption.loc[consumption['Consumption Date'] >= '2020-01-01', ['Material No', 'Consumption Date', 'Consumption']]
# `grouper` is not defined in the post; presumably a monthly pd.Grouper on 'Consumption Date'
consumption_jan20_may20 = consumption_jan20_may20.groupby([pd.Grouper(key='Material No'), grouper])['Consumption'].count().reset_index()
# Count the (material, month) rows per material
consumption_jan20_may20 = consumption_jan20_may20.groupby('Material No').count().reset_index()
consumption_jan20_may20.columns = ['Material No', 'no_of_months(Jan2020-May2020)', 'dummy']
# Drop the dummy column
consumption_jan20_may20 = consumption_jan20_may20[['Material No', 'no_of_months(Jan2020-May2020)']]

currently I'm sub-setting the dataframe and using a groupby to count the unique entries with non-zero consumption. However, this needs creating multiple dataframes for different periods and then merging them. Was wondering if this could be done in a better way using dictionaries. — Bhanuteja Aryasomayajula, Jun 26 '20 at 05:52
The approach you are following seems appropriate. At some point you will have to split your dataframe on the basis of dates you want. You can do that manually or achieve it by running in a loop. If your dates are not more than 5-6 intervals, I would suggest running manually and concat all dfs at the end. — Murtaza Haji, Jun 26 '20 at 06:01
`df.loc[df["Consumption"]>0].groupby("Material No").count()`? — Henry Yik, Jun 26 '20 at 06:06
@HenryYik this works. However, I want to do the same thing for multiple time periods. Is there a concise way to do this and create a dataframe as shown in the output? — Bhanuteja Aryasomayajula, Jun 26 '20 at 06:21
Just chain your conditions like `df.loc[(df["Consumption"]>0)&(df['Consumption Date']>='2020-01-01')].groupby...)`? — Henry Yik, Jun 26 '20 at 06:32

score 0 · Answer 1 · answered Jun 26 '20 at 06:32

You can first restrict the data to the range of months you want to investigate. Say you want the first 5 months; if the frame holds a single material sorted by date, that is the first 5 rows:

df = df[:5]
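
Positional slicing only lines up with a date range when the frame holds one material sorted by date. For the two-material frame in the question, a date mask may be a safer way to limit the data. A minimal sketch, assuming `Consumption Date` is a datetime column and using a small made-up sample rather than the real data:

```python
import pandas as pd

# Small illustrative sample (a subset of the question's table)
df = pd.DataFrame({
    "Material No": ["A", "A", "A", "B", "B", "B"],
    "Consumption Date": pd.to_datetime(
        ["2019-12-01", "2020-01-01", "2020-02-01",
         "2019-12-01", "2020-01-01", "2020-02-01"]
    ),
    "Consumption": [0, 1, 2, 0, 4, 2],
})

# Keep only rows whose date falls inside the period (both ends inclusive)
in_period = df["Consumption Date"].between(
    pd.Timestamp("2020-01-01"), pd.Timestamp("2020-05-01")
)
df = df[in_period]
```

This keeps the January and February 2020 rows for both materials, regardless of row order.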

Then you can use the code below to keep only the months in which the material usage is not zero:

df_nonzero = df[df['Consumption']!=0]

If you want to see in how many months the consumption is not zero, you can simply take the length of the new dataframe:

len(df_nonzero)
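
To get the per-material counts for several periods in one output frame, as the question asks, one possible approach is to drive the filtering from a dictionary of periods. The following is only a sketch: the `periods` mapping, the `month` helper column, and the variable names are assumptions made for illustration, and the sample is rebuilt from the table in the question.

```python
import pandas as pd

# Sample rebuilt from the question's table (12 monthly rows per material)
dates = pd.date_range("2019-06-01", "2020-05-01", freq="MS")
consumption = pd.DataFrame({
    "Material No": ["A"] * 12 + ["B"] * 12,
    "Consumption Date": list(dates) * 2,
    "Consumption": [1, 2, 3, 4, 0, 0, 0, 1, 2, 3, 0, 0,
                    0, 0, 0, 4, 0, 0, 0, 4, 2, 8, 0, 0],
})

# Hypothetical mapping: output column name -> (first date, last date), inclusive
periods = {
    "no_of_months(Jan2020-May2020)": ("2020-01-01", "2020-05-01"),
    "no_of_months(Jun2019-May2020)": ("2019-06-01", "2020-05-01"),
}

# Keep non-zero rows once and tag each with its calendar month
nonzero = consumption[consumption["Consumption"] > 0]
nonzero = nonzero.assign(month=nonzero["Consumption Date"].dt.to_period("M"))

# One Series of distinct-month counts per period, aligned on material
result = pd.DataFrame({
    name: nonzero.loc[
        nonzero["Consumption Date"].between(pd.Timestamp(start), pd.Timestamp(end))
    ].groupby("Material No")["month"].nunique()
    for name, (start, end) in periods.items()
})

# Materials with no non-zero month in a period would be NaN; report them as 0
result = result.reindex(consumption["Material No"].unique()).fillna(0).astype(int)
result = result.rename_axis("Material").reset_index()
print(result)
```

On the sample data this gives 3 and 7 for material A and 3 and 4 for material B, matching the expected output in the question. Adding another period only requires another entry in `periods`.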
