I have been spending hours trying to write a function to detect trend in a time series by taking the past 4 months months of data prior to today. I organized my monthly data with dt.month but the issue is that I cannot get the previous year's 12th month if today is january. Here is a toy dataset:
data1 = pd.DataFrame({'Id' : ['001','001','001','001','001','001','001','001','001',
'002','002','002','002','002','002','002','002','002',],
'Date': ['2020-01-12', '2019-12-30', '2019-12-01','2019-11-01', '2019-08-04', '2019-08-04', '2019-08-01', '2019-07-20', '2019-06-04',
'2020-01-11', '2019-12-12', '2019-12-01','2019-12-01', '2019-09-10', '2019-08-10', '2019-08-01', '2019-06-20', '2019-06-01'],
'Quantity' :[3,5,6,72,1,5,6,3,9,3,6,7,3,2,5,74,3,4]
})
and my data cleaning to get the format that i need is this:
data1['Date'] =pd.to_datetime(data1['Date'], format='%Y-%m')
data2 = data1.groupby('Id').apply(lambda x: x.set_index('Date').resample('M').sum())['Quantity'].reset_index()
data2['M'] =pd.to_datetime(data2['Date']).dt.month
data2['Y'] =pd.to_datetime(data2['Date']).dt.year
data = pd.DataFrame(data2.groupby(['Id','Date','M','Y'])['Quantity'].sum())
data = data.rename(columns={0 : 'Quantity'})
and my function looks like this:
def check_trend():
today_month = int(time.strftime("%-m"))
data['n3-n4'] = data['Quantity'].loc[data['M']== (today_month - 3)]-data['Quantity'].loc[data['M']== (today_month - 4)]
data['n2-n3'] = data['Quantity'].loc[data['M'] == (today_month - 2)] - data['Quantity'].loc[data['M'] == (today_month - 3)]
data['n2-n1'] = data['Quantity'].loc[data['M'] == (today_month - 2)] - data['Quantity'].loc[data['M'] == (today_month - 1)]
if data['n3-n4'] < 0 and data['n2-n3'] <0 and data['n2-n1'] <0:
elif data['n3-n4'] > 0 and data['n2-n3'] > 0 and dat['n2-n1'] >0:
data['Trend'] = 'Yes'
else:
data['Trend'] = 'No'
print(check_trend)
I have looked at this: Get (year,month) for the last X months but it does not seem to be working for a specific groupby object.
I would really appreciate a hint! At least I would love to know if this method to identify trend in a dataset is a good one. After that I plan on using exponential smoothing if there is no trend and Holt's method if there is trend.
UPDATE: thanks to @Vorsprung durch Technik, I have the function working well but i still struggle to incorporate the result into a new dataframe containing the Ids from data2
forecast = pd.DataFrame()
forecast['Id'] = data1['Id'].unique()
for k,g in data2.groupby(level='Id'):
forecast['trendup'] = g.tail(5)['Quantity'].is_monotonic_increasing
forecast['trendown'] = g.tail(5)['Quantity'].is_monotonic_decreasing
this returns the same value for each row of the dataset, like if it was doing the calculation for only the first one, how can I ensure that it gets calculated for EACH Id value?