1

I have been spending hours trying to write a function to detect trend in a time series by taking the past 4 months months of data prior to today. I organized my monthly data with dt.month but the issue is that I cannot get the previous year's 12th month if today is january. Here is a toy dataset:

data1 = pd.DataFrame({'Id' : ['001','001','001','001','001','001','001','001','001',
                              '002','002','002','002','002','002','002','002','002',],
                      'Date': ['2020-01-12', '2019-12-30', '2019-12-01','2019-11-01', '2019-08-04', '2019-08-04', '2019-08-01', '2019-07-20', '2019-06-04',
                               '2020-01-11', '2019-12-12', '2019-12-01','2019-12-01', '2019-09-10', '2019-08-10', '2019-08-01', '2019-06-20', '2019-06-01'],
                      'Quantity' :[3,5,6,72,1,5,6,3,9,3,6,7,3,2,5,74,3,4]
                      })

and my data cleaning to get the format that i need is this:

data1['Date'] =pd.to_datetime(data1['Date'], format='%Y-%m')

data2 = data1.groupby('Id').apply(lambda x: x.set_index('Date').resample('M').sum())['Quantity'].reset_index()

data2['M'] =pd.to_datetime(data2['Date']).dt.month
data2['Y'] =pd.to_datetime(data2['Date']).dt.year
data = pd.DataFrame(data2.groupby(['Id','Date','M','Y'])['Quantity'].sum())

data = data.rename(columns={0 : 'Quantity'})

and my function looks like this:

def check_trend():
    today_month = int(time.strftime("%-m"))

    data['n3-n4'] = data['Quantity'].loc[data['M']== (today_month - 3)]-data['Quantity'].loc[data['M']== (today_month - 4)]
    data['n2-n3'] = data['Quantity'].loc[data['M'] == (today_month - 2)] - data['Quantity'].loc[data['M'] == (today_month - 3)]
    data['n2-n1'] = data['Quantity'].loc[data['M'] == (today_month - 2)] - data['Quantity'].loc[data['M'] == (today_month - 1)]

    if data['n3-n4'] < 0 and data['n2-n3'] <0 and data['n2-n1'] <0:
    elif data['n3-n4'] > 0 and data['n2-n3'] > 0 and dat['n2-n1'] >0:
        data['Trend'] = 'Yes'

    else:
        data['Trend'] = 'No'

print(check_trend)

I have looked at this: Get (year,month) for the last X months but it does not seem to be working for a specific groupby object.

I would really appreciate a hint! At least I would love to know if this method to identify trend in a dataset is a good one. After that I plan on using exponential smoothing if there is no trend and Holt's method if there is trend.

UPDATE: thanks to @Vorsprung durch Technik, I have the function working well but i still struggle to incorporate the result into a new dataframe containing the Ids from data2

forecast = pd.DataFrame()
forecast['Id'] = data1['Id'].unique()
for k,g in data2.groupby(level='Id'):

    forecast['trendup'] = g.tail(5)['Quantity'].is_monotonic_increasing
    forecast['trendown'] = g.tail(5)['Quantity'].is_monotonic_decreasing

this returns the same value for each row of the dataset, like if it was doing the calculation for only the first one, how can I ensure that it gets calculated for EACH Id value?

Murcielago
  • 905
  • 1
  • 8
  • 30

1 Answers1

1

I don't think you need check_trend().

There are built-in functions for this:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.is_monotonic_increasing.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.is_monotonic_decreasing.html

Let me know if this does what you need:

data2 = data1.groupby('Id').apply(lambda x: x.set_index('Date').resample('M').sum()) 
for k,g in data2.groupby(level='Id'): 
    print(g.tail(4)['Quantity'].is_monotonic_increasing)
    print(g.tail(4)['Quantity'].is_monotonic_decreasing) 

This is what is returned by g.tail(4):

                Quantity
Id  Date                
001 2019-10-31         0
    2019-11-30        72
    2019-12-31        11
    2020-01-31         3
                Quantity
Id  Date                
002 2019-10-31         0
    2019-11-30         0
    2019-12-31        16
    2020-01-31         3
mechanical_meat
  • 163,903
  • 24
  • 228
  • 223