All,

My dataset looks like the following. I am trying to predict 'Amount' for the next 6 months using either fbprophet or another model. My issue is that I would like a separate prediction for each group (A, B, C, D) over those 6 months, and I am not sure how to do that in Python with fbprophet or another model. I checked the official fbprophet documentation, but the only information I found is that Prophet takes exactly two columns: one is "Date" and the other is "Amount".

I am new to Python, so any help with code and explanation is greatly appreciated!

import pandas as pd
data = {'Date':['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01','2017-05-01','2017-06-01','2017-07-01'],'Group':['A','B','C','D','C','A','B'],
       'Amount':['12.1','13','15','10','12','9.0','5.6']}
df = pd.DataFrame(data)
print (df)

output:

         Date Group Amount
0  2017-01-01     A   12.1
1  2017-02-01     B     13
2  2017-03-01     C     15
3  2017-04-01     D     10
4  2017-05-01     C     12
5  2017-06-01     A    9.0
6  2017-07-01     B    5.6
Data_is_Power

3 Answers

fbprophet requires two columns, ds and y, so you first need to rename the two columns:

df = df.rename(columns={'Date': 'ds', 'Amount':'y'})
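
In the question's sample data both columns hold strings, so it may help to convert them before fitting. A minimal sketch, assuming the sample frame from the question; `pd.to_datetime` and `pd.to_numeric` are standard pandas calls, and whether the conversion is strictly required depends on the Prophet version:

```python
import pandas as pd

data = {
    'Date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01',
             '2017-05-01', '2017-06-01', '2017-07-01'],
    'Group': ['A', 'B', 'C', 'D', 'C', 'A', 'B'],
    'Amount': ['12.1', '13', '15', '10', '12', '9.0', '5.6'],
}
df = pd.DataFrame(data).rename(columns={'Date': 'ds', 'Amount': 'y'})

# Parse the date strings and turn the amounts into floats.
df['ds'] = pd.to_datetime(df['ds'])
df['y'] = pd.to_numeric(df['y'])

print(df.dtypes)
```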

Assuming that your groups are independent of each other and you want one prediction per group, you can group the dataframe by the "Group" column and run a forecast for each group:

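from fbprophet import Prophet  # the package was renamed: use "from prophet import Prophet" for releases 1.0 and later
grouped = df.groupby('Group')
for g in grouped.groups:
    group = grouped.get_group(g)
    m = Prophet()
    m.fit(group)
    # periods=365 forecasts 365 daily steps; for six monthly steps use periods=6, freq='MS'
    future = m.make_future_dataframe(periods=365)
    forecast = m.predict(future)
    print(forecast.tail())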

Note that the input dataframe you supplied in the question is not sufficient for the model, because group D has only a single data point. fbprophet needs at least 2 non-NaN rows to fit a forecast.
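
One way to avoid that error is to drop the groups that are too short before the loop. A minimal sketch using only pandas, assuming the columns have already been renamed to ds and y; `MIN_ROWS`, `usable`, `skipped` and `df_fit` are illustrative names, not part of any library:

```python
import pandas as pd

df = pd.DataFrame({
    'ds': pd.to_datetime(['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01',
                          '2017-05-01', '2017-06-01', '2017-07-01']),
    'Group': ['A', 'B', 'C', 'D', 'C', 'A', 'B'],
    'y': [12.1, 13.0, 15.0, 10.0, 12.0, 9.0, 5.6],
})

MIN_ROWS = 2  # minimum number of non-NaN rows per group

# count() ignores NaN, so this is the number of usable observations per group.
usable = df.groupby('Group')['y'].transform('count') >= MIN_ROWS
skipped = sorted(df.loc[~usable, 'Group'].unique().tolist())
df_fit = df[usable]

print('skipped groups:', skipped)
print(df_fit)
```

Running the per-group loop on `df_fit` instead of `df` then only visits groups that have enough data.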

EDIT: if you want to merge all predictions into one dataframe, the idea is to give the yhat column a different name for each group, do pd.merge() in the loop, and then cherry-pick the columns that you need at the end:

final = pd.DataFrame()
for g in grouped.groups:
    group = grouped.get_group(g)
    m = Prophet()
    m.fit(group)
    future = m.make_future_dataframe(periods=365)
    forecast = m.predict(future)
    # str(g) keeps this working when the group keys are not strings (e.g. ints)
    forecast = forecast.rename(columns={'yhat': 'yhat_' + str(g)})
    final = pd.merge(final, forecast.set_index('ds'), how='outer', left_index=True, right_index=True)

final = final[['yhat_' + str(g) for g in grouped.groups.keys()]]
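
An alternative to merging inside the loop is to collect only the needed columns in long format and reshape once at the end, which also makes it explicit which prediction belongs to which group. This is a sketch, not the method above: `naive_forecast` is a hypothetical stand-in for the Prophet fit/predict step (it just repeats the group mean) so the example runs without Prophet installed.

```python
import pandas as pd

def naive_forecast(group_df, periods=6):
    """Hypothetical stand-in for Prophet: repeat the group's mean
    for the next `periods` month starts."""
    start = group_df['ds'].max() + pd.offsets.MonthBegin(1)
    future = pd.date_range(start=start, periods=periods, freq='MS')
    return pd.DataFrame({'ds': future, 'yhat': group_df['y'].mean()})

df = pd.DataFrame({
    'ds': pd.to_datetime(['2017-01-01', '2017-02-01', '2017-03-01',
                          '2017-05-01', '2017-06-01', '2017-07-01']),
    'Group': ['A', 'B', 'C', 'C', 'A', 'B'],
    'y': [12.1, 13.0, 15.0, 12.0, 9.0, 5.6],
})

parts = []
for g, group in df.groupby('Group'):
    forecast = naive_forecast(group)
    # Keep only the columns needed later and tag each row with its group.
    parts.append(forecast[['ds', 'yhat']].assign(Group=g))

long = pd.concat(parts, ignore_index=True)
# One row per date, one column per group; dates a group lacks become NaN.
wide = long.pivot(index='ds', columns='Group', values='yhat')
print(wide)
```

With a real model, the call to `naive_forecast(group)` would be replaced by the fit, make_future_dataframe and predict steps from the loop above.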
Aditya Santoso
  • Thank you! Would it also be possible for you to show another approach, perhaps without using fbprophet? I realized after posting this question that for some reason fbprophet can't be installed on my system. I tried pip install too. Thanks in advance! – Data_is_Power Apr 08 '19 at 21:40
  • Also, how would I know which prediction is for which group? I would like to store the predictions so that columns 0, 1, 2, 3 hold the predictions for groups A, B, C, D. – Data_is_Power Apr 08 '19 at 23:15
  • If you have trouble installing fbprophet, it may be worth opening a new question or opening an issue on their GitHub page directly. I installed it without problems using conda: https://anaconda.org/conda-forge/fbprophet – Aditya Santoso Apr 09 '19 at 03:43
  • The code above already has a prediction for each group. If you want to consolidate into one gigantic dataframe that contains yhat data for all groups, you can do `pd.merge()` on each group in a loop. – Aditya Santoso Apr 09 '19 at 03:46
  • @Data_is_Power: if this answers your question, can you mark it as such? Thanks! – Aditya Santoso Apr 10 '19 at 02:49
  • Thanks for the code and explanation! The above code works for a small dataset. However, my dataset contains 83,000 rows with ~7 groups, and when I run the suggested code I get a MemoryError. I checked my Python build, and it is 64-bit. Is there a way I can break the code into two components? I was curious whether I could create my dataframe and append to it later. I tried that, but unfortunately received an IndexError. To my understanding, the forecast variable only holds the last group's data, and I think this is the reason for the error. Any suggestion on how to fix this issue? – Data_is_Power Apr 19 '19 at 02:33
  • @Data_is_Power: does it run into the MemoryError on merging or on predicting? And how much memory does it consume exactly? One way to isolate the problem is to dump the forecast for each group to a pickle or CSV instead of merging into one gigantic dataframe, then create another process just to do the final merging. If you have a problem merging / appending the dataframe, perhaps you should open a separate question for that. – Aditya Santoso Apr 22 '19 at 03:01
  • I got an error: `forecast = forecast.rename(columns={'yhat': 'yhat_'+g})` -> TypeError: can only concatenate str (not "int") to str. Any clue? – AndyC Jan 16 '21 at 03:44
  • @AndyC: it means your group column type is an int. You can concat using `'yhat' + str(g)` – Aditya Santoso Jan 19 '21 at 10:08
  • @AdityaSantoso how would you make predictions for newly recorded groups? Here you are training on each individual group. I don't understand how inference would be made when new group values are added in the future. – RazyDave Mar 27 '23 at 07:13
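
The suggestion in the comments to dump each group's forecast to disk and merge in a separate step could look roughly like the sketch below. The `forecasts` dict is hypothetical data standing in for the output of `m.predict(future)`, and the file naming scheme is an assumption made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical per-group forecasts standing in for the output of m.predict(future).
dates = pd.to_datetime(['2017-08-01', '2017-09-01'])
forecasts = {
    'A': pd.DataFrame({'ds': dates, 'yhat': [10.0, 10.5]}),
    'B': pd.DataFrame({'ds': dates, 'yhat': [6.0, 6.2]}),
}

out_dir = Path(tempfile.mkdtemp())

# Step 1: write each group's forecast to its own file as soon as it is computed,
# keeping only the columns that are needed later.
for g, forecast in forecasts.items():
    forecast[['ds', 'yhat']].to_csv(out_dir / f'forecast_{g}.csv', index=False)

# Step 2 (can live in a separate script): read the files back and stack them.
parts = []
for path in sorted(out_dir.glob('forecast_*.csv')):
    part = pd.read_csv(path, parse_dates=['ds'])
    # Recover the group key from the file name.
    part['Group'] = path.stem.replace('forecast_', '', 1)
    parts.append(part)

combined = pd.concat(parts, ignore_index=True)
print(combined)
```

Because only one group's forecast is held in memory during step 1, this may help narrow down whether the MemoryError comes from predicting or from merging.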
3
import pandas as pd
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
# The old statsmodels.tsa.arima_model module is gone from recent statsmodels releases
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from matplotlib import pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_squared_log_error



# Before modeling with ARIMA, SARIMA, etc., confirm that your
# time series is stationary using the Augmented Dickey-Fuller test
# (adfuller) or another test.

# Create a list of all groups, or get it from the data with np.unique or similar
groups_iter = ['A', 'B', 'C', 'D']

dict_org = {}
dict_pred = {}
group_accuracy = {}

# Iterate over all groups and select each group's rows from the
# DataFrame (df is the DataFrame built in the question)
for group in groups_iter:
    X = df[df['Group'] == group]['Amount'].astype(float).values
    size = int(len(X) * 0.70)
    train, test = X[:size], X[size:]
    history = list(train)
    predictions = []

    # Walk-forward validation: refit on everything seen so far, forecast one step.
    # order=(5, 1, 0) is only an example; you can grid search for better parameters
    for t in range(len(test)):
        model = ARIMA(history, order=(5, 1, 0))
        model_fit = model.fit()
        output = model_fit.forecast()
        yhat = output[0]
        predictions.append(yhat)
        obs = test[t]
        history.append(obs)
        print("Predicted:%f, expected:%f" % (yhat, obs))
    error = mean_squared_log_error(test, predictions)
    dict_org.update({group: test})
    dict_pred.update({group: predictions})

    print("Group: ", group, "Test MSLE:%f" % error)
    group_accuracy.update({group: error})
    plt.plot(test)
    plt.plot(predictions, color='red')
    plt.show()
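
The evaluation pattern in the loop above (split 70/30, forecast one step, then reveal the true value and repeat) can be seen in isolation in the sketch below. It assumes a naive last-value forecast in place of ARIMA so that it runs with only NumPy; `walk_forward` is an illustrative helper, not a library function.

```python
import numpy as np

def walk_forward(series, train_frac=0.70):
    """Walk-forward validation with a naive last-value forecast
    standing in for the ARIMA model."""
    series = np.asarray(series, dtype=float)
    size = int(len(series) * train_frac)
    history = list(series[:size])
    test = series[size:]
    predictions = []
    for obs in test:
        predictions.append(history[-1])  # forecast = most recent known value
        history.append(obs)              # then reveal the true observation
    mse = float(np.mean((test - np.array(predictions)) ** 2))
    return predictions, mse

preds, mse = walk_forward([10, 12, 11, 13, 14, 13, 15, 16, 15, 17])
print(preds, mse)
```

Swapping the naive forecast line for a model fit and one-step forecast gives the same structure as the ARIMA loop, and comparing the two errors is a quick check on whether the model beats the naive baseline.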
user3432888
  • Plus, you can also look into multivariate clustering for the different groups, because each group might have different seasonality and trend. Groups showing similar patterns will be grouped into a single cluster, and you can apply the same algorithm to them. – user3432888 Apr 15 '19 at 05:03
  • good idea with the cluster! how would you suggest utilising the extra training data if you have multiple separate groups on the same date? eg. forecasting the sales of two types of t-shirt, we'd now have 2 sales figures for each date in the time series, and both should exhibit similar patterns – repoleved Jun 12 '19 at 10:00
  • I think this question answers your question. https://stats.stackexchange.com/questions/289163/clustering-time-series-when-each-object-has-multiple-time-series – user3432888 Jun 13 '19 at 10:16

I know this is old, but I was trying to predict outcomes for different clients. I tried to use Aditya Santoso's solution above but ran into some errors, so I added a couple of modifications, and this finally worked for me:

import pandas as pd
from fbprophet import Prophet  # "from prophet import Prophet" for releases 1.0 and later

df = pd.read_csv('file.csv')
df = df.rename(columns={'date': 'ds', 'amount': 'y'})
#I first had to drop clients with fewer than 3 records to avoid errors, as Prophet needs at least 2 records per group
df = df.groupby('client_id').filter(lambda x: len(x) > 2)

df.client_id = df.client_id.astype(str)

final = pd.DataFrame(columns=['client','ds','yhat'])

grouped = df.groupby('client_id')
for g in grouped.groups:
    group = grouped.get_group(g)
    m = Prophet()
    m.fit(group)
    future = m.make_future_dataframe(periods=365)
    forecast = m.predict(future)
    #I added a column with client id
    forecast['client'] = g
    #I used concat instead of merge
    final = pd.concat([final, forecast], ignore_index=True)

final.head(10)
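
Since the future dataframe includes the historical dates by default, `final` contains fitted values for past dates as well as the forecast. A minimal sketch of keeping only the rows after each client's last observed date; the two small frames are hypothetical data with the same column names as above.

```python
import pandas as pd

# Hypothetical observed data (same columns as df above).
observed = pd.DataFrame({
    'client_id': ['1', '1', '2', '2'],
    'ds': pd.to_datetime(['2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01']),
})

# Hypothetical stacked forecasts (same columns as final above).
final = pd.DataFrame({
    'client': ['1', '1', '1', '2', '2', '2'],
    'ds': pd.to_datetime(['2017-01-01', '2017-02-01', '2017-03-01',
                          '2017-02-01', '2017-03-01', '2017-04-01']),
    'yhat': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Last observed date per client, looked up for every forecast row.
last_seen = observed.groupby('client_id')['ds'].max()
is_future = final['ds'] > final['client'].map(last_seen)
future_only = final[is_future]
print(future_only)
```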
Irene