
I'm new to both Python and Facebook Prophet, so this may be a no-brainer, but I haven't been able to find an answer online.

I have a 7-column CSV file. One column ('ds') contains datestamps in daily increments, and the other six columns ('y1' through 'y6') contain six variables whose values align with those datestamps.

Instead of creating six different two-column csv files and running Prophet six different times (predicting only one variable at a time), I'd like to find a way to predict all six variables at once. Here's what I'm trying:

import pandas as pd
from prophet import Prophet  # releases before 1.0: from fbprophet import Prophet

df = pd.read_csv('example_file.csv')
cols = ['y1','y2','y3','y4','y5','y6']
results = []
for col in cols:
    subdf = df[['ds', col]].dropna()  # columns are now 'ds' and the variable name, e.g. 'y1'
    m = Prophet()
    m.fit(subdf)
    result = m.predict(m.make_future_dataframe(periods=90))
    results.append(result)
forecast = pd.concat(results, axis=1)
forecast.to_csv('example_file.csv')

When I run it, I'm getting the following error:

ValueError: Dataframe must have columns 'ds' and 'y' with the dates and values respectively.
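A minimal pandas-only sketch of why this check fails, using a made-up three-day frame as a stand-in for example_file.csv (Prophet itself is not needed to see it): the two-column slice keeps the original column name, so there is no 'y' column until it is renamed.

```python
import pandas as pd

# Hypothetical stand-in for example_file.csv: a 'ds' column plus two of the y-columns.
df = pd.DataFrame({
    "ds": pd.date_range("2018-01-01", periods=3, freq="D"),
    "y1": [1.0, 2.0, 3.0],
    "y2": [10.0, 20.0, 30.0],
})

# The slice passed to m.fit() in the loop above.
subdf = df[["ds", "y1"]].dropna()
print(subdf.columns.tolist())    # ['ds', 'y1']: no 'y' column, so Prophet rejects it

# Renaming the value column produces the layout Prophet expects.
renamed = subdf.rename(columns={"y1": "y"})
print(renamed.columns.tolist())  # ['ds', 'y']
```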

Any insight/help would be much appreciated. Thanks!

Carl
  • The error message is quite clear: the second column must be named 'y'. Are you trying to fix your code, or get a whole new approach? – Mad Physicist Aug 02 '18 at 16:20
  • I was basically looking for the answer that warwick12 gave me. I know that Prophet wants the second column to be named 'y', but I didn't know how to achieve that with multiple columns. – Carl Aug 02 '18 at 17:19
  • If you like the answer, go ahead and accept it. That will remove your question from the unanswered queue and get you and warwick some points. – Mad Physicist Aug 02 '18 at 18:05

1 Answer


Sorry, I wanted to comment, but I don't have enough reputation yet. Rename your value column to 'y' inside the loop:

subdf = subdf.rename(columns={'ds':'ds', col:'y'})

Prophet imposes the strict condition that the input columns be named ds (the time column) and y (the metric column).
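Putting the rename into the question's loop gives roughly the following sketch. It assumes the `prophet` package (named `fbprophet` before version 1.0); `forecast_each_column` and its `make_model` parameter are hypothetical names introduced here, the latter only so the loop can be exercised with a stand-in model. It returns a dict keyed by column name instead of a list, so each forecast stays labelled with the variable it belongs to.

```python
import pandas as pd


def forecast_each_column(df, cols, periods=90, make_model=None):
    """Fit one model per value column; return {column name: forecast DataFrame}.

    make_model is a hypothetical zero-argument factory for an unfitted model.
    It defaults to Prophet and exists only so the loop can be tried without it.
    """
    if make_model is None:
        from prophet import Prophet  # releases before 1.0: from fbprophet import Prophet
        make_model = Prophet

    forecasts = {}
    for col in cols:
        # Keep the date column plus one variable, drop rows where that variable
        # is missing, and rename it to 'y' as Prophet requires.
        subdf = df[["ds", col]].dropna().rename(columns={col: "y"})
        m = make_model()
        m.fit(subdf)
        future = m.make_future_dataframe(periods=periods)
        forecasts[col] = m.predict(future)
    return forecasts
```

With the question's file, `forecast_each_column(pd.read_csv('example_file.csv'), ['y1', 'y2', 'y3', 'y4', 'y5', 'y6'])` would fit six separate models, each on its own `ds`/`y` frame.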

Mad Physicist
warwick12
  • Thanks so much! That did the trick! (I also needed to use underscores instead of periods for "make_future_dataframe".) Now I'm wondering how to export ONLY the yhat values for each variable (and not 'trend', 'yhat_lower', 'yhat_upper', etc.). – Carl Aug 02 '18 at 17:11
  • Welcome to SO. Your answer is perfectly fine as an answer. +1 and hopefully OP selects it. – Mad Physicist Aug 02 '18 at 18:06
  • @Chuck, thank you for accepting the answer. To keep only the 'yhat' column, select it together with 'ds': result_df = result[['ds', 'yhat']], then append result_df (instead of result) to results. This should do the trick. – warwick12 Aug 04 '18 at 01:15
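Concatenating the trimmed frames with `axis=1` still repeats 'ds' and 'yhat' once per variable, so the columns cannot be told apart. A minimal sketch, assuming each forecast has Prophet's 'ds' and 'yhat' columns and is stored in a dict keyed by variable name (`combine_yhat` is a hypothetical helper name), aligns them on the date and names each yhat column after its variable:

```python
import pandas as pd


def combine_yhat(forecasts):
    """Turn {variable name: forecast DataFrame} into one frame: ds, then one column per variable.

    Assumes each forecast has 'ds' and 'yhat' columns, as Prophet's predict() output does.
    """
    pieces = [
        # Index by date and name the yhat series after its variable (e.g. 'y1').
        f.set_index("ds")["yhat"].rename(col)
        for col, f in forecasts.items()
    ]
    # Align on the date index rather than by row position, so frames of different
    # lengths (e.g. after dropna) still line up; dates missing from one become NaN.
    wide = pd.concat(pieces, axis=1, sort=True)
    return wide.rename_axis("ds").reset_index()
```

The result can then be written out with `to_csv(..., index=False)`, giving a single 'ds' column followed by one forecast column per variable.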