4

I have time series data covering almost 5 years. Using this data, I want to forecast the next 2 years. How can I do this?

I have looked at many websites on this topic. Most of them make predictions only on the same data set that was used for training; they do not forecast into the future, for example the next 30 days. Is it possible to achieve this with TensorFlow, and if so, how?

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout

dataset_train = pd.read_csv(r'C:\Users\Kavin\source\repos\SampleTensorFlow\SampleTensorFlow\data\traindataset.csv')
training_set = dataset_train.iloc[:, 1:2].values

sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(training_set)

X_train = []
y_train = []
for i in range(60, 2035):
    X_train.append(training_set_scaled[i-60:i, 0])
    y_train.append(training_set_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))


regressor = Sequential()

regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))


regressor.add(Dense(units = 1))

regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')

regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)


dataset_test = pd.read_csv(r'C:\Users\Kavin\source\repos\SampleTensorFlow\SampleTensorFlow\data\testdataset.csv')
# Copy the slice so that adding a column later does not raise SettingWithCopyWarning
result = dataset_test[['Date','Open']].copy()
real_stock_price = dataset_test.iloc[:, 1:2].values


dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1,1)
inputs = sc.transform(inputs)
X_test = []
# One window per test row: inputs holds 60 history rows plus the test rows
for i in range(60, len(inputs)):
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

result['PredictedResult'] = pd.Series(predicted_stock_price.ravel(), index=result.index)

result.to_csv(r"C:\Users\Kavin\Downloads\PredictedStocks.csv", index=False)

ax = plt.gca()

result.plot(kind='line', x='Date', y='Open', color='red', label = 'Real Stock Price', ax=ax)
result.plot(kind='line', x='Date', y='PredictedResult', color='blue', label = 'Predicted Stock Price', ax=ax)

plt.show()
Clinton Prakash

2 Answers

1

For any machine learning problem, first ask yourself: "What do I want to predict, and what data do I have?"

In your case, you want to predict values at some time in the future; call that horizon T.

Suppose your current data is labelled, i.e. for each sample/row (x) you have a corresponding value (y). Let xt be the timestamp of sample x.

If you want to predict y at time xt + T, you must train your algorithm on data in which the label for each sample x is the value of y at time xt + T.

This way your algorithm will "learn" to predict the value of y at time xt + T from the data at time xt.

With Pandas, this can be achieved with shift.
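A minimal sketch of that labelling step, assuming a daily series in a column named "Open" (as in the question's CSV) and an illustrative horizon of T = 2 rows. The sample values and the "target" column name are made up for the example.

```python
import pandas as pd

# Hypothetical daily series; the values are made up for illustration.
df = pd.DataFrame(
    {"Open": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]},
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
)

T = 2  # forecast horizon in rows (assumed value)

# shift(-T) moves the value from T rows ahead onto the current row,
# so each row's label is y at time xt + T.
df["target"] = df["Open"].shift(-T)

# The last T rows have no future value yet, so they cannot be used for training.
train = df.dropna(subset=["target"])
print(train)
```

A model trained on `train` maps the features at time xt directly to the value T steps later, so predicting on the most recent rows yields values beyond the end of the data.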

Bruce Swain
0

Time is mostly an abstraction here; it is better to think in terms of sequences. To predict the next, still unknown step in a sequence, give the DL model the correct input_shape and pass to the predict() method the same set of features, computed on NEW data, that you consider the basis for predicting the next moment, e.g. here or here (ED, encoder-decoder).
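One common way to apply this to a one-step-ahead model such as the one in the question is to feed each prediction back in as the newest input. The sketch below is only an illustration: `recursive_forecast` and `toy_predict` are hypothetical names, and the stand-in model exists only so the loop can run without training anything.

```python
import numpy as np

def recursive_forecast(predict_fn, last_window, n_steps):
    """Roll a one-step-ahead model forward by feeding predictions back in.

    predict_fn:  callable taking an array of shape (1, window, 1) and
                 returning an array whose first element is the next value.
    last_window: 1-D sequence with the most recent scaled observations.
    """
    window = np.asarray(last_window, dtype=float).copy()
    forecasts = []
    for _ in range(n_steps):
        x = window.reshape(1, -1, 1)
        next_value = float(np.ravel(predict_fn(x))[0])
        forecasts.append(next_value)
        # Drop the oldest value and append the prediction as the newest one.
        window = np.append(window[1:], next_value)
    return np.array(forecasts)

# Stand-in model for demonstration only: predicts the last value plus 1.
def toy_predict(x):
    return np.array([[x[0, -1, 0] + 1.0]])

future = recursive_forecast(toy_predict, last_window=[1.0, 2.0, 3.0], n_steps=4)
print(future)
```

With the question's variables, the call would presumably look like `recursive_forecast(regressor.predict, inputs[-60:, 0], 30)`, followed by `sc.inverse_transform(future.reshape(-1, 1))` to return to price units. Errors compound with each step, so a 2-year horizon from this kind of loop should be treated with caution.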

I still think, though, that an encoder-decoder seq2seq model produces decoded output ONLY for patterns that were present in the past (before encoding), and only if the decoder can correctly reconstruct the features from the encoded data (it is not always possible to reconstruct something similar to what was encoded).

So I still consider the example in TF to be the best fit for your goal, though I am not sure how adequate the prediction will be (whether it will come true), since even DL gives only a likelihood, just like ML based on Bayesian statistics.

If your dependency is continuous in time and you have found, or already know, the function that describes it, then of course you can predict any number of steps ahead, for any horizon you like, e.g. when you have discovered a trend or a cyclicity (e.g. daily; here time can be treated as a feature).

Another approach is differencing: a technique that removes the trend and seasonality of a time series in order to make it stationary.
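A minimal sketch of first-order differencing with pandas, using made-up values. It also shows how to undo the transform, which is needed to turn predictions made on the differenced series back into the original units.

```python
import pandas as pd

# Made-up series with an upward trend.
series = pd.Series([100.0, 102.0, 105.0, 109.0, 114.0])

# First-order differencing: each value becomes the change from the previous one.
diff = series.diff().dropna()

# Invert the transform: start from the first observation and accumulate changes.
restored = series.iloc[0] + diff.cumsum()

print(diff.tolist())
print(restored.tolist())
```

For seasonality, the same call with a lag equal to the season length (for example `series.diff(periods=7)` for a weekly pattern in daily data) subtracts the value from one season earlier.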

That's all; there is nothing more to the mystery of dependency and backpropagation.

JeeyCi