I want to predict the behavior of my data in the future. The value of my data x and y is about 1000 values. I want to predict the value y[1001]. This is my example.
from numpy.random import randn
from numpy.random import seed
from numpy import sqrt
import numpy
from numpy import sum as arraysum
from scipy.stats import linregress
from matplotlib import pyplot
seed(1)
x = 20 * randn(1000) + 100
print(numpy.size(x))
y = x + (10 * randn(1000) + 50)
print(numpy.size(y))
# fit linear regression model
b1, b0, r_value, p_value, std_err = linregress(x, y)
# make predictions
yhat = b0 + b1 * x
# define new input, expected value and prediction
x_in = x[1001]
y_out = y[1001]
yhat_out = yhat[1001]
# estimate stdev of yhat
sum_errs = arraysum((y - yhat)**2)
stdev = sqrt(1/(len(y)-2) * sum_errs)
# calculate prediction interval
interval = 1.96 * stdev
print('Prediction Interval: %.3f' % interval)
lower, upper = y_out - interval, y_out + interval
print('95%% likelihood that the true value is between %.3f and %.3f' % (lower, upper))
print('True value: %.3f' % yhat_out)
# plot dataset and prediction with interval
pyplot.scatter(x, y)
pyplot.plot(x, yhat, color='red')
pyplot.errorbar(x_in, yhat_out, yerr=interval, color='black', fmt='o')
pyplot.show()
When I try that, it gives me this error.
x_in = x[1001]
IndexError: index 1001 is out of bounds for axis 0 with size 1000
My goal is to predict the behavior of my data in the future and evalute it by plotting its error bars too. I see this example how do you create a linear regression forecast on time series data in python but I don't understand how to apply it to my data. I found that it is possible to use ARIMA model. Please How could I do that?