Hi all, I have an issue with my exam project. I am trying to build a very simple stock predictor using a web API called Iextrading, which returns Tesla's stock data for the last 5 years in JSON format, nothing fancy. I then want to predict the stock price for tomorrow (the next day).

However, I must admit that I feel very lost when it comes to machine learning. I think I have managed to create the AI model, but it always reports 100% accuracy, which I know shouldn't be possible. To be honest, I don't even know where to look for the problem; my guess is that it is related to the train/test data. Once that is fixed, I also need to find out how to give the model only tomorrow's date as input for the prediction.
Here is my code. Thanks a lot in advance:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
import sklearn.metrics as sm
import pandas as pd
data = pd.read_json('https://api.iextrading.com/1.0/stock/tsla/chart/5y')
data.head()
data = data.iloc[:, :]
from sklearn import preprocessing
# Encode the dates as integer labels so they can be used as a numeric feature
enc = preprocessing.LabelEncoder()
enc.fit(data['date'])
data['date'] = enc.transform(data['date'])
# 'label' is a human-readable date string, e.g. "Dec 13, 13"; encode it as integers too
enc2 = preprocessing.LabelEncoder()
enc2.fit(data['label'])
data['label'] = enc2.transform(data['label'])
# Features: every column except the target; target: the closing price
X = data.drop('close', axis=1)
y = data['close']
# Split in train and test
num_training = int(0.8 * len(X))
num_test = len(X) - num_training
# Training data
X_train, y_train = X[:num_training], y[:num_training]
# Test data
X_test, y_test = X[num_training:], y[num_training:]
# Create linear regressor object
regressor = linear_model.LinearRegression()
# Train the model using the training sets
regressor.fit(X_train, y_train)
# Predict the output
y_test_pred = regressor.predict(X_test)
# Compute performance metrics
print("Linear regressor performance:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))
# Perform prediction on the test data again, reusing the fitted model
y_test_pred_new = regressor.predict(X_test)
print("\nNew mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred_new), 2))
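One possible explanation for the perfect score, offered as a sketch and not a confirmed diagnosis: X still contains open, high, low and vwap from the same day as the close being predicted, so a linear model can reconstruct the close almost exactly. The snippet below illustrates that effect on a synthetic random-walk price series. The values are made up (not the API data), and the helper chronological_r2 is a hypothetical name introduced only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in for the API data (made-up values, not real TSLA prices).
rng = np.random.default_rng(0)
n = 500
close = 150 + np.cumsum(rng.normal(0, 2, n))
frame = pd.DataFrame({
    "open": close + rng.normal(0, 1, n),
    "high": close + np.abs(rng.normal(0, 1, n)),
    "low": close - np.abs(rng.normal(0, 1, n)),
    "close": close,
})

def chronological_r2(features, target, train_fraction=0.8):
    """Fit on the earliest rows and return the R2 score on the remaining rows."""
    split = int(train_fraction * len(features))
    model = LinearRegression().fit(features.iloc[:split], target.iloc[:split])
    return r2_score(target.iloc[split:], model.predict(features.iloc[split:]))

# Features taken from the same day as the target: they leak the answer.
same_day = frame[["open", "high", "low"]]
leaky_r2 = chronological_r2(same_day, frame["close"])

# Features taken from the previous day only: nothing from the target day is visible.
previous_day = same_day.shift(1).iloc[1:]
lagged_r2 = chronological_r2(previous_day, frame["close"].iloc[1:])

print("R2 with same-day features:", round(leaky_r2, 3))
print("R2 with previous-day features:", round(lagged_r2, 3))
```

On this synthetic series the same-day score should come out very close to 1 and the previous-day score lower. The previous-day score can still look high, because price levels on consecutive days are strongly correlated, so a high R2 alone does not mean the model forecasts well.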
Here is an example of the data:
Data columns (total 12 columns):
change              1258 non-null float64
changeOverTime      1258 non-null float64
changePercent       1258 non-null float64
close               1258 non-null float64
date                1258 non-null datetime64[ns]
high                1258 non-null float64
label               1258 non-null object
low                 1258 non-null float64
open                1258 non-null float64
unadjustedVolume    1258 non-null int64
volume              1258 non-null int64
vwap                1258 non-null float64
dtypes: datetime64[ns](1), float64(8), int64(2), object(1)
# Example values from data entry 0:
change: 0.184
changeOverTime: 0.000000
changePercent: 0.125
close: 147.654
date: 2013-12-13
high: 151.80
label: Dec 13, 13
low: 147.3200
open: 148.05
unadjustedVolume: 10599775
volume: 10599775
vwap: 149.5224
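As for predicting tomorrow: a date by itself carries very little information about the price, so one common approach is to train on today's row and use the next day's close as the target. Below is a minimal sketch of that idea, assuming a frame with the same kind of columns as above. The data is synthetic (made-up values), and the names next_close, feature_columns, history and latest are introduced only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the API response (made-up values, not real TSLA prices).
rng = np.random.default_rng(1)
n = 300
close = 150 + np.cumsum(rng.normal(0, 2, n))
prices = pd.DataFrame({
    "date": pd.bdate_range("2013-12-13", periods=n),
    "open": close + rng.normal(0, 1, n),
    "close": close,
    "volume": rng.integers(5_000_000, 15_000_000, n),
})

feature_columns = ["open", "close", "volume"]

# Target: the close of the following trading day.
prices["next_close"] = prices["close"].shift(-1)

# The last row has no known next-day close yet; it is the row to predict from.
latest = prices.iloc[[-1]][feature_columns]
history = prices.iloc[:-1]

# Chronological split: train on the oldest 80%, evaluate on the newest 20%.
split = int(0.8 * len(history))
train, test = history.iloc[:split], history.iloc[split:]

model = LinearRegression().fit(train[feature_columns], train["next_close"])
test_mae = mean_absolute_error(test["next_close"], model.predict(test[feature_columns]))
print("Mean absolute error on held-out days:", round(test_mae, 2))

# Forecast for the day after the last row in the data.
tomorrow_close = model.predict(latest)[0]
print("Predicted close for the next trading day:", round(tomorrow_close, 2))
```

With this setup the input for the forecast is the most recent day's row, not tomorrow's date, and every feature is known before the value being predicted.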