I'm trying to do simple linear regression using this small Dataset (Screenshot).
The dataset is records divided into small time blocks of 4 years each (Except for the 2nd to the last time block of 2016-2018).
What I'm trying to do is try to predict the output of records for the timeblock of 2019-2022. To do this, I placed a 2019-2022 time block with all its rows containing the value of 0 (Since there's nothing made during that time since it's the future). I did that to accommodate the syntax of sklearn's train_test_split and went with this code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
df = pd.read_csv("TCO.csv")
df = df[['2000-2003', '2004-2007', '2008-2011','2012-2015','2016-2018','2019-2022']]
linreg = LinearRegression()
X1_train, X1_test, y1_train, y1_test = train_test_split(df[['2000-2003','2004-2007','2008-2011',
'2012-2015','2016-2018']],df['2019-2022'],test_size=0.4,random_state = 42)
linreg.fit(X1_train, y1_train)
linreg.intercept_
list( zip( ['2000-2003','2004-2007','2008-2011','2012-2015','2016-2018'],list(linreg.coef_)))
y1_pred = linreg.predict(X1_test)
print(y1_pred)
test_pred_df = pd.DataFrame({'actual': y1_test,
'predicted': np.round(y1_pred, 2),
'residuals': y1_test - y1_pred})
print(test_pred_df[0:10].to_string())
For some reason, the algorithm would always return a 0 as the final prediction for all rows with 0 residuals (This is due to the timeblock of 2019-2022 having all rows of zero.)
I think I did something wrong but I can't tell what it is. (I'm a beginner in this topic.) Can someone point out what went wrong and how to fix it?
Edit: I added a copy-able version of the data:
df = pd.DataFrame( {'Country:':['Brunei','Cambodia','Indonesia','Laos',
'Malaysia','Myanmar','Philippines','Singaore',
'Thailand','Vietnam'],
'2000-2003': [0,0,14,1,6,0,25,8,26,8],
'2004-2007': [0,3,15,6,21,0,37,11,44,36],
'2008-2011': [0,5,31,9,75,0,58,27,96,61],
'2012-2015': [5,11,129,35,238,3,99,65,170,96],
'2016-2018': [6,22,136,17,211,10,66,89,119,88]})