I have a problem solving for x and y using multiple equations. I have different data points (in # of frames), as follows:
Group 1:
1003, 145, 1344, 66, 171, 962
Group 2:
602, 140, 390, 1955, 289, 90
I have total hours as follows:
Total Hours:
1999, 341, 1151, 2605, 568, 864
I have set these up as a system of equations, one per data point, like this:
1003x + 602y = 1999
145x + 140y = 341
and so on.
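Written in matrix form, the six equations are an overdetermined system (six equations, two unknowns), which in general has no exact solution. A common approach is to solve it in the least-squares sense. A minimal sketch with NumPy's `np.linalg.lstsq`, using the data listed above (the names `A` and `b` are introduced here for illustration):

```python
import numpy as np

# Frame counts from the question: one row per data point,
# columns are [Group 1 Frames, Group 2 Frames].
A = np.array([
    [1003, 602],
    [145, 140],
    [1344, 390],
    [66, 1955],
    [171, 289],
    [962, 90],
], dtype=float)

# Total hours for each data point (right-hand side of each equation).
b = np.array([1999, 341, 1151, 2605, 568, 864], dtype=float)

# Six equations, two unknowns: find [x, y] minimising the sum of
# squared differences between A @ [x, y] and b.
solution, residual_ss, rank, _ = np.linalg.lstsq(A, b, rcond=None)
x, y = solution
print(f"x = {x:.4f}, y = {y:.4f}")
```

Note that this model has no constant term, exactly like the equations above.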
I would like to find the values of x and y that make all of the equations as close to true as possible.
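"As close to true as possible" is usually formalised as least squares: choose x and y to minimise the sum of squared differences between the two sides of every equation. Writing $g_{1,i}$ and $g_{2,i}$ for the Group 1 and Group 2 frame counts and $h_i$ for the total hours of data point $i$ (notation introduced here for illustration):

$$
\min_{x,\,y} \; \sum_{i=1}^{6} \bigl( g_{1,i}\,x + g_{2,i}\,y - h_i \bigr)^2
$$

Setting both partial derivatives to zero gives two linear equations in two unknowns (the normal equations), which have a unique solution as long as the two frame-count columns are not proportional:

$$
\begin{aligned}
x \sum_i g_{1,i}^2 + y \sum_i g_{1,i}\,g_{2,i} &= \sum_i g_{1,i}\,h_i \\
x \sum_i g_{1,i}\,g_{2,i} + y \sum_i g_{2,i}^2 &= \sum_i g_{2,i}\,h_i
\end{aligned}
$$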
I tried a linear regression in Python to estimate x and y, but I am unsure whether I am going down the right road.
Here is my code in Python:
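```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

dataset = pd.read_csv(r"C:\Users\path\to\.csv")
X = dataset[['Group 1 Frames', 'Group 2 Frames']]  # features: frame counts
y = dataset['Total Hours']                          # target: total hours
# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
print(coeff_df)
```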
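One mismatch between this code and the equations above: `LinearRegression` fits an intercept by default (`fit_intercept=True`), so it models Total Hours = c + x * (Group 1 Frames) + y * (Group 2 Frames), while the equations have no constant term. Also, if the CSV holds only the six rows listed, `train_test_split(..., test_size=.25)` leaves just four rows for fitting. A minimal sketch that matches the equations, assuming the same column names and loading the six rows inline instead of from the CSV:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# The six data points from the question, using the CSV's column names.
dataset = pd.DataFrame({
    "Group 1 Frames": [1003, 145, 1344, 66, 171, 962],
    "Group 2 Frames": [602, 140, 390, 1955, 289, 90],
    "Total Hours": [1999, 341, 1151, 2605, 568, 864],
})
X = dataset[["Group 1 Frames", "Group 2 Frames"]]
y = dataset["Total Hours"]

# fit_intercept=False matches equations of the form g1*x + g2*y = hours
# (no constant term). All rows are used because there are only six.
regressor = LinearRegression(fit_intercept=False)
regressor.fit(X, y)

coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=["Coefficient"])
print(coeff_df)
```

Without the intercept, the fitted coefficients are the ordinary least-squares solution of the six equations.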
This gives me two coefficients, 1.3007 and 1.2314. However, after calculating the mean absolute error and the mean squared error, the results suggest that these values are inaccurate and unusable.
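Aggregate MAE and MSE are easier to interpret next to the error of each individual equation, and with `test_size=.25` on six rows they are computed from only two held-out points. A sketch that reports how far each of the six equations is from true for the least-squares x and y (fitted without an intercept, on all rows), plus MAE and MSE over all six, using scikit-learn's `mean_absolute_error` and `mean_squared_error`. These are in-sample errors, so they measure how well the equations can be satisfied rather than predictive accuracy:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Data from the question: [Group 1 Frames, Group 2 Frames] and Total Hours.
A = np.array([[1003, 602], [145, 140], [1344, 390],
              [66, 1955], [171, 289], [962, 90]], dtype=float)
hours = np.array([1999, 341, 1151, 2605, 568, 864], dtype=float)

# Least-squares x and y (no intercept), fitted on all six equations.
(x, y), *_ = np.linalg.lstsq(A, hours, rcond=None)
predicted = A @ np.array([x, y])

# How far each equation is from being true (predicted minus actual hours).
for row, (pred, actual) in enumerate(zip(predicted, hours), start=1):
    print(f"equation {row}: predicted {pred:8.1f}, actual {actual:6.0f}, "
          f"residual {pred - actual:+8.1f}")

mae = mean_absolute_error(hours, predicted)
mse = mean_squared_error(hours, predicted)
print(f"MAE = {mae:.1f} hours, MSE = {mse:.1f}")
```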
Is there a way to more accurately calculate the desired x and y values?
My thoughts on the cause of the error:
- My method (I am very new to Python and to data analysis like this, so I'd bet heavily on this one)
- Lack of data points (I can collect more)
- x and y don't have a strong relationship with Total Hours, hence the high error
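With only six points, a single 75/25 split leaves four rows to fit and two to score, so the error estimate itself is very noisy. Leave-one-out cross-validation is one way to use every point: it fits six times, each time holding out one row and scoring the model on it. A sketch with scikit-learn's `LeaveOneOut` and `cross_val_score`, assuming a model without an intercept to match the equations:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Data from the question: [Group 1 Frames, Group 2 Frames] and Total Hours.
X = np.array([[1003, 602], [145, 140], [1344, 390],
              [66, 1955], [171, 289], [962, 90]], dtype=float)
y = np.array([1999, 341, 1151, 2605, 568, 864], dtype=float)

# No intercept, to match equations of the form g1*x + g2*y = hours.
model = LinearRegression(fit_intercept=False)

# Each fold trains on five rows and scores the one held-out row.
# scikit-learn reports errors as negative scores, hence the sign flip.
scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                         scoring="neg_mean_absolute_error")
abs_errors = -scores
print("held-out absolute error per point:", np.round(abs_errors, 1))
print(f"mean held-out absolute error: {abs_errors.mean():.1f} hours")
```

If more data points are collected, the same code works unchanged with larger arrays, and the held-out errors show whether the extra data helps.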