-1

In my dataset, there are statistics of every match played by a basketball team during a season (points scored, three-point...). what I want is to predict whether it will win (i.e. 1 or 0) with statistics such as how many points and threes in the next game it will play.

I have defined my variable y as the game result in my dataset (0 or 1), The variable I define as x_data is the directory where the team's statistics are kept.

y = data.Result.values
x_data = data.drop(["Result"],axis='columns')

I first normalize the data and split it as train/test.

# Normalization
x = (x_data - np.min(x_data))/(np.max(x_data) - np.min(x_data)).values
# train test split

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state=42)

x_train = x_train.T
x_test = x_test.T
y_train = y_train.T
y_test = y_test.T

and then I try to guess

mlr = LinearRegression()
mlr.fit(x_train.T,y_train.T)
mlr.predict(x_test.T)
print("mlr test accuracy: {}".format(mlr.score(x_test.T,y_test.T)))

The result I get is just probability. What I want to achieve is to find an answer to the question of whether the team will win or not in the next match.

What should I do in this situation?

desertnaut
  • 57,590
  • 26
  • 140
  • 166
322quick1
  • 133
  • 1
  • 9
  • The result you get from linear regression is *not* probability: https://stackoverflow.com/questions/38015181/accuracy-score-valueerror-cant-handle-mix-of-binary-and-continuous-target – desertnaut Sep 05 '22 at 13:57
  • If you want a binary decision, this is classification, not regression, so using linear regression makes no sense. – Dr. Snoopy Sep 05 '22 at 14:06

2 Answers2

0

The result you are talking about is the last print line I see with mlr test accuracy?

Because that is just the test performance of your linear model.

To have a proper prediction you need to assign the result of predict to a variable.

y_test_hat = model.predict(x_test.T)

which will be an array of proper game predictions of your input data.

Last but not least, I highly suggest to change your regression model to a more suitable classification one (logistic regression, svm, etc), as your outcome is just a binary output (either the player wins or not).

SystemSigma_
  • 1,059
  • 4
  • 18
  • thanks for answer. actually, when I do what you say, I can see the results, but I don't understand anything because I normalization the data. sorry it might be a bit silly, but I don't know how to convert to original data from normalization in python. – 322quick1 Sep 05 '22 at 13:54
  • Just normalize the input ``x`` used for as training input, it's meaningless to normalize already binary targets. As per the prediction , what you see is just a real number. The closer to 1, the closer to victory, and viceversa. However, these are NOT probabilities, it's just the outcome of a regression on binary values! – SystemSigma_ Sep 05 '22 at 13:57
0

First of all your normalization should be done separately for train and test set. At the present you are leaking your training parameters to your test set. So you need to correct that.

Second, transpose of transpose is the same matrix. e.g. X.T.T = X

Thirdly, since this looks like a classification problem (either win or lose), LogisticRegression is what you want (instead of LinearRegression)

s510
  • 2,271
  • 11
  • 18
  • thanks for answer. actually i know i will use logistic regression but what i want is to be able to see statistics as well. I want to see statistics such as how many points or how many threes the team has scored. – 322quick1 Sep 05 '22 at 13:51
  • for that you need to do exploratory analysis of the dataset. – s510 Sep 05 '22 at 13:52