I'm new to ML and have a problem with CatBoost. I want to predict the value of a function (for example cos, sin, etc.). I have tried everything I could think of, but my prediction is always a straight line.

Is this possible, and if so, how can I fix my problem?

I'd be glad of any comments. :)

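import numpy as np
import matplotlib.pyplot as plt
from catboost import CatBoostRegressor

# Explicit 2-D feature matrix: one row per sample, one column per feature
train_data = np.arange(1, 100, 0.5).reshape(-1, 1)
test_data = np.arange(100, 120, 0.5).reshape(-1, 1)  # entirely outside the training range

# Target: cos(x) evaluated on the training grid
train_labels = np.cos(train_data).ravel()

model = CatBoostRegressor(iterations=100, learning_rate=0.01, depth=12, verbose=False)
model.fit(train_data, train_labels)
preds = model.predict(test_data)

plt.plot(preds)
plt.show()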

This picture shows what I want:

[image: desired prediction]

desertnaut
  • ML models are normally not good for *extrapolation*; see answer at [Is deep learning bad at fitting simple non linear functions outside training scope?](https://stackoverflow.com/questions/53795142/is-deep-learning-bad-at-fitting-simple-non-linear-functions-outside-training-sco/53796253#53796253) – desertnaut Jan 06 '19 at 17:59

2 Answers

The thing to understand is that machine learning is not magic.

First, ML cannot miraculously predict everything.

Second, you need to pick the right ML algorithm, because no single algorithm works best on every problem. See: https://en.wikipedia.org/wiki/No_free_lunch_theorem

Third, the input features are critical. The input feature you are using here will look like noise to the model because it doesn't capture the periodicity of the data, and CatBoost isn't geared toward learning periodicity.
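To make the feature problem concrete, here is a minimal sketch of why a tree model fed only the raw x value returns one constant beyond the training range. It uses a small hand-rolled 1-D regression tree as a stand-in, not CatBoost itself; `fit_tree`, `predict`, and the depth of 6 are hypothetical choices made for illustration. The grids mirror the ones in the question.

```python
import math

def fit_tree(xs, ys, depth):
    """Fit a 1-D regression tree on sorted xs; a leaf is a float, a node is a tuple."""
    mean = sum(ys) / len(ys)
    if depth == 0 or len(xs) < 2:
        return mean
    best = None
    for i in range(1, len(xs)):  # try every split between neighbouring points
        left, right = ys[:i], ys[i:]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, i)
    i = best[1]
    threshold = (xs[i - 1] + xs[i]) / 2  # always inside the training range
    return (threshold,
            fit_tree(xs[:i], ys[:i], depth - 1),
            fit_tree(xs[i:], ys[i:], depth - 1))

def predict(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree

train_x = [1 + 0.5 * i for i in range(198)]  # same grid as np.arange(1, 100, 0.5)
train_y = [math.cos(x) for x in train_x]
test_x = [100 + 0.5 * i for i in range(40)]  # same grid as np.arange(100, 120, 0.5)

tree = fit_tree(train_x, train_y, depth=6)
train_preds = [predict(tree, x) for x in train_x]
test_preds = [predict(tree, x) for x in test_x]

print(len(set(train_preds)), "distinct predictions inside the training range")
print(len(set(test_preds)), "distinct prediction(s) outside it")
```

Every split threshold lies between two training points, so any x above the largest training value falls into the same rightmost leaf and gets that leaf's value. Plotted, that is a flat line.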

For your problem, you need an ML algorithm that is better suited to predicting periodic signals.

Some of the more sophisticated approaches use recurrent neural networks. I suspect that's too advanced for you at this point.

I would set this problem aside and pick one that is more tractable and better suited to ML.

Something like home price prediction would be good.

Clem Wang

I compiled your code and found that the prediction vector contains the same value for all entries [-0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229 -0.09229]

I think your model is in high bias(underfit) condition. Try to increase number of features or use polynomial features.
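As one hedged illustration of what extra features can do, the sketch below assumes the period of the target (2π) is known in advance, which the question does not state. Instead of raw x it uses sin(x) and cos(x) as the two input columns and fits plain least squares, not CatBoost. The helper `fit_sin_cos` is a hypothetical name introduced here.

```python
import math

def fit_sin_cos(xs, ys):
    """Least-squares fit of y ~ a*sin(x) + b*cos(x) via the 2x2 normal equations."""
    s = [math.sin(x) for x in xs]
    c = [math.cos(x) for x in xs]
    ss = sum(v * v for v in s)
    cc = sum(v * v for v in c)
    sc = sum(u * v for u, v in zip(s, c))
    sy = sum(u * y for u, y in zip(s, ys))
    cy = sum(v * y for v, y in zip(c, ys))
    det = ss * cc - sc * sc
    a = (sy * cc - sc * cy) / det
    b = (ss * cy - sc * sy) / det
    return a, b

train_x = [1 + 0.5 * i for i in range(198)]  # same grid as np.arange(1, 100, 0.5)
train_y = [math.cos(x) for x in train_x]
test_x = [100 + 0.5 * i for i in range(40)]  # same grid as np.arange(100, 120, 0.5)

a, b = fit_sin_cos(train_x, train_y)
test_preds = [a * math.sin(x) + b * math.cos(x) for x in test_x]
max_error = max(abs(p - math.cos(x)) for p, x in zip(test_preds, test_x))

print(f"a={a:.3f}, b={b:.3f}, max error on the test range={max_error:.2e}")
```

Because these features repeat with the signal, the test inputs are no longer outside what the model saw during training. The catch is that the period has to be known or estimated first; with raw x or its powers as the only columns, that information is not available to the model.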

Abhishek
  • Thanks! I increased the number to 10000, but it didn't work. I tried polynomial features too, and the result is the same (identical values). Can you help me with the features? As I understand it, I need several columns with x raised to different powers, but the sine function is an infinite sum of terms. – Яков Гущин Jan 06 '19 at 22:11
  • I found this article regarding your question. Check it out: https://towardsdatascience.com/can-machine-learn-the-concept-of-sine-4047dced3f11 – Abhishek Jan 07 '19 at 08:47