
Looking at the vehicle speed in relation to the engine speed, the different slopes should give the different gears.
My initial reaction would be to say that this is a linear regression problem. You don't have enough data for anything else. Looking at the data, though, we can see that it is actually two linear regression problems:
[![Engine speed vs. vehicle speed][2]][2]
The slope changes at about 700 revs, so you should define a cutoff there and select one of two regression lines, depending on whether the engine speed is above or below the cutoff.
To fit the regression in Python, you can use any number of packages. In scikit-learn you would use LinearRegression, documented here:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
The example given there, using the Python console, is
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> # y = 1 * x_0 + 2 * x_1 + 3
>>> y = np.dot(X, np.array([1, 2])) + 3
>>> reg = LinearRegression().fit(X, y)
>>> reg.score(X, y)
1.0
>>> reg.coef_
array([1., 2.])
>>> reg.intercept_
3.0000...
>>> reg.predict(np.array([[3, 5]]))
array([16.])
Obviously you need to put your own data in X and y, and in fact you will want two pairs of arrays, one for each section of your graph. You would also have two reg = LinearRegression().fit(X, y) expressions, each assigned to its own variable, and an if statement deciding which reg to use, depending on the input. The point where the slope changes is at the intersection of your two regression lines.
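As a rough sketch of the two-regression idea, the snippet below fits one line on each side of a cutoff and picks between them with an if expression. The data is synthetic and made up for illustration, and the names CUTOFF, reg_low, reg_high and predict_vehicle_speed are hypothetical; the 700 revs cutoff is simply read off the plot, so adjust it to your own data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (NOT the original measurements): two straight
# segments that meet at an assumed cutoff of 700 revs.
CUTOFF = 700.0
engine_speed = np.array([300, 400, 500, 600, 800, 900, 1000, 1100], dtype=float)
vehicle_speed = np.where(
    engine_speed < CUTOFF,
    0.01 * engine_speed,                   # segment below the cutoff
    7.0 + 0.03 * (engine_speed - CUTOFF),  # segment above the cutoff
)

# scikit-learn expects a 2-D feature array, hence the reshape.
X = engine_speed.reshape(-1, 1)
below = engine_speed < CUTOFF

# One regression per section of the graph.
reg_low = LinearRegression().fit(X[below], vehicle_speed[below])
reg_high = LinearRegression().fit(X[~below], vehicle_speed[~below])


def predict_vehicle_speed(revs):
    """Use the regression line that matches the side of the cutoff."""
    reg = reg_low if revs < CUTOFF else reg_high
    return float(reg.predict(np.array([[revs]]))[0])


print(predict_vehicle_speed(500))
print(predict_vehicle_speed(1000))
```

With real measurements the two fitted lines will generally not meet exactly at the cutoff you chose by eye, which is why it is worth computing their intersection and using that as the cutoff instead.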
The two regression lines have the form y = m1 x + c1 and y = m2 x + c2, where m1, m2 are the gradients of the lines and c1, c2 the intercepts. At the point of intersection m1 x + c1 = m2 x + c2, so x = (c2 - c1) / (m1 - m2), provided the gradients differ. If you don't want to do the maths, then you can use Shapely:
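from shapely.geometry import LineString
# A, B and C, D are two (x, y) points on each regression line.
# Placeholder values for illustration; use points computed from your own fits.
A, B = (0, 0), (1000, 10)
C, D = (0, -14), (1000, 16)
line1 = LineString([A, B])
line2 = LineString([C, D])
# intersection() only considers the segments between the end points,
# so the segments must extend far enough to actually cross.
int_pt = line1.intersection(line2)
point_of_intersection = int_pt.x, int_pt.y
print(point_of_intersection)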
(taken from this answer on Stack Overflow: How do I compute the intersection point of two lines?)
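If you would rather do the maths directly, setting m1 x + c1 = m2 x + c2 and solving for x gives x = (c2 - c1) / (m1 - m2). A minimal sketch follows; the function name intersection and the coefficients are made up for illustration. With scikit-learn you would take each gradient from reg.coef_[0] and each intercept from reg.intercept_ of the corresponding fit.

```python
def intersection(m1, c1, m2, c2):
    """Return (x, y) where y = m1*x + c1 meets y = m2*x + c2."""
    if m1 == m2:
        raise ValueError("parallel lines have no single intersection point")
    x = (c2 - c1) / (m1 - m2)
    return x, m1 * x + c1


# Made-up coefficients, chosen so that the lines cross at x = 700
x, y = intersection(0.5, 0.0, 0.25, 175.0)
print(x, y)  # prints: 700.0 350.0
```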
After discussion with Sanjiv, here is the updated code (adapted from here: https://machinelearningmastery.com/clustering-algorithms-with-python/)
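import os
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('TkAgg')  # select the backend before pyplot is imported
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

df = pd.read_excel("GearPredictionSanjiv.xlsx", sheet_name='FullData')
x = df['Engine_speed'].round()
y = df['Vehicle_speed']
# Engine speed / vehicle speed should be roughly constant within a gear
if 'Ratio' not in df.columns or not os.path.exists('dataset.xlsx'):
    df['Ratio'] = (x / y).round()
# Rows with zero vehicle speed give an infinite or undefined ratio; drop them
df = df.replace([np.inf, -np.inf], np.nan).dropna(subset=['Ratio'])
# X was not defined in the original snippet; assumed here to be the Ratio column
X = df[['Ratio']]
# One cluster per gear (5 is assumed here)
model = KMeans(n_clusters=5)
# Fit the model
model.fit(X)
# Assign a cluster to each example
yhat = model.predict(X)
# Plot
plt.scatter(yhat, X['Ratio'], c=yhat, cmap=plt.cm.coolwarm)
# Show the plot
plt.show()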