I have a dataframe of corporate bonds of several types, with one column for yield (YTW) and one for years to maturity (YTM). When I plot yield against years to maturity, I can clearly see at least three, possibly four, yield curves. I would like to fit at least three curves to this data and then split the dataframe into smaller chunks based on each point's distance to the curves. Here's a simple scatter plot:
I tried the RANSAC approach described here: Iteratively fitting polynomial curve
Here is my attempt:
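    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import RANSACRegressor
    from sklearn.metrics import mean_squared_error

    # x: years to maturity, y: yield
    x_vals = df_clean.YTM.to_numpy()
    y_vals = df_clean.YTW.to_numpy()

    class PolynomialRegression(object):
        """Minimal polynomial estimator exposing the methods RANSACRegressor calls."""
        def __init__(self, degree=3, coeffs=None):
            self.degree = degree
            self.coeffs = coeffs

        def fit(self, X, y):
            self.coeffs = np.polyfit(X.ravel(), y, self.degree)

        def get_params(self, deep=False):
            return {'coeffs': self.coeffs}

        def set_params(self, coeffs=None, random_state=None):
            self.coeffs = coeffs

        def predict(self, X):
            poly_eqn = np.poly1d(self.coeffs)
            y_hat = poly_eqn(X.ravel())
            return y_hat

        def score(self, X, y):
            # RANSAC treats a higher score as better, so return the negative MSE
            return -mean_squared_error(y, self.predict(X))

    poly_degree = 3
    ransac = RANSACRegressor(PolynomialRegression(degree=poly_degree),
                             residual_threshold=2 * np.std(y_vals),
                             random_state=0)
    ransac.fit(np.expand_dims(x_vals, axis=1), y_vals)
    inlier_mask = ransac.inlier_mask_

    # Sort by x so the fitted curve is drawn as a single line rather than a zigzag
    order = np.argsort(x_vals)
    y_hat = ransac.predict(np.expand_dims(x_vals[order], axis=1))
    plt.plot(x_vals, y_vals, 'bx', label='input samples')
    plt.plot(x_vals[inlier_mask], y_vals[inlier_mask], 'go', label='inliers (2*STD)')
    plt.plot(x_vals[order], y_hat, 'r-', label='estimated curve')
    plt.legend()
    plt.show()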
In short:
- Is there a way to fit three or four separate curves to these points?
- How can I split the dataframe based on these curves?
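To make the second point concrete, here is a minimal sketch of the kind of split I have in mind. It assumes the curves are already available as polynomial coefficient arrays, which is the part I don't have yet. The function name `split_by_nearest_curve`, the synthetic data, and the choice of vertical distance as the "distance to the curve" are all my assumptions for illustration.

```python
import numpy as np
import pandas as pd

def split_by_nearest_curve(df, curves, x_col="YTM", y_col="YTW"):
    """Return {curve_index: sub-dataframe}, assigning each row to its nearest curve.

    `curves` is a list of polynomial coefficient arrays in np.polyval order
    (hypothetical input; in practice these would come from the fitting step).
    """
    x = df[x_col].to_numpy()
    y = df[y_col].to_numpy()
    # Vertical distance from every point to every curve: shape (n_curves, n_points)
    dist = np.abs(np.stack([np.polyval(c, x) for c in curves]) - y)
    labels = dist.argmin(axis=0)
    labelled = df.assign(curve=labels)
    return {int(k): g.drop(columns="curve") for k, g in labelled.groupby("curve")}

# Synthetic stand-in for the bond data: three well-separated quadratic "curves"
rng = np.random.default_rng(0)
true_curves = [np.array([0.0, 0.1, 2.0]),
               np.array([0.0, 0.1, 5.0]),
               np.array([0.0, 0.1, 8.0])]
x = rng.uniform(1, 30, size=150)
true_label = np.repeat([0, 1, 2], 50)
y = np.array([np.polyval(true_curves[k], xi) for k, xi in zip(true_label, x)])
y = y + rng.normal(0, 0.2, size=150)

demo = pd.DataFrame({"YTM": x, "YTW": y})
chunks = split_by_nearest_curve(demo, true_curves)
for k, chunk in chunks.items():
    print(k, len(chunk))
```

With real data the curves may cross or lie close together at short maturities, so a nearest-curve rule like this would probably need a distance cutoff for ambiguous points.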
The full dataset is here (only YTW and YTM are plotted above): Corp Bonds Data