How to get the second derivative/dip from the graph or generate the best eps value

Question

Here are the first rows of the dataset:

 ,id,revenue ,profit
0,101,779183,281257
1,101,144829,838451
2,101,766465,757565
3,101,353297,261071
4,101,1615461,275760
5,101,246731,949229
6,101,951518,301016
7,101,444669,430583

Here is my code:

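import pandas as pd
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import StandardScaler
import seaborn as sns
from sklearn.neighbors import NearestNeighbors
# index_col=0: the first (unnamed) CSV column is the row index, not a feature
df = pd.read_csv('1.csv', index_col=0)
# standardise the features so that eps is measured in comparable units
df1 = StandardScaler().fit_transform(df)
dbsc = DBSCAN(eps=2.5, min_samples=20).fit(df1)
labels = dbsc.labels_  # -1 marks noise points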
labels = dbsc.labels_

My df has 1999 rows.

I chose eps=2.5 by looking at the graph: that is clearly where the dip is.

This is the method I use to plot the graph and find the best eps value:

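ns = 5
nbrs = NearestNeighbors(n_neighbors=ns).fit(df1)
distances, indices = nbrs.kneighbors(df1)
# distance of every point to its ns-th nearest neighbour (the point itself
# counts as the 1st one), sorted from largest to smallest
distanceDec = sorted(distances[:, ns-1], reverse=True)
plt.plot(list(range(1, len(distanceDec) + 1)), distanceDec)
plt.ylabel(f'distance to {ns}th nearest neighbour')
plt.show()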
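For reference, the values in `distances[:, ns-1]` can be reproduced with a brute-force NumPy sketch. The helper name `k_distance` is hypothetical, and it builds the full n×n distance matrix, so it is only meant for small data. Because the query points are the same points the model was fitted on, each point comes back as its own nearest neighbour at distance 0, so column `ns-1` is really the distance to the (ns-1)-th *other* point.

```python
import numpy as np

def k_distance(X, ns):
    """Hypothetical helper: Euclidean distance from each row of X to its
    ns-th nearest row, counting the row itself (distance 0) as the 1st."""
    X = np.asarray(X, dtype=float)
    # pairwise differences via broadcasting: shape (n, n, n_features)
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    d.sort(axis=1)          # each row now starts with the self-distance 0
    return d[:, ns - 1]     # same column as distances[:, ns-1]
```

Sorting the result with `np.sort(k_distance(df1, ns))[::-1]` gives the same descending curve as `distanceDec`.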

How can I find the dip in this graph automatically, so that the program outputs the best eps without me having to look at the graph?

There is a paper that introduces an algorithm that automatically assigns values to eps and min_samples: https://www.ijisae.org/IJISAE/article/view/649/pdf (Hugolmn, Jun 22 '20 at 19:01)

Gian Arauz · Accepted Answer · 2020-06-28T08:55:20.813

If I understand correctly, you are looking for the precise y value of the inflection point appearing in your ε(x) plot (it should be around 2.0), right?

If so, with ε(x) denoting your curve, the problem reduces to:

1. Compute the second derivative of your curve: ε''(x).
2. Find the zero (or zeroes) of that second derivative: x0.
3. Recover the optimized ε value by plugging the zero into your curve: ε(x0).

Here is my answer, based on these two other Stack Overflow answers: https://stackoverflow.com/a/26042315/10489040 (computing the derivative of an array) and https://stackoverflow.com/a/3843124/10489040 (finding the zero of an array).

import numpy as np
import matplotlib.pyplot as plt

# Generating x data from -1 up to (but excluding) 4 with a step of 0.01
x = np.arange(-1, 4, 0.01)

# Simulating y data with an inflection point: y(x) = x³ - 5x² + 2x,
# whose 2nd derivative 6x - 10 is zero at x = 5/3
y = x**3 - 5*x**2 + 2*x

# Plotting your curve
plt.plot(x, y, label="y(x)")

# Computing the 1st derivative of your curve (step 0.01, same as x) and plotting it
y_1prime = np.gradient(y, 0.01)
plt.plot(x, y_1prime, label="y'(x)")

# Computing the 2nd derivative of your curve (step 0.01) and plotting it
y_2prime = np.gradient(y_1prime, 0.01)
plt.plot(x, y_2prime, label="y''(x)")

# Finding the index of the zero (or zeroes) of the 2nd derivative,
# i.e. where y''(x) changes sign
x_zero_index = np.where(np.diff(np.sign(y_2prime)))[0]

# Finding the x value of the (first) zero of the 2nd derivative
x_zero_value = x[x_zero_index][0]

# Finding the y value of your curve at that x
y_zero_value = y[x_zero_index][0]

# Reporting
print(f'The inflection point of your curve is at x = {x_zero_value:.3f}, where y = {y_zero_value:.3f}.')
plt.legend()
plt.show()
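The same three steps can be applied directly to the sorted k-distance array from the question instead of a synthetic cubic. This is a minimal sketch, assuming the input is the `distances[:, ns-1]` column; the helper name `eps_from_inflection` and the optional moving-average `window` (which should be much smaller than the number of points) are hypothetical additions, since a real k-distance curve is noisy and its raw second derivative may change sign many times.

```python
import numpy as np

def eps_from_inflection(k_dist, window=1):
    """Hypothetical helper: eps estimated as the value of the descending
    k-distance curve at the first sign change of its 2nd derivative."""
    y = np.sort(np.asarray(k_dist, dtype=float))[::-1]  # same order as distanceDec
    if window > 1:
        # Optional moving average (an assumption, not part of the answer) to damp
        # noise before differentiating; 'valid' avoids artefacts at the edges
        y = np.convolve(y, np.ones(window) / window, mode="valid")
    y_2prime = np.gradient(np.gradient(y))                  # step 1: eps''(x), unit step
    x_zero_index = np.where(np.diff(np.sign(y_2prime)))[0]  # step 2: zero(s) of eps''
    if x_zero_index.size == 0:
        return None                                         # no inflection found
    return y[x_zero_index[0]]                               # step 3: eps(x0)
```

For example, `eps_from_inflection(distances[:, ns-1], window=25)` would return a single eps value; the window size here is only an illustration and would need tuning on real data.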

Keep in mind, though, that the inflection point (around 2.0) does not coincide with the "dip" you read off the graph around 2.5.
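If the dip around 2.5 is what you want rather than the inflection point, a commonly used alternative heuristic (not part of this answer) is to take the point of the sorted curve that lies farthest from the straight line (chord) joining its first and last points, after normalising both axes to [0, 1]. This is a minimal sketch; the helper name `eps_from_knee` is hypothetical, and it assumes the curve has at least two distinct values.

```python
import numpy as np

def eps_from_knee(k_dist):
    """Hypothetical helper: eps at the point of the descending k-distance
    curve that lies farthest from the chord joining its two endpoints."""
    y = np.sort(np.asarray(k_dist, dtype=float))[::-1]  # same order as distanceDec
    x = np.arange(len(y), dtype=float)
    # Normalise both axes to [0, 1] so the result does not depend on units
    xn = x / x[-1]
    yn = (y - y[-1]) / (y[0] - y[-1])
    # After normalising, the chord runs from (0, 1) to (1, 0), i.e. the line
    # xn + yn = 1; the perpendicular distance to it is |xn + yn - 1| / sqrt(2)
    dist = np.abs(xn + yn - 1.0) / np.sqrt(2.0)
    return y[np.argmax(dist)]
```

Because the normalisation makes the x axis (number of points) and the y axis (distance) comparable, this picks the sharpest bend of the curve rather than the point where its curvature changes sign.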

