
I have data of the form shown in the figure. The natural logarithm of the data always has three distinct linear ranges. The extent of each range varies from dataset to dataset, but there are always three regions where three different linear fits can be made.

I am trying to determine the best three linear fits to the natural logarithm of the data, marked I, II and III. The figure shows the natural logarithm of the y-data. This has to be applied to at least a thousand datasets, so the code has to detect the best linear fits for the three regions shown in the figure automatically.

[Figure: natural logarithm of the y-data, with the three linear regions marked I, II and III]

I am trying to do this with this code, which applies a two-segment piecewise linear fit using code from here, but it does not work correctly. I need it extended to three linear fits. How can I determine the three best linear fits to the data with Python?

MWE

import numpy as np
from scipy import optimize

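# Two-segment continuous model: slope k1 left of x0, slope k2 right of it,
# with both segments passing through the point (x0, y0).
def piecewise_linear(x, x0, y0, k1, k2):
    return np.piecewise(x, [x < x0], [lambda x:k1*x + y0-k1*x0, lambda x:k2*x + y0-k2*x0])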



# Skip the leading '#' comment lines; the first other line holds the tab-separated column names.
with open('./three_piecewise_linear.dat', "r") as data:
    while True:
        line = data.readline()
        if not line.startswith('#'):
            break
    data_header = [i for i in line.strip().split('\t') if i]
    _data_ = np.genfromtxt(data, names = data_header, dtype = None, delimiter = '\t')
_data_.dtype.names = [j.replace('_', ' ') for j in _data_.dtype.names]
data = np.array(_data_.tolist())
n_rf = data.shape[1] - 2  # columns 2 onward are used as the y-data series
xd = np.linspace(1, 1.5, 100)  # x grid on which each fit is evaluated
fit_data = np.empty(shape = (100, n_rf))

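# Fit the two-segment model to the log of each y column, with column 1 as x.
# No p0 is passed, so curve_fit starts every parameter at 1.
for i in range(n_rf):
    p , e = optimize.curve_fit(piecewise_linear, data[:, 1], np.log(data[:, i + 2]))
    fit_data[:, i] = piecewise_linear(xd, *p)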
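
One possible way to extend the two-segment model above to three segments is sketched here. It is only a sketch under assumptions: the three regions are taken to join continuously, the names `piecewise_linear3` and `initial_guess` are made up for this illustration, and the demo data is synthetic rather than read from `three_piecewise_linear.dat`. An explicit starting point is passed to `curve_fit` because a fit of this kind tends to be sensitive to where it starts.

```python
import numpy as np
from scipy import optimize


def piecewise_linear3(x, x0, x1, y0, k1, k2, k3):
    """Continuous three-segment line: breakpoints x0 < x1, value y0 at x0."""
    y1 = y0 + k2 * (x1 - x0)  # value at x1, so segments II and III join
    return np.piecewise(
        x,
        [x < x0, (x >= x0) & (x < x1), x >= x1],
        [
            lambda x: y0 + k1 * (x - x0),
            lambda x: y0 + k2 * (x - x0),
            lambda x: y1 + k3 * (x - x1),
        ],
    )


def initial_guess(x, y):
    """Rough p0 (hypothetical helper): tercile breakpoints, one slope per third."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    thirds = np.array_split(np.arange(x.size), 3)
    slopes = [np.polyfit(x[idx], y[idx], 1)[0] for idx in thirds]
    x0, x1 = x[thirds[1][0]], x[thirds[2][0]]
    return [x0, x1, np.interp(x0, x, y), *slopes]


# Synthetic stand-in for one log-transformed dataset (made-up values).
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 3.0, 120)
y_demo = piecewise_linear3(x_demo, 0.8, 2.1, 0.0, -1.0, -3.0, -0.5)
y_demo = y_demo + rng.normal(scale=0.02, size=x_demo.size)

p0 = initial_guess(x_demo, y_demo)
popt, pcov = optimize.curve_fit(piecewise_linear3, x_demo, y_demo, p0=p0)
print(dict(zip(["x0", "x1", "y0", "k1", "k2", "k3"], np.round(popt, 3))))
```

In the loop above, `data[:, 1]` and `np.log(data[:, i + 2])` would take the place of `x_demo` and `y_demo`. Whether tercile-based starting values are good enough for the real datasets is untested here.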
Tom Kurushingal
  • Your question is about piecewise linear fitting. It would help if you removed all the code related to plotting, so readers can focus on the code that is relevant to the question. – Warren Weckesser Jun 04 '16 at 18:50
  • Unwanted code removed. – Tom Kurushingal Jun 04 '16 at 20:23
  • What you show in your figure is not what's typically meant by "piecewise linear" (which usually refers to a continuity of pieces), but your figure just shows three independent lines for different segments of the data. For that, you can just use linear regression to fit the three lines independently for each segment of the data. – tom10 Jun 05 '16 at 14:47
  • @tom10 I have only shown partial linear fits. What I require is to determine three best linear fits for the regions shown in figure. This has to be determined for at least a thousand datasets (although not shown in the code). – Tom Kurushingal Jun 05 '16 at 16:37
  • I'm pretty sure that this problem is NP-hard in general. I think it would not be too hard to implement using mixed-integer programming. Of course, the performance then depends heavily on the data. (For now a formal description is lacking; it is not clear to me what exactly the constraints are. The best solution could be one which covers only 1 point per segment, but I'm pretty sure you would want to cover as many points as possible (or maybe all), so we need a formal description of these rules / trade-off weights.) – sascha Jun 05 '16 at 17:51
  • What *exactly* do you mean by "piecewise"? – tom10 Jun 05 '16 at 18:26
  • @tom10 Updated figure. – Tom Kurushingal Jun 05 '16 at 19:12
  • Related: https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines – ayhan Jun 05 '16 at 19:27
  • @nxkryptor: The figure doesn't answer anything. You don't seem to understand the problem domain well enough to specify a well formed question. I would suggest that you learn a little more before trying to write the code. – tom10 Jun 05 '16 at 20:48
  • @tom10 I have added more details. Hope it is clear enough? – Tom Kurushingal Jun 06 '16 at 06:37
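
The comments above raise two points: the three lines may be independent regressions rather than one continuous curve, and some rule is needed so that a segment cannot shrink to a single point. A minimal sketch of one way to combine both, under assumptions not stated in the question: every point belongs to exactly one of three consecutive segments, each segment holds at least `min_pts` points, and "best" means the smallest total squared error. The function name `fit_three_lines` and the parameter `min_pts` are invented for this illustration, and the demo data is synthetic.

```python
import numpy as np


def fit_three_lines(x, y, min_pts=3):
    """Split (x, y), sorted by x, into three runs and fit a line to each.

    Returns (i, j, fits): the segments are [:i], [i:j] and [j:], and fits
    holds one (slope, intercept) pair per segment. The split minimises the
    summed squared error of the three independent least-squares lines.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 3 * min_pts:
        raise ValueError("not enough points for three segments")
    xc, yc = x - x.mean(), y - y.mean()  # centring reduces round-off

    def prefix(v):
        # Prefix sums let the error of any run [a:b] be computed in O(1).
        return np.concatenate(([0.0], np.cumsum(v)))

    sx, sy = prefix(xc), prefix(yc)
    sxx, sxy, syy = prefix(xc * xc), prefix(xc * yc), prefix(yc * yc)

    def sse(a, b):
        # Squared error of the least-squares line through points a..b-1.
        m = b - a
        dx, dy = sx[b] - sx[a], sy[b] - sy[a]
        vxx = sxx[b] - sxx[a] - dx * dx / m
        vxy = sxy[b] - sxy[a] - dx * dy / m
        vyy = syy[b] - syy[a] - dy * dy / m
        return vyy - vxy * vxy / vxx if vxx > 0 else vyy

    best = (np.inf, None, None)
    for i in range(min_pts, n - 2 * min_pts + 1):
        for j in range(i + min_pts, n - min_pts + 1):
            total = sse(0, i) + sse(i, j) + sse(j, n)
            if total < best[0]:
                best = (total, i, j)

    _, i, j = best
    fits = [np.polyfit(x[a:b], y[a:b], 1) for a, b in ((0, i), (i, j), (j, n))]
    return i, j, fits


# Synthetic example: three lines that change near x = 1 and x = 2.
rng = np.random.default_rng(1)
x_demo = np.linspace(0.0, 3.0, 90)
y_demo = np.where(x_demo < 1, 2 - x_demo,
                  np.where(x_demo < 2, 4 - 3 * x_demo, -1 - 0.5 * x_demo))
y_demo = y_demo + rng.normal(scale=0.02, size=x_demo.size)
i_demo, j_demo, fits_demo = fit_three_lines(x_demo, y_demo)
print(x_demo[i_demo], x_demo[j_demo], fits_demo)
```

The search tries every admissible pair of split indices, so its cost grows with the square of the number of points per dataset. The prefix sums keep each candidate cheap, which matters when this is repeated over a thousand datasets. A different trade-off rule, such as leaving some points uncovered, would need a different objective.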

0 Answers