
This might be more of a math question, but ultimately I'd like to do this in R. Given a basic exponential curve, I'd like to understand how to use R to fit a series of linear segments that approximate the exponential curve as closely as possible. The reason is that each linear segment represents a particular relationship, namely a rate of change, and at each inflection point the rate of change increases. These inflection points are important for the user to know. I have attached a crude drawing of what I am trying to accomplish.

[Figure: Exponential curve with linear lines]

The black line is the exponential curve, the red lines are the series of linear segments, and the orange circles mark where the lines intersect. I can do this haphazardly by picking arbitrary data points and building linear models until I find a combination that I feel best fits the exponential curve, but I know there must be a better way.

Here is some code that might help:

data <- 1:34
sales <- c( 20000000,  25000000,  30000000,  35000000,  43000000,
            50000000,  57000000,  65000000,  72000000,  80000000,
            89000000,  97000000, 108000000, 118000000, 128000000,
           138000000, 150000000, 161000000, 174000000, 187000000,
           203000000, 218000000, 235000000, 251000000, 260000000,
           280000000, 293000000, 310000000, 333000000, 363000000,
           390000000, 415000000, 454000000, 540000000)
data2 <- data.frame(data, sales)

plot(data2$data,data2$sales)

[Figure: plot of the exponential curve data]
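A minimal base-R sketch of a more systematic version of the "pick points and build linear models" approach, assuming you choose the breakpoints yourself (16 and 30 below are illustrative guesses, not estimates). Hinge terms pmax(x - k, 0) let a single lm() fit all segments jointly, so the lines are forced to meet at the breakpoints. The helpers fit_hinges() and eval_hinges() are hypothetical names, not from any package.

```r
# Question data (same values as above)
data <- 1:34
sales <- c(20, 25, 30, 35, 43, 50, 57, 65, 72, 80, 89, 97, 108, 118, 128,
           138, 150, 161, 174, 187, 203, 218, 235, 251, 260, 280, 293, 310,
           333, 363, 390, 415, 454, 540) * 1e6
data2 <- data.frame(data, sales)

# Hypothetical helper: continuous piecewise-linear fit with FIXED breakpoints.
# Each column pmax(x - k, 0) is a "hinge" that adds a slope change at x = k,
# so all segments come from one least-squares fit and join at the knots.
fit_hinges <- function(x, y, knots) {
  H <- outer(x, knots, function(x, k) pmax(x - k, 0))
  lm(y ~ x + H)
}

# Hypothetical helper: evaluate the fitted piecewise line at any x
eval_hinges <- function(fit, knots, x) {
  b <- coef(fit)
  H <- outer(x, knots, function(x, k) pmax(x - k, 0))
  unname(drop(b[1] + b[2] * x + H %*% b[-(1:2)]))
}

knots <- c(16, 30)                          # illustrative breakpoints (assumption)
fit <- fit_hinges(data2$data, data2$sales, knots)
slopes <- unname(cumsum(coef(fit)[-1]))     # slope of each segment, left to right
corners <- eval_hinges(fit, knots, knots)   # y-values where the lines meet

plot(sales ~ data, data = data2)
lines(data2$data, fitted(fit), col = "red", lwd = 2)
points(knots, corners, col = "orange", cex = 2)
```

This still leaves the choice of breakpoints to you; estimating them from the data is what the segmented package does.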

Ben Bolker
tfut22312

2 Answers


With the segmented package (see this question):

library(segmented)
m1 <- lm(sales ~ data, data = data2)  ## initial fit
s1 <- segmented(m1)     ## one breakpoint
s2 <- segmented(m1, psi = c(10,25))  ## two breakpoints; psi gives rough starting values
plot(sales ~ data, data = data2)
lines(data2$data, predict(s1))
lines(data2$data, predict(s2), col = 2, lwd =2)

Results:

s2
Call: segmented.lm(obj = m1, psi = c(10, 25))

Meaningful coefficients of the linear terms:
(Intercept)         data      U1.data      U2.data  
    5942857      7732143      7105220     26962637  

Estimated Break-Point(s):
psi1.data  psi2.data  
    15.72      29.65  
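Note that U1.data and U2.data are changes in slope, not slopes, so the slope of each segment is a cumulative sum. A small sketch using only the numbers printed above (rounded as printed, so the results are approximate); the segmented package also has a slope() function that reports per-segment slopes directly.

```r
# Values copied from print(s2) above
b0  <- 5942857                  # (Intercept)
b1  <- 7732143                  # data: slope of the first segment
u   <- c(7105220, 26962637)     # U1.data, U2.data: change in slope at each breakpoint
psi <- c(15.72, 29.65)          # psi1.data, psi2.data: estimated breakpoints

# Slope of each segment: every U term is added to the slope before it
seg_slopes <- cumsum(c(b1, u))

# Fitted sales at the breakpoints, i.e. where the lines intersect
y_psi <- b0 + b1 * psi + sapply(psi, function(p) sum(u * pmax(p - psi, 0)))

seg_slopes
y_psi
```

So the fitted slope roughly doubles at the first breakpoint and nearly triples at the second.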

Unlike @JJacquelin's provided solution, you do need to provide starting values for the breakpoints when estimating >1 breakpoint, but they only need to be something reasonable — especially for simple/well-behaved data, the results will be (nearly) identical for a range of similar starting value choices.

[Figure: data with predictions from the segmented fits]

Mathematically, I would be picky and say that an exponential curve doesn't really have an inflection point — the slope continuously and gradually increases — but if this is a useful way to convey something to an audience, go for it.
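To make the "no inflection point" remark concrete, assuming a generic exponential with positive constants a and b:

```latex
\begin{aligned}
f(x)   &= a\,e^{bx},     && a > 0,\ b > 0\\
f'(x)  &= ab\,e^{bx} > 0\\
f''(x) &= ab^{2}\,e^{bx} > 0 && \text{for all } x
\end{aligned}
```

Since f'' never changes sign, the curve is convex everywhere and has no inflection point; its slope f' just keeps increasing. The breakpoints of a piecewise-linear fit are properties of the approximation (where the fitted slope jumps), not of the curve itself.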

Ben Bolker

This is a problem of fitting a piecewise function made of three linear segments.

A very simple method (non-iterative, no initial guess required) is explained in this document, where the relevant case is treated on pp. 20-22: https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf

A numerical example is given below. The next figure shows the result:

x = 0.17 0.23 0.293 0.349 0.401 0.457 0.509 0.563 0.619 0.668 0.713 0.756 0.798 0.832 0.864 0.889 0.912 0.935 0.957 0.977

y = 0.09 0.094 0.09 0.067 0.082 0.114 0.141 0.173 0.212 0.247 0.278 0.325 0.408 0.459 0.518 0.584 0.631 0.698 0.78 0.859

[Figure: result of the numerical example]

To make the code easier to implement and check, the calculation is shown below in full detail:

[Figure: detailed calculation]

The fitting criterion is the least mean square error over the whole data set in one shot (not segment by segment).
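For comparison, a brute-force alternative (not the integral-equation method from the PDF) that uses the same whole-data least-squares criterion: for each candidate pair of breakpoints, fit all three segments jointly with one lm() and keep the pair with the smallest total residual sum of squares. Assumptions: the segments are joined (continuous), and breakpoints are restricted to midpoints between consecutive x values, so the result is only approximate. rss_for() is a hypothetical helper.

```r
x <- c(0.17, 0.23, 0.293, 0.349, 0.401, 0.457, 0.509, 0.563, 0.619, 0.668,
       0.713, 0.756, 0.798, 0.832, 0.864, 0.889, 0.912, 0.935, 0.957, 0.977)
y <- c(0.09, 0.094, 0.09, 0.067, 0.082, 0.114, 0.141, 0.173, 0.212, 0.247,
       0.278, 0.325, 0.408, 0.459, 0.518, 0.584, 0.631, 0.698, 0.78, 0.859)

# Hypothetical helper: total RSS of one joint, continuous 3-segment fit
# with breakpoints k = c(a1, a2) held fixed (hinge terms as slope changes)
rss_for <- function(k) {
  H <- outer(x, k, function(x, k) pmax(x - k, 0))
  sum(resid(lm(y ~ x + H))^2)
}

# Candidate breakpoints: midpoints between consecutive x values
cand <- (head(x, -1) + tail(x, -1)) / 2
grid <- subset(expand.grid(a1 = cand, a2 = cand), a1 < a2)
grid$rss <- apply(grid[, c("a1", "a2")], 1, rss_for)

best <- grid[which.min(grid$rss), ]
best
```

The winning pair could then be refined with a continuous optimizer, or used as starting values for segmented().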

NOTE: The numerical example was deliberately chosen with few points (20) to make checking easier. The drawback is that such a low number of points relative to the number of parameters to be optimized (5) carries a risk of failure or deviation. The method is based on numerical integration, which requires as many points as possible for good accuracy.
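A rough illustration of why the number of points matters, assuming the cumulative integrals are computed with the trapezoidal rule (the PDF's exact scheme may differ). cumtrapz() is a hypothetical helper, and the test function exp(2x) is chosen only because its integral is known exactly.

```r
# Hypothetical helper: cumulative trapezoidal integral,
# S[k] approximates the integral of y from x[1] to x[k]
cumtrapz <- function(x, y) {
  c(0, cumsum(diff(x) * (head(y, -1) + tail(y, -1)) / 2))
}

# Known integral for checking: int_0^1 exp(2x) dx = (exp(2) - 1) / 2
exact <- (exp(2) - 1) / 2

# Absolute error of the final cumulative value with 20 vs 200 points
err <- sapply(c(20, 200), function(n) {
  xs <- seq(0, 1, length.out = n)
  abs(tail(cumtrapz(xs, exp(2 * xs)), 1) - exact)
})
err
```

With 10 times as many points the error drops by roughly a factor of 100, because the trapezoidal error scales with the square of the spacing.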

JJacquelin
  • Once again, I like the simplicity of this solution. In this case, however, I have one question. In each of the two linear steps, the errors of the coefficients are easily calculated by standard matrix operations. Now, the results of the first step enter the second in a non-linear way, namely through the step function. Given that the derivative of H is delta (in the framework of Schwartz distributions), does it make sense to propagate the error? How would this be done? It seems a bit problematic, as the final error is not a continuous function of the step position. – mikuszefski Aug 11 '21 at 09:28
  • On the other hand, considering a broader region where the data jumps between the two functions, and where the error of the jump position becomes significantly larger than the distance between two data points, I would expect the step-position error to affect the errors of the second linear fit's parameters. – mikuszefski Aug 11 '21 at 09:31
  • @mikuszefski. Your doubts are justified. The proposed simplified method is convenient if the shape of the function is not too far from a piecewise function made of three linear segments. This isn't the case for an exponential-type curve, especially with the new data later provided by the OP. So the later answer from Ben Bolker is preferable. – JJacquelin Aug 11 '21 at 10:49
  • BTW, in the linked PDF, page 6, the lower right corners of the matrices should be F2k and G2k, right? – mikuszefski Aug 11 '21 at 12:59
  • @mikuszefski. You are right. Bravo for pointing out the typo. – JJacquelin Aug 11 '21 at 15:07
  • First is a typo, second is copy-paste I guess ;) – mikuszefski Aug 11 '21 at 15:41
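A sketch of the "standard matrix operations" mentioned in the first comment, for one linear step with the breakpoint treated as fixed: Cov(beta) = sigma^2 (X'X)^-1. The data are simulated purely for illustration. This deliberately ignores the uncertainty of the breakpoint itself, which is exactly the issue raised in the comments.

```r
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 0.3)   # simulated data, illustration only

X <- cbind(1, x)                                  # design matrix
beta <- solve(crossprod(X), crossprod(X, y))      # (X'X)^-1 X'y
res <- y - X %*% beta                             # residuals
sigma2 <- sum(res^2) / (nrow(X) - ncol(X))        # residual variance
se <- sqrt(diag(sigma2 * solve(crossprod(X))))    # coefficient standard errors

# Should match the standard errors reported by lm()
fit <- lm(y ~ x)
cbind(manual = se, lm = coef(summary(fit))[, "Std. Error"])
```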