1

I have power use data that (graphically and intuitively) show that if temperature exceeds a certain value, electricity use starts to increase almost linearly. This sounds logical because the air conditioning systems starts running; hotter outside temp means the systems runs longer and uses more power.

Is there a way to fit a dataset over two domains, e.g

T < T0. poweruse = C
T > T0. poweruse = A * temp + B

and fit all parameters, A, B, C and T0?

Prefer solutions in R, but will happily take Python or JMP.

Rob Hanssen
  • 153
  • 8

2 Answers2

4

If poweruse is a continuous function of temp which starts out in the horizontal leg for lower temperatures and then transitions to an increasing linear leg at higher temperatures then we can represent it as the maximum of the two legs at each temperature.

Given ascending temperatures we can assume that at least the first k are in the horizontal leg and that at least the last k are in the linear leg. We fit these portions of the two legs separately using mean and lm respectively to get starting values and then use those to get the overall fit using nls.

Note that T0 is at the intersection of the two legs so T0 is the solution to the equation C = A + T0 * B which is T0 = (C-A)/B .

No packages are used.

# test data
set.seed(123)
temp <- 1:100
poweruse <- pmax(50, 1 + 2 * temp) + rnorm(100)

# starting values
k <- 10
fm0 <- lm(poweruse ~ temp, subset = seq(to = length(temp), length = k))
st <- list(C = mean(head(poweruse, k)), A = coef(fm0)[[1]], B = coef(fm0)[[2]])

fm <- nls(poweruse ~ pmax(C, A + B * temp), start = st)
fm
## Nonlinear regression model
##   model: poweruse ~ pmax(C, A + B * temp)
##    data: parent.frame()
##       C       A       B 
## 49.9913  0.8876  2.0037 
##  residual sum-of-squares: 81.67
##
## Number of iterations to convergence: 2 
## Achieved convergence tolerance: 2.874e-07

# calculate T0
with(as.list(coef(fm)), (C - A)/B)
## [1] 24.50596

plot(poweruse ~ temp, type = "l")
lines(fitted(fm) ~ temp, col = "red")

screenshot

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
2

If you know T0 then a linear model will work:

lm(T ~ 1 + I(T>T0) + I(T>T0):temp, ...)

the I(T>T0) evaluates to 0 if T<T0 and 1 otherwise.

The segmented package will fit separate linear models above and below the breakpoint, and will estimate the breakpoint for you, but I don't think there's a way to constrain the linear model to be constant (slope=0) below the breakpoint and non-constant above ... As a slightly hacky solution, you could use segmented to estimate the breakpoint, then re-fit with the lm() approach above (your breakpoint might not be optimal/you might underestimate the uncertainty in your estimates slightly because of the two-stage approach where the first stage doesn't impose the "initially flat" constraint and the second stage assumes that the breakpoint is known exactly).

For what it's worth, adding the "and I need to estimate the breakpoint" part makes things a little trickier because the goodness-of-fit surface becomes discontinuous at the observed values of T ...

This answer could probably be simplified to do what you want ...

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453