Carrying out multiple piecewise regressions with variables from the same dataframe (but varying column-pair lengths)

Question

I'm trying to analyse and plot piecewise regressions for daily temperature and gas use. I have six columns (two for each year) in a CSV, which I read in using pandas before assigning each column to a separate variable.

I found one of the answers to "How to apply piecewise linear fit in Python?" extremely helpful and was able to use the following code to run a breakpoint analysis and plot a graph:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pwlf

# Importing the csv and defining columns as variables
df = pd.read_csv(PATH)

Y_A = df.Column1 
X_A = df.Column2 
Y_B = df.Column3
X_B = df.Column4

# Analysing breakpoints
my_pwlf_a = pwlf.PiecewiseLinFit(X_A, Y_A)
breaks_a = my_pwlf_a.fit(2)
print(breaks_a)

# Graphing
x_hat = np.linspace(X_A.min(), X_A.max(), 100)
y_hat = my_pwlf_a.predict(x_hat)

plt.figure()
plt.plot(X_A, Y_A, 'o')
plt.plot(x_hat, y_hat, '-')
plt.xlabel('X'); plt.ylabel('Y');
plt.show()

This runs with no problems and gives the desired results.

When I try to repurpose the code using my next pair of variables (Y_B and X_B) I run into problems:

my_pwlf_b = pwlf.PiecewiseLinFit(X_B, Y_B)
breaks_b = my_pwlf_b.fit(2)
print(breaks_b)

The error returned is:

ValueError: bounds should be a sequence containing real valued (min, max) pairs for each value in x

All variables are float64 and each column contains 366 rows. Thanks for any help in spotting what I'm missing!

There's no way we can help without seeing what your data is composed of, so please upload and link to it — Zionsof, Jul 25 '19 at 13:26

score 0 · Accepted Answer · answered Jul 25 '19 at 16:32

Thanks to Zionsof for the nudge back towards the data!

Further testing showed that the unequal lengths of the column pairs were the problem (e.g. Columns 1 & 2 contained 366 values while Columns 3 & 4 contained 365). I had foolishly thought that separating the columns into separate variables might fix this, but I was incorrect. Here is what I used to fix it (numpy.isfinite):

# Remove any blanks by ensuring the values are finite
Y_A = df.Column1[np.isfinite(df['Column1'])]
X_A = df.Column2[np.isfinite(df['Column2'])]
Y_B = df.Column3[np.isfinite(df['Column3'])]
X_B = df.Column4[np.isfinite(df['Column4'])]
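
One caveat with filtering each column on its own: if a blank ever appeared in only one column of a pair, X and Y would end up with different lengths. Below is a minimal sketch of a pairwise alternative, assuming the blank cells are read in as NaN (which is what pandas does for empty cells in numeric CSV columns). The small DataFrame, the clean_pair helper and the pairs mapping are made up for illustration and are not part of my original code:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: the second pair is one row shorter,
# so its last row holds NaN in both columns.
df = pd.DataFrame({
    "Column1": [10.0, 8.0, 6.0, 5.0],
    "Column2": [1.0, 2.0, 3.0, 4.0],
    "Column3": [9.0, 7.0, 6.0, np.nan],
    "Column4": [1.5, 2.5, 3.5, np.nan],
})

# Quick diagnostic: non-missing values per column versus total rows.
print(df.notna().sum())
print("rows:", len(df))


def clean_pair(frame, x_col, y_col):
    """Return x and y as equal-length arrays, dropping rows where either is NaN."""
    pair = frame[[x_col, y_col]].dropna()
    return pair[x_col].to_numpy(), pair[y_col].to_numpy()


# Hypothetical mapping of a label to its (x column, y column) pair.
pairs = {"A": ("Column2", "Column1"), "B": ("Column4", "Column3")}
cleaned = {name: clean_pair(df, x, y) for name, (x, y) in pairs.items()}

for name, (x, y) in cleaned.items():
    print(name, len(x), len(y))
```

Each cleaned (x, y) pair could then be passed to pwlf.PiecewiseLinFit(x, y) inside the same loop, as in the question.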
