Making linear regression more compact (python)

Question

I'm trying to fit a linear expression to a dataset. I have plotted the data and plotted the regression, but my code is not very efficient. Is there any way to make it more compact?

import numpy as np
import matplotlib.pyplot as plt    

temp1, tid0 = np.genfromtxt("forsok1.txt", dtype=float, skip_header=41, usecols = (1,2)).T
tid1 = tid0 - 200
temp2, tid2 = np.genfromtxt("forsok2.txt", dtype=float, skip_header=1, usecols = (1,2)).T
temp3, tid3 = np.genfromtxt("forsok3.txt", dtype=float, skip_header=1, usecols = (1,2)).T

tempreg1_1 = np.zeros(88)
tidreg1_1 = np.zeros(88)
for i in range(0, 88):
    tempreg1_1[i] = temp1[i]
    tidreg1_1[i] = tid1[i]
tempreg2_1 = np.zeros(65)
tidreg2_1 = np.zeros(65)
tempreg3_1 = np.zeros(65)
tidreg3_1 = np.zeros(65)
for i in range(0, 65):
    tempreg2_1[i] = temp2[i]
    tidreg2_1[i] = tid2[i]
    tempreg3_1[i] = temp3[i]
    tidreg3_1[i] = tid3[i]

tempreg1_2 = np.zeros(59)
tidreg1_2 = np.zeros(59)
for i in range(0, 59):
    tempreg1_2[i] = temp1[i+112]
    tidreg1_2[i] = tid1[i+112]
tempreg2_2 = np.zeros(76)
tidreg2_2 = np.zeros(76)
for i in range(0, 76):
    tempreg2_2[i] = temp2[i+93]
    tidreg2_2[i] = tid2[i+93]
tempreg3_2 = np.zeros(55)
tidreg3_2 = np.zeros(55)
for i in range(0,55):
    tempreg3_2[i] = temp3[i+100]
    tidreg3_2[i] = tid3[i+100]

tempreg1_3 = np.zeros(76)
tidreg1_3 = np.zeros(76)
for i in range(0, 76):
    tempreg1_3[i] = temp1[i+210]
    tidreg1_3[i] = tid1[i+210]
tempreg2_3 = np.zeros(80)
tidreg2_3 = np.zeros(80)
for i in range(0, 80):
    tempreg2_3[i] = temp2[i+207]
    tidreg2_3[i] = tid2[i+207]
tempreg3_3 = np.zeros(91)
tidreg3_3 = np.zeros(91)
for i in range(0,91):
    tempreg3_3[i] = temp3[i+181]
    tidreg3_3[i] = tid3[i+181]



R1_1, b1_1 = np.polyfit(tidreg1_1, tempreg1_1, 1)
R2_1, b2_1 = np.polyfit(tidreg2_1, tempreg2_1, 1)
R3_1, b3_1 = np.polyfit(tidreg3_1, tempreg3_1, 1)
R1_2, b1_2 = np.polyfit(tidreg1_2, tempreg1_2, 1)
R2_2, b2_2 = np.polyfit(tidreg2_2, tempreg2_2, 1)
R3_2, b3_2 = np.polyfit(tidreg3_2, tempreg3_2, 1)
R1_3, b1_3 = np.polyfit(tidreg1_3, tempreg1_3, 1)
R2_3, b2_3 = np.polyfit(tidreg2_3, tempreg2_3, 1)
R3_3, b3_3 = np.polyfit(tidreg3_3, tempreg3_3, 1)

tempreg1_1[0] = b1_1
tempreg2_1[0] = b2_1
tempreg3_1[0] = b3_1
for j in range(1, 88):
        tempreg1_1[j] = tempreg1_1[j-1] + 5*R1_1
for j in range(1, 65):
        tempreg2_1[j] = tempreg2_1[j-1] + 5*R2_1
        tempreg3_1[j] = tempreg3_1[j-1] + 5*R3_1

tempreg1_2[0] = b1_2 + 560*R1_2
tempreg2_2[0] = b2_2 + 465*R2_2
tempreg3_2[0] = b3_2 + 500*R3_2
for j in range(1, 59):
        tempreg1_2[j] = tempreg1_2[j-1] + 5*R1_2
for j in range(1, 76):
        tempreg2_2[j] = tempreg2_2[j-1] + 5*R2_2
for j in range(1, 55):
        tempreg3_2[j] = tempreg3_2[j-1] + 5*R3_2

tempreg1_3[0] = b1_3 + 1050*R1_3
tempreg2_3[0] = b2_3 + 1035*R2_3
tempreg3_3[0] = b3_3 + 905*R3_3
for j in range(1, 76):
        tempreg1_3[j] = tempreg1_3[j-1] + 5*R1_3
for j in range(1, 80):
        tempreg2_3[j] = tempreg2_3[j-1] + 5*R2_3
for j in range(1, 91):
        tempreg3_3[j] = tempreg3_3[j-1] + 5*R3_3

plt.figure()
ax1 = plt.subplot(311)
ax2 = plt.subplot(312)
ax3 = plt.subplot(313)

ax1.plot(tid1, temp1, ':', color="g")
ax1.plot(tidreg1_1, tempreg1_1, '-.',color="b")
ax1.plot(tidreg1_2, tempreg1_2, '-.',color="b")
ax1.plot(tidreg1_3, tempreg1_3, '-.',color="b")
ax2.plot(tid2, temp2, ':', color="g")
ax2.plot(tidreg2_1, tempreg2_1, '-.',color="b")
ax2.plot(tidreg2_2, tempreg2_2, '-.',color="b")
ax2.plot(tidreg2_3, tempreg2_3, '-.',color="b")
ax3.plot(tid3, temp3, ':', color="g")
ax3.plot(tidreg3_1, tempreg3_1, '-.',color="b")
ax3.plot(tidreg3_2, tempreg3_2, '-.',color="b")
ax3.plot(tidreg3_3, tempreg3_3, '-.',color="b")

The code I have used makes arrays from small parts of the dataset, then computes a linear regression from those arrays. The regression is then turned into another array, which is plotted in the subplots. This is done for three different data plots.

I have tried to make it more compact but haven't found a function to use. Thanks for the help, and sorry for my bad English.

Read about [slicing](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html). This should help you avoid many of the `for` loops. — MB-F, Feb 06 '18 at 12:26

score 1 · Answer 1 · answered Feb 06 '18 at 12:44

This:

tempreg1_1 = np.zeros(88)
tidreg1_1 = np.zeros(88)
for i in range(0, 88):
    tempreg1_1[i] = temp1[i]
    tidreg1_1[i] = tid1[i]

Is the same as this:

tempreg1_1 = temp1[:88]
tidreg1_1 = tid1[:88]

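So you may not even need to create those arrays, since you can potentially just use the 'slices' directly. One caveat: a basic slice is a view of the original array, not a copy, so if you later assign into it (as the question does with `tempreg1_1[0] = b1_1`), take a `.copy()` first or the original data will be overwritten.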

In general, you rarely need to pre-create an empty array then fill it with a loop. If you find yourself doing this in NumPy, there's almost certainly a better way.
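As a minimal sketch of that idea, the block below slices one segment and evaluates the fitted line with `np.polyval`, which could stand in for the loops in the question that build the regression values step by step. The data is a synthetic stand-in, not the contents of `forsok1.txt`, and the names `tid_seg`, `temp_seg` and `temp_fit` are made up for illustration.

```python
import numpy as np

# Synthetic stand-in for one data file (assumption: one sample every 5 time units).
tid = np.arange(0, 500, 5, dtype=float)
temp = 0.02 * tid + 20.0

# Slice the first 88 samples instead of copying them element by element.
tid_seg = tid[:88]
temp_seg = temp[:88]

# Fit a straight line; polyfit returns the coefficients as [slope, intercept].
slope, intercept = np.polyfit(tid_seg, temp_seg, 1)

# Evaluate the line at every time in the segment in one call.
temp_fit = np.polyval([slope, intercept], tid_seg)
```

Because `temp_fit` is a new array, the measured values in `temp` are left untouched, and the line is evaluated at the actual time values rather than at an assumed fixed step.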

score 0 · Answer 2 · answered Feb 08 '18 at 03:51

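You don't have to write all of this out explicitly. Since the steps are almost identical, you can loop over them. Here is a simplified case. Your variable names are a bit much, so I use simpler ones:

import numpy as np
import matplotlib.pyplot as plt

# read the data here, e.g. with np.genfromtxt as in the question

plt.figure()
ax1 = plt.subplot(311)
ax2 = plt.subplot(312)
ax3 = plt.subplot(313)

plots = [ax1, ax2, ax3]
for subplot in plots:

    # prepare tidreg and tempreg for this subplot here

    xCoordinate = tidreg  # placeholder: should be your tidreg for this subplot
    y1 = tempreg1         # placeholder: first temperature segment
    y2 = tempreg2         # placeholder: second temperature segment

    # fit a first-degree polynomial and wrap it so it can be called like a function
    regression1 = np.poly1d(np.polyfit(xCoordinate, y1, 1))
    regression2 = np.poly1d(np.polyfit(xCoordinate, y2, 1))
    subplot.plot(xCoordinate, regression1(xCoordinate), 'b-')
    subplot.plot(xCoordinate, regression2(xCoordinate), 'b-')

plt.show()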

Each iteration of the for loop corresponds to one subplot, so you only need to prepare the data used in that subplot. The variables are reassigned on every iteration, so you also don't have to create so many of them. In theory, that could cut about two thirds of the code and avoid keeping many intermediate arrays around.
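One way to fill in the data preparation step is sketched below, assuming the segments can be described as `(start, stop)` index pairs. The index ranges are read off the loops in the question (start offset, start offset plus loop length); the helper name `fit_segments` and the synthetic data are hypothetical and only there to make the example self-contained.

```python
import numpy as np

def fit_segments(tid, temp, segments):
    """Fit a straight line to each (start, stop) index range.

    Returns a list of (x, y_fit) pairs, one per segment.
    """
    fits = []
    for start, stop in segments:
        x = tid[start:stop]
        coeffs = np.polyfit(x, temp[start:stop], 1)  # [slope, intercept]
        fits.append((x, np.polyval(coeffs, x)))
    return fits

# Index ranges taken from the question's loops, one list per data file.
segments_per_run = [
    [(0, 88), (112, 171), (210, 286)],
    [(0, 65), (93, 169), (207, 287)],
    [(0, 65), (100, 155), (181, 272)],
]

# Synthetic stand-in data (assumption): three runs sampled every 5 time units.
tid = np.arange(300) * 5.0
runs = [(tid, 20.0 + 0.01 * k * tid) for k in (1, 2, 3)]

# One list of fitted segments per run.
all_fits = [
    fit_segments(t, y, segs)
    for (t, y), segs in zip(runs, segments_per_run)
]
```

Inside the subplot loop, each `(x, y_fit)` pair from the matching entry of `all_fits` could then be passed to `subplot.plot(x, y_fit, '-.', color="b")`, which would replace the nine separate `polyfit` calls and the nine regression plot calls in the question.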

For indexing or slicing arrays, you can refer to this question and this NumPy manual.
