I have an (x, y) dataset, and I would like to calculate r_value**2 for every window of 10 consecutive elements (elements 0 to 9, 1 to 10, ..., n-10 to n-1).

Ideally the code should report the maximum r_value**2 and save all r-values in a list. I've written a loop, but I don't know how to tell stats.linregress to look only at rows test_i to test_i+10, or how to save all the r_value**2 results in a list.

So far, I have this:

import matplotlib.pyplot as plt
from scipy import stats
import numpy as np
import csv


path = '/storage/.../01_python_in/'

test = np.loadtxt(path + 'sample_data.txt', skiprows=0)

test_min = 0
test_max = len(test)

for test_i in range(test_min, test_max-10):
    slope, intercept, r_value, p_value, std_err = stats.linregress(test[:, 0], test[:, 1])
    print('i:', test_i, 'r**2:', r_value**2)
albert
burazija
  • are you intending to do this for x-coords 1 and 2, 2 and 3, 3 and 4 etc., and then 10 and 11, 11 and 12 ... 18 and 19? – David Zemens Aug 31 '15 at 13:50
  • Looks to me like you want to do a "rolling window"? Look here: [Efficient Numpy 2D array construction from 1D array](http://stackoverflow.com/q/4923617/1730674) – askewchan Aug 31 '15 at 13:59

1 Answer

The way to implement this manually is to slice the first dimension of your array from test_i to test_i + window (with window = 10), like this:

stats.linregress(test[test_i:test_i+window, 0], test[test_i:test_i+window, 1])

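linregress has also accepted a single two-column array, in which case you don't have to split apart the x and y parts (recent SciPy releases have been moving away from this one-argument form, so check your version):

stats.linregress(test[test_i:test_i+window])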

You can also save the r values by building a list in your loop. This, together with the slicing above, is shown here (passing the x and y columns explicitly):

window = 10
r_values = []
# the + 1 includes the last window, test[n-window:n]
for test_i in range(len(test) - window + 1):
    chunk = test[test_i:test_i + window]
    slope, intercept, r_value, p_value, std_err = \
            stats.linregress(chunk[:, 0], chunk[:, 1])
    r_values.append(r_value)
    print('i:', test_i, 'r**2:', r_value**2)
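
For anyone who wants to try the loop without the original data file, here is a minimal self-contained sketch. The helper name `rolling_r_values` and the synthetic noisy line are my own stand-ins for illustration, not anything from the question, and the two-column layout of `test` is assumed:

```python
import numpy as np
from scipy import stats


def rolling_r_values(data, window=10):
    """Return r for every run of `window` consecutive rows of an (n, 2) array."""
    r_values = []
    # + 1 so the final window, data[n-window:n], is included
    for start in range(len(data) - window + 1):
        chunk = data[start:start + window]
        result = stats.linregress(chunk[:, 0], chunk[:, 1])
        r_values.append(result.rvalue)
    return r_values


# Synthetic stand-in for sample_data.txt (assumption): a noisy line, 25 rows
rng = np.random.default_rng(0)
x = np.arange(25, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
data = np.column_stack([x, y])

r_values = rolling_r_values(data, window=10)
print(len(r_values))  # 16 windows: 25 - 10 + 1
```

With 25 rows and a window of 10 there are 25 - 10 + 1 = 16 windows, which is why the range needs the `+ 1`.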

It's actually simple enough for a list comprehension:

r_values = [stats.linregress(test[i:i+window, 0], test[i:i+window, 1]).rvalue for i in range(len(test) - window + 1)]

You can get the squares then with:

r_values = np.asarray(r_values)
r_values2 = r_values**2

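And the starting index of the window with the largest r**2, along with that maximum value, with:

max_i = np.argmax(r_values2)
r2_max = r_values2[max_i]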
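
If the loop ever becomes a bottleneck, the rolling-window idea mentioned in the comments can be vectorized. The sketch below is one possible approach, not part of the original answer. It assumes NumPy 1.20 or later (for `sliding_window_view`) and relies on the fact that, for a simple linear regression, r is the Pearson correlation of x and y. The function name `rolling_r2` and the synthetic data are hypothetical:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def rolling_r2(x, y, window=10):
    """r**2 of y against x for every run of `window` consecutive points."""
    xw = sliding_window_view(np.asarray(x, dtype=float), window)
    yw = sliding_window_view(np.asarray(y, dtype=float), window)
    # Centre each window, then apply the Pearson formula row by row
    xc = xw - xw.mean(axis=1, keepdims=True)
    yc = yw - yw.mean(axis=1, keepdims=True)
    cov = (xc * yc).sum(axis=1)
    return cov**2 / ((xc**2).sum(axis=1) * (yc**2).sum(axis=1))


# Synthetic data (assumption), standing in for the two columns of `test`
rng = np.random.default_rng(1)
x = np.arange(30, dtype=float)
y = 0.5 * x + rng.normal(size=x.size)

r2 = rolling_r2(x, y, window=10)
max_i = int(np.argmax(r2))  # start index of the best-fitting window
print(max_i, r2[max_i])
```

Note that a window in which x or y is constant gives a zero denominator here, so real data may need a guard for that case.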
askewchan
  • You're welcome, @burazija! Welcome to [SO], I hope you can get more help in the future as well! Please click the checkmark near the top of my answer to accept it as the answer to your question. Thanks! – askewchan Sep 04 '15 at 13:49