5

Say I have two lists of data as follows:

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 6, 8, 10, 12, 14]

That is, it's pretty clear that merely fitting a line to this data doesn't work, but instead the slope changed at a point in the data. (Obviously, one can pinpoint from this data set pretty easily where that change is, but it's not as clear in the set I'm working with so let's ignore that.) Something with the derivative, I'm guessing, but the point here is I want to treat this as a free parameter where I say "it's this point, +/- this uncertainty, and here is the linear slope before and after this point."

Note, I can do this with an array if it's easier. Thanks!

kb3hts
  • 165
  • 1
  • 4
  • 9
  • Your question is not very clear to me. Do you want to find the *index* where the two slopes start to differ? What do you mean with *uncertainty*? Why is the slope a *free* parameter? It is defined between every two (consecutive) points. – Willem Van Onsem Jul 12 '17 at 16:38
  • BTW, there is no such thing as an array in Python. See: https://stackoverflow.com/questions/1514553/how-to-declare-an-array-in-python – Sumner Evans Jul 12 '17 at 16:42
  • @SumnerEvans, of course there are [arrays](https://docs.python.org/3/library/array.html#module-array). – randomir Jul 12 '17 at 16:43
  • @SumnerEvans there are arrays, e.g. the built-in `array.array`, and the much more full-featured `numpy.ndarray`. The latter is only used often in the domains of scientific/numerical computing, and the former is ever rarer still. However, it is a very common mistake for people to refer to `list`s as arrays. – juanpa.arrivillaga Jul 12 '17 at 16:50
  • @randomir, that's a module that must be imported, not a core, first-class part of the language like `list`s. – Sumner Evans Jul 12 '17 at 16:50
  • 1
    @SumnerEvans it's a part of the standard library. Just because it is in a seperate namespace than `__builtins__` doesn't make it any less part of the Python. `Numpy` is a third-party extension, although, the core Python language has features explicitely added to help `numpy`. E.g. the ellipses singleton: `...` – juanpa.arrivillaga Jul 12 '17 at 16:52
  • @SumnerEvans, ever heard the "batteries included" phrase? If `array` is in standard Python library, what else could you ask for? For every possible function and data type to be a built-in? Next you'll say `reduce` is not "core" anymore in Python3. – randomir Jul 12 '17 at 16:53
  • re: the question with uncertainty, IRL in the data set it's highly unlikely that there is as clear a change as in the data I provided. Rather, it's fluxes changing over time, so the fit isn't perfect, so it's likely to be a case of "change is at Day 10 plus or minus two days." – kb3hts Jul 12 '17 at 17:40

4 Answers4

16

Here is a plot of your data:

enter image description here

You need to find two slopes (== taking two derivatives). First, find the slope between every two points (using numpy):

import numpy as np 
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],dtype=np.float)
y = np.array([1, 2, 3, 4, 5, 6, 8, 10, 12, 14],dtype=np.float)
m = np.diff(y)/np.diff(x)
print (m)
# [ 1.  1.  1.  1.  1.  2.  2.  2.  2.]

Clearly, slope changes from 1 to 2 in the sixth interval (between sixth and seventh points). Then take the derivative of this array, which tells you when the slope changes:

print (np.diff(m))
[ 0.  0.  0.  0.  1.  0.  0.  0.]

To find the index of the non-zero value:

idx = np.nonzero(np.diff(m))[0]
print (idx)
# 4

Since we took one derivative with respect to x, and indices start from zero in Python, idx+2 tells you that the slope is different before and after the sixth point.

Mahdi
  • 3,188
  • 2
  • 20
  • 33
  • 1
    I think this is what I want- thanks! I guess this is easier to deal with in an array so I couldn't think how to go about it. – kb3hts Jul 12 '17 at 17:37
  • Glad to hear that! :) Don't forget to accept it as the answer then! ;) – Mahdi Jul 12 '17 at 17:42
  • 1
    Will do. Just wondering though, so IRL the data I'm using is not as perfect as this case- it's a flux measurement that changes over several days, so I'm suspecting the answer is something like "the slope change is on Day 10, plus or minus two days." Is there a smart way to do that calculation, do you know? Thanks! – kb3hts Jul 12 '17 at 17:59
  • If we have some small sample data, we could modify this answer. But do you think `np.diff(m)` is zero for some intervals or not? – Mahdi Jul 13 '17 at 05:40
3

I'm not sure to understand very well what you want but you can see the evolution this way (derivative):

>>> y = [1, 2, 3, 4, 5, 6, 8, 10, 12, 14]
>>> dy=[y[i+1]-y[i] for i in range(len(y)-1)]
>>> dy
[1, 1, 1, 1, 1, 2, 2, 2, 2]

and then find the point where it change (second derivative):

>>> dpy=[dy[i+1]-dy[i] for i in range(len(dy)-1)]
>>> dpy
[0, 0, 0, 0, 1, 0, 0, 0]

if you want the index of this point :

>>> dpy.index(1)
4

that can give you the value of the last point before change of slope :

>>> change=dpy.index(1)
>>> y[change]
5

In your y = [1, 2, 3, 4, 5, 6, 8, 10, 12, 14] the change happen at the index [4] (list indexing start to 0) and the value of y at this point is 5.

Dadep
  • 2,796
  • 5
  • 27
  • 40
2

You can calculate the slope as the difference between each pair of points (the first derivative). Then check where the slope changes (the second derivative). If it changes, append the index location to idx, the collection of points where the slope changes.

Note that the first point does not have a unique slope. The second pair of points will give you the slope, but you need the third pair before you can measure the change in slope.

idx = []
prior_slope = float(y[1] - y[0]) / (x[1] - x[0])
for n in range(2, len(x)):  # Start from 3rd pair of points.
    slope = float(y[n] - y[n - 1]) / (x[n] - x[n - 1])
    if slope != prior_slope:
        idx.append(n)
    prior_slope = slope

>>> idx
[6]

Of course this could be done more efficiently in Pandas or Numpy, but I am just giving you a simple Python 2 solution.

A simple conditional list comprehension should also be pretty efficient, although it is more difficult to understand.

idx = [n for n in range(2, len(x)) 
       if float(y[n] - y[n - 1]) / (x[n] - x[n - 1]) 
       != float(y[n - 1] - y[n - 2]) / (x[n - 1] - x[n - 2])]
Alexander
  • 105,104
  • 32
  • 201
  • 196
  • 1
    So, I didn't realize I needed to specify that I was fine with using numpy or pandas to do this. How would you go about it using those too? Just curious. – kb3hts Jul 12 '17 at 21:53
1

Knee point might be a potential solution.

from kneed import KneeLocator
import numpy as np 
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([1, 2, 3, 4, 5, 6, 8, 10, 12, 14])
kn = KneeLocator(x, y, curve='convex', direction='increasing') 
# You can use array y to automatically determine 'convex' and 'increasing' if y is well-behaved
idx = (np.abs(x - kn.knee)).argmin()

>>> print(x[idx], y[idx])
6 6
Ricardo
  • 691
  • 3
  • 11