3

I have the following data:

0.8340502011561366 0.8423491600218922 0.8513456021654467 0.8458192388553084 0.8440111276014195 0.8489589671423143 0.8738088120491972 0.8845129900705279 0.8988298998926688 0.924633964692693 0.9544790734065157 0.9908034431246875 1.0236430466543138 1.061619773027915 1.1050038249835414 1.1371449802490126 1.1921182610371368 1.2752207659022576 1.344047620255176 1.4198117350668353 1.507943067143741 1.622137968203745 1.6814098429502085 1.7646810054280595 1.8485457435775694 1.919591124757554 1.9843144220593145 2.030158014640226 2.018184122476175 2.0323466012624207 2.0179200409023874 2.0316932950853723 2.013683870089898 2.03010703506514 2.0216151623726977 2.038855467786505 2.0453923522466093 2.03759031642753 2.019424996752278 2.0441806106428606 2.0607521369415136 2.059310067318373 2.0661157975162485 2.053216429539864 2.0715123971225564 2.0580473413362075 2.055814512721712 2.0808278560688964 2.0601637029377113 2.0539429365156003 2.0609648613513754 2.0585135712612646 2.087674625814453 2.062482961966647 2.066476100210777 2.0568444178944967 2.0587903943282266 2.0506399365756396

The data plotted looks like:

Plotted data

I want to find the point where the slope changes in sign (I circled it in black. Should be around index 26):

Point I am trying to extract

I need to find this point of change for several hundred files. So far I tried the recommendation from this post:

Finding the point of a slope change as a free parameter- Python

I think since my data is a bit noisey I am not getting a smooth transition in the change of the slope.

This is the code I have tried so far:

import numpy as np

#load 1-D data file
file = str(sys.argv[1])
y = np.loadtxt(file)

#create X based on file length
x = np.linspace(1,len(y), num=len(y))

Find first derivative:

m = np.diff(y)/np.diff(x)
print(m)

#Find second derivative
b = np.diff(m)
print(b)
#find Index

index = 0
for difference in b:
    index += 1
    if difference < 0: 
        print(index, difference)

Since my data is noisey I am getting some negative values before the index I want. The index I want it to retrieve in this case is around 26 (which is where my data becomes constant). Does anyone have any suggestions on what I can do to solve this issue? Thank you!

  • yes, it could be at 24 – Green-Chile-Data Apr 04 '22 at 21:43
  • the location of the "max" value doesn't depend on the `x` but totally on the values of `y`. So you don't need to compute gradients and gradient of gradients – cards Apr 04 '22 at 21:56
  • Oh, so would using something on 'y' like: max = argrelextrema(y, np.greater) work? – Green-Chile-Data Apr 04 '22 at 22:05
  • Is the second slope known to be zero or is it just looking flat in comparison? – mikuszefski Apr 11 '22 at 07:55
  • @mikuszefski the data goes through an initial stabilization and then goes flat as it stabilizes. Technically since there is so much noise in the data it's not really zero unless I smooth my data enough. – Green-Chile-Data Apr 13 '22 at 19:47
  • If we consider the stabilization somewhat linear, the problem you have is what you can find [here](https://de.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf) on page 8 – mikuszefski Apr 14 '22 at 05:38

2 Answers2

2

When the first derivative changes sign, that's when the slope sign changes. I don't think you need the second derivative, unless you want to determine the rate of change of the slope. You also aren't getting the second derivative. You're just getting the difference of the first derivative.

Also, you seem to be assigning arbitrary x values. If you're y-values represent points that are equally spaced apart, than it's ok, otherwise the derivative will be wrong.

Here's an example of how to get first and second der...



import numpy as np

x = np.linspace(1, 100, 1000)

y = np.cos(x)

# Find first derivative:
m = np.diff(y)/np.diff(x)

#Find second derivative
m2 = np.diff(m)/np.diff(x[:-1])

print(m)
print(m2)

# Get x-values where slope sign changes

c = len(m)

changes_index = []
for i in range(1, c):
    prev_val = m[i-1]
    val = m[i]
    if prev_val < 0 and val > 0:
        changes_index.append(i)
    elif prev_val > 0 and val < 0:
        changes_index.append(i)

for i in changes_index:
    print(x[i])


notice I had to curtail the x values for the second der. That's because np.diff() returns one less point than the original input.

alvrm
  • 311
  • 2
  • 9
  • Oh ok, yeah I definitely do not want the rate of change. I have question. I ran the code you provided but when I print out the last print statement 'print(x[i])' I am getting values like "1.5945945945945947 4.7657657657657655 7.936936936936937 11.00900900900901" but I was expecting whole integer numbers. I'm not sure I understand what happened. Also, thank you so much for your quick reply. Edit: Yes, my x values are equally spaced apart. My x values look like: 1, 2, 3, 4, 5, .... 1800. Second edit: I printed out my x values and realized those are just the x values. – Green-Chile-Data Apr 04 '22 at 21:30
  • 1
    Note that the derivative is negative in the beginning (near 5-10). It may be useful to smooth the function using a gaussian/convolution kernel. Another solution is to use a threshold (for both the derivative value and the number of items). – Jérôme Richard Apr 04 '22 at 21:36
  • @Green-Chile-Data ```numpy.linspace(1, 100, 1000)``` returns a set of 1000 values from 1 to 100. Therefore, there will be decimals. if you use ```numpy.linspace(1, 100, 100)``` it will be whole numbers. – alvrm Apr 04 '22 at 21:41
  • 1
    @Green-Chile-Data Please note the edit I made. the ```changes_index``` will need to be based on the slopes ```m``` , not the original y-values. – alvrm Apr 04 '22 at 22:23
2

A gradient approach is useless in this case because you don't care about velocities or vector fields. The knowledge of the gradient don't add extra information to locate the maximum value since the run are always positive hence will not effect the sign of the gradient. A method based entirly on raise is suggested.

Detect the indices for which the data are decreasing, find the difference between them and the location of the max value. Then by index manipulation you can find the value for which data has a maximum.

data = '0.8340502011561366 0.8423491600218922 0.8513456021654467 0.8458192388553084 0.8440111276014195 0.8489589671423143 0.8738088120491972 0.8845129900705279 0.8988298998926688 0.924633964692693 0.9544790734065157 0.9908034431246875 1.0236430466543138 1.061619773027915 1.1050038249835414 1.1371449802490126 1.1921182610371368 1.2752207659022576 1.344047620255176 1.4198117350668353 1.507943067143741 1.622137968203745 1.6814098429502085 1.7646810054280595 1.8485457435775694 1.919591124757554 1.9843144220593145 2.030158014640226 2.018184122476175 2.0323466012624207 2.0179200409023874 2.0316932950853723 2.013683870089898 2.03010703506514 2.0216151623726977 2.038855467786505 2.0453923522466093 2.03759031642753 2.019424996752278 2.0441806106428606 2.0607521369415136 2.059310067318373 2.0661157975162485 2.053216429539864 2.0715123971225564 2.0580473413362075 2.055814512721712 2.0808278560688964 2.0601637029377113 2.0539429365156003 2.0609648613513754 2.0585135712612646 2.087674625814453 2.062482961966647 2.066476100210777 2.0568444178944967 2.0587903943282266 2.0506399365756396'

data = data.split()
import numpy as np

a = np.array(data, dtype=float)

diff = np.diff(a)

neg_indeces = np.where(diff<0)[0]
neg_diff = np.diff(neg_indeces)

i_max_dif = np.where(neg_diff == neg_diff.max())[0][0] + 1

i_max = neg_indeces[i_max_dif] - 1 # because aise as a difference of two consecutive values

print(i_max, a[i_max])

Output

26 1.9843144220593145

Some details

print(neg_indeces) # all indeces of the negative values in the data
# [ 2  3 27 29 31 33 36 37 40 42 44 45 47 48 50 52 54 56]
print(neg_diff) # difference between such indices
# [ 1 24  2  2  2  3  1  3  2  2  1  2  1  2  2  2  2]
print(neg_diff.max()) # value with highest difference
# 24
print(i_max_dif) # location of the max index of neg_indeces -> 27
# 2
print(i_max) # index of the max of the origonal data
# 26
cards
  • 3,936
  • 1
  • 7
  • 25
  • I ended up using this code since it worked best with the data I had. Thank you so much. I'm not able to choose the best answer but I did upvote. – Green-Chile-Data Apr 05 '22 at 01:13