2

I have a dataset consisting of a certain temperature profile, and I wanna fit or map the measurement points on the temperature profile which is the following:

  • Dwell-time: 30 mins
  • Ramp-time: 1 min
  • Number of periods: 1000 cycles
  • Measure points period: 16 mins

Measure points can be happened in either high regime +150 or low regime -40

Note: The T_0 (initial time) is not clear, so the time reference is not clear e.g. T_0=0.

img

I already fetched the data in DataFrame:

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

df = pd.read_csv('D:\SOF.csv', header=None)
data = {'A': A[:,0], 'B': B[:,0], 'Temperature': Temperature[:,0],
        'S':S, 'C':C , 'Measurement_Points':MP}
dff = pd.DataFrame(data, columns=['A','B','Temperature','S','C','MP'], index = id_set[:,0])
# Temperature's range is [-40,+150]
# MP's range is [0-3000] from 1st MP till last one
MP = int(len(dff)/480) # calculate number of measurement points 
print(MP)
for cycle in range(MP):             
    j = cycle * 480
    #use mean or average of each 480 values from temperature column of DataFrame to pass for fit on Thermal profile
    Mean_temp = np.mean(df['Temperature'].iloc[j:j+480]) # by using Mean from numpy
    #Mean_temp = df.groupby('Temperature').mean() #by using groupby 

So far, I just found curve_fit from scipy.optimize based on this answer and this post but I am wondering how the fitting process could work right here; on the other hand, I would like Temperature values round only to the nearest either -40 or +150. It would be nice if someone could help me!

Update: Standard periodic Thermal profile pic is the following: img

Expected result: img

updated Data sample: data

Mario
  • 1,631
  • 2
  • 21
  • 51
  • Can you provide some data?! What is it actually that you want to fit. If `min(T)` and `max(T)` are clear, and so are dwell time and ramp, you only need to "fit" the time offset, right? – mikuszefski Mar 25 '19 at 10:32
  • Are you sure there are no points on the ramp? The lower valley is 30 min as well, making a period 62 minutes? – mikuszefski Mar 25 '19 at 10:34
  • @mikuszefski exactly those features you mentioned in your first comment are clear and I just need to fit "Measurement points" on Thermal profile as you can seen like red star for High temperature and blue star in Low temperature based on data and the point is I don't have directly time in my dataset instead I have number of measurement points which each takes 16 mins so I can calculate easily total time by [Number of MP *((2*30)+(3*1))]. Regarding your 2nd comment each ramps takes **1min** so each period would be **63mins** not 62mins according to standard Thermal profile I updated the pic. – Mario Mar 25 '19 at 17:19
  • @mikuszefski Also Regarding your second comment as I mentioned in note since T0 is not clear and there is some low possibility that some measurements happened in ramp duration ! and I want to plot them so that i can follow them or count them for example how often happens? I was wondering to have some kind of a formula or/and a graph which best describes the data I measured. – Mario Mar 25 '19 at 17:29
  • Doesn't seem to difficult. Any tries from your side so far? Would be a good idea to post those as well. P.S. Your first image suggest 62. (30 low +1 up + 30 high + 1 down...repeat) – mikuszefski Mar 26 '19 at 07:47
  • I've just updated my tried code which is nothing since I'm unfamiliar with `fit_curve` and not sure how I can pass the **mean** of Temperature column from `DaraFrame` to fitting process that's why I asked this question. As standard thermal profile represents time period would be 63 mins (1up + 30High + 1down + 30low + 1up) nevertheless you're right 1st pic was unclear in term of periodic time. – Mario Mar 26 '19 at 16:22
  • ...but then you have 1up twice: in the beginning and at the end of the period. That does not make sense, does it? – mikuszefski Mar 26 '19 at 16:35
  • @mikuszefski our cycle begins at one point and ends at the same point based on what I know like eg. Sin begins at 0 and ends at 0 again. Based on on our profile we begins at the High point and we end at the High point nevertheless your point of view is also correct we can assume like that as well however it doesn't matter. – Mario Mar 26 '19 at 17:19
  • Ok, lets forget about periodicity then. Can you once again write what information you actually want to subtract. Is it just the two temperature levels? – mikuszefski Mar 27 '19 at 07:56
  • @mikuszefski No I'm interested in mapping MPs on Thermal profile by like your first answer so that I can get **pattern** by `fit_curve` like you provided for 1st answer in the end by printing. Actually I'm looking forward to extracting pattern so that in case that I have some **missing data** in my dataset I can fix them by replacing High or Low values ! – Mario Mar 27 '19 at 18:20
  • @mikuszefski So To make it easy I updated the Data sample and show expected result in the end of my post. I liked the way you find some non-plateau by using gaussian distribution function but In the end I would like to have some kind of a formula or/and a graph which best describes the data I measured. I would like to observe how often MPs happen in ramp-time which takes 1 min. PS: in your second answer I get value error `inData= np.loadtxt()` could not convert string to float: – Mario Mar 27 '19 at 18:26
  • Ok. for me it works with both, SOF.csv and SOF2.csv. I guess you need to debug that yourself. I was playing with your data and it doesn't fit well if I assume equidistant measurements. Your (handmade?) expectation image already suggests that this is not going to work as you have chosen a variable distance as well. You do not have a time stamp? How sure are you about the 16 minutes between measurements? – mikuszefski Mar 28 '19 at 08:57
  • I don't have time stamp directly but considering each MP takes 16 mins I can find number of measurement points by `MP = int(len(df['Temperature'])/480)` and multiply it by 16 mins. Man this measurements have done by Raspberry Pi and thermal shock has done by air thermal chamber.If you look at SOF it contains 11 MP (0-10) and I just modified SOF2 to make it periodic during those measurements and left one measurement which is 8th MP(MP=9) around 0 so that we can map or fit it in ramp-time and I'm looking for extracting pattern for MPs in High & Low and especially during ramp-time which is 1min. – Mario Mar 28 '19 at 15:03
  • Well, I have been fitting data for a long time now. And yes, I tried the ` x * 16` for the positions in time. And no, it doesn't work. There is no reasonable ramp 1 period 63 solution to that data with the assumed time steps. I can post what I tried, but I am not sure if it helps you. Highly recommended: next time make a column for time as well. – mikuszefski Mar 29 '19 at 06:36
  • @mikuszefski man I would like appreciate you for trying to fit and I also conclude that without initial time it's not scientific su cha fit and find equation but at least is there any nice way to **map** the measurements points on high and low regim on thermal profile and map non-plateau data on ramp like you implement in your first answer. My aim is to visually control how many measurement points happen in average in high and low and how often it happens in ramp time if we assume 1st measurement point happened in t0=0. It would be nice if call and pass data via `Pandas.DataFrame`. Thanks – Mario Apr 02 '19 at 08:36
  • @mikuszefski Thanks for leaving your codes in **Edit** part. I just had two problems: Idk why the first picture is shown **blank** for me but I tried to at least plot it via `for x,y in zip(inData[:,5], inData[:,2]):` to have X-axis based on** MP numbers** which is 6th column in SOF2.csv from [0-10] totally 11 MPs. Another issue I would like to mark temperature around 150 and around -40 by blue color and rest between like [0,100] by red color via combination of `x`&`o` like 2nd picture of 1st answer but I didn't understand how did you do it, are you plotting 2 times? – Mario Apr 02 '19 at 21:25
  • @mikuszefski May I ask also shortly what are `def partition` and `def chi2` for? – Mario Apr 02 '19 at 21:27
  • Hi, for the first part, I can't help without having detailed code on how you are plotting what. For the second question: yes. see my code for the multiple plot commands. Finally, partition also could have been `numpy.reshape( 11, 480 )`. It is, hence, only to split the data into blocks of 480 measurements, i.e. your MPs. I also introduced my own chi square function to use the `scipy.optimize.minimize`. – mikuszefski Apr 03 '19 at 05:54
  • @mikuszefski Hi , I was wondering if you have nice idea regarding this [question](https://stackoverflow.com/questions/55639267/how-can-display-differences-of-two-matrices-by-subtraction-via-heatmap-in-python) . Have a nice weekend – Mario Apr 11 '19 at 21:14
  • Well, I have plenty, but I'd need more data and especially more knowledge about its acquisition. The main problem/question remains: how sure are you that the measurement period is reliable over the full measurement time. Even in the small data set you gave me, the data is incompatible with this assumption. If by any means possible, repeat the measurement and put a time stamp. – mikuszefski Apr 12 '19 at 07:02
  • @mikuszefski sure thing I will repeat the measurements and let you know. Kindly I wanted to ask you for having a look at this [question](https://stackoverflow.com/questions/55639267/how-can-display-differences-of-two-matrices-by-subtraction-via-heatmap-in-python) if you had free time. thanks – Mario Apr 14 '19 at 10:32
  • @mikuszefski Hi dear may I ask you to have a look to this [question](https://stackoverflow.com/questions/57233218/how-can-color-up-or-highlighted-outliers-in-scatter-plot-by-using-customized-ran) urgently. – Mario Jul 28 '19 at 13:06
  • @mikuszefski Would you do me a favor and have a look to this urgent [question](https://stackoverflow.com/questions/57741001/how-can-optmize-fitting-data-on-thermal-profile-properly) – Mario Sep 01 '19 at 12:44

2 Answers2

2

This would be my starting point:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

### to generate test data
def temp( t , low, high, period, ramp ):
    tRed = t % period
    dwell = period / 2. - ramp
    if tRed < dwell:
        out = high
    elif tRed < dwell + ramp:
        out = high - ( tRed - dwell ) / ramp * ( high - low )
    elif tRed < 2 * dwell + ramp:
        out = low
    elif tRed <= period:
        out = low + ( tRed - 2 * dwell - ramp)/ramp * ( high -low )
    else:
        assert 0
    return out + np.random.normal() 

### A continuous function that somewhat fits the data
### but definitively gets the period and levels. 
### The ramp is less well defined
def fit_func( t, low, high, period, s,  delta):
    return  ( high + low ) / 2. + ( high - low )/2. * np.tanh( s * np.sin( 2 * np.pi * ( t - delta ) / period ) )



time1List = np.arange( 300 ) * 16
time2List = np.linspace( 0, 300 * 16, 7213 )
tempList = np.fromiter( ( temp(t - 6.3 , 41, 155, 63.3, 2.05 ) for t in time1List ), np.float )
funcList = np.fromiter( ( fit_func(t , 41, 155, 63.3, 10., 0 ) for t in time2List ), np.float )

sol, err = curve_fit( fit_func, time1List, tempList, [ 40, 150, 63, 10, 0 ] )
print sol

fittedLow, fittedHigh, fittedPeriod, fittedS, fittedOff = sol
realHigh = fit_func( fittedPeriod / 4., *sol)
realLow = fit_func( 3 / 4. * fittedPeriod, *sol)
print "high, low : ", [ realHigh, realLow ]
print "apprx ramp: ", fittedPeriod/( 2 * np.pi * fittedS ) * 2

realAmp = realHigh - realLow
rampX, rampY = zip( *[ [ t, d ] for t, d in zip( time1List, tempList ) if ( ( d < realHigh - 0.05 * realAmp ) and ( d > realLow + 0.05 * realAmp ) ) ] )
topX, topY = zip( *[ [ t, d ] for t, d in zip( time1List, tempList ) if ( ( d > realHigh - 0.05 * realAmp ) ) ] )
botX, botY = zip( *[ [ t, d ] for t, d in zip( time1List, tempList ) if ( ( d < realLow + 0.05 * realAmp ) ) ] )

fig = plt.figure()
ax = fig.add_subplot( 2, 1, 1 )
bx = fig.add_subplot( 2, 1, 2 )

ax.plot( time1List, tempList, marker='x', linestyle='', zorder=100 )
ax.plot( time2List, fit_func( time2List, *sol ), zorder=0 )

bx.plot( time1List, tempList, marker='x', linestyle='' )
bx.plot( time2List, fit_func( time2List, *sol ) )
bx.plot( rampX, rampY, linestyle='', marker='o', markersize=10, fillstyle='none', color='r')
bx.plot( topX, topY, linestyle='', marker='o', markersize=10, fillstyle='none', color='#00FFAA')
bx.plot( botX, botY, linestyle='', marker='o', markersize=10, fillstyle='none', color='#80DD00')
bx.set_xlim( [ 0, 800 ] )
plt.show()

providing:

>> [155.0445024   40.7417905   63.29983807  13.07677546 -26.36945489]
>> high, low :  [155.04450237880076, 40.741790521444436]
>> apprx ramp:  1.540820542195840

Test data and fit + zoom

There is a few things to note. My fit function works better if the ramp is small compared to the dwell time. Moreover, one will find several posts here where the fitting of step functions is discussed. In general, as fitting requires a meaningful derivative, discrete functions are a problem. There are at least two solutions. a) make a continuous version, fit, and make the result discrete to your liking or b) provide a discrete function and a manual continuous derivative.

EDIT

So here is what I get working with your newly posted data set:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit, minimize

def partition( inList, n ):
    return zip( *[ iter( inList ) ] * n )

def temp( t, low, high, period, ramp, off ):
    tRed = (t - off ) % period
    dwell = period / 2. - ramp
    if tRed < dwell:
        out = high
    elif tRed < dwell + ramp:
        out = high - ( tRed - dwell ) / ramp * ( high - low )
    elif tRed < 2 * dwell + ramp:
        out = low
    elif tRed <= period:
        out = low + ( tRed - 2 * dwell - ramp)/ramp * ( high -low )
    else:
        assert 0
    return out

def chi2( params, xData=None, yData=None, verbose=False ):
    low, high, period, ramp, off = params
    th = np.fromiter( ( temp( t, low, high, period, ramp, off ) for t in xData ), np.float )
    diff = ( th - yData )
    diff2 = diff**2
    out = np.sum( diff2 )
    if verbose:
        print '-----------'
        print th
        print diff
        print diff2
        print '-----------'
    return out
    # ~ return th

def fit_func( t, low, high, period, s,  delta):
    return  ( high + low ) / 2. + ( high - low )/2. * np.tanh( s * np.sin( 2 * np.pi * ( t - delta ) / period ) )


inData = np.loadtxt('SOF2.csv', skiprows=1, delimiter=',' )
inData2 = inData[ :, 2 ]
xList = np.arange( len(inData2) )
inData480 = partition( inData2, 480 )
xList480 = partition( xList, 480 )
inDataMean = np.fromiter( (np.mean( x ) for x in inData480 ), np.float )
xMean = np.arange( len( inDataMean) ) * 16
time1List = np.linspace( 0, 16 * len(inDataMean), 500 )

sol, err = curve_fit( fit_func, xMean, inDataMean, [ -40, 150, 60, 10, 10 ] )
print sol

# ~ print chi2([-49,155,62.5,1 , 8.6], xMean, inDataMean )
res = minimize( chi2, [-44.12, 150.0, 62.0,  8.015,  12.3 ], args=( xMean, inDataMean ), method='nelder-mead' )
# ~ print res
print res.x

# ~ print chi2( res.x, xMean, inDataMean, verbose=True )
# ~ print chi2( [-44.12, 150.0, 62.0,  8.015,  6.3], xMean, inDataMean, verbose=True )

fig = plt.figure()
ax = fig.add_subplot( 2, 1, 1 )
bx = fig.add_subplot( 2, 1, 2 )

for x,y in zip( xList480, inData480):
    ax.plot( x, y, marker='x', linestyle='', zorder=100 )

bx.plot( xMean, inDataMean , marker='x', linestyle='' )
bx.plot( time1List, fit_func( time1List, *sol ) )

bx.plot( time1List, np.fromiter( ( temp( t , *res.x ) for t in time1List ), np.float) )
bx.plot( time1List, np.fromiter( ( temp( t , -44.12, 150.0, 62.0,  8.015,  12.3 ) for t in time1List ), np.float) )

plt.show()

new fit

>> [-49.53569904 166.92138068  62.56131027   1.8547409    8.75673747]
>> [-34.12188737 150.02194584  63.81464913   8.26491754  13.88344623]

As you can see, the data point on the ramp does not fit in. So, it might be that the 16 min time is not that constant? That would be a problem as this is not a local x-error but an accumulating effect.

mikuszefski
  • 3,943
  • 1
  • 25
  • 38
  • Firstly thanks for your try and you're right about that it would be better when ramp-time < dwell-time. Honestly considering I provided sample data I would expect that to see real data-sample fit and how you pass them `curve_fit`. Due to there are 480 values for each measurement point we could calculate `mean` of every 480 can be considered as right temperature and pass them from `DataFrame` to fit by `for-loop` : `np.mean(df[' '].iloc[j:j+480])`or `df.groupby(' ').mean()` based on structure of data-sample. I was also wondering if we could exclude Thermal profile features for further changes. – Mario Mar 26 '19 at 16:13
  • @Mario Sorry, but your sample data contains just a single jump, so there is no period to fit. Taking averages shouldn't be a problem unless the time taking the samples is in the range of ramping time, in which case you might hide important information. I don't get the "Thermal profile"-thing, though. What do you mean? – mikuszefski Mar 26 '19 at 16:33
  • @Mario ...but then it is not the period that you need to know, right? What is it that you actually want to know? – mikuszefski Mar 26 '19 at 16:43
  • I completely agree with you and you already programmed the thermal profile but it's based on the data you generated . my question is how can I pass the data from `DataFrame` to the `curve_fit` so that we can extract a pattern from raw data. I noticed that you've defined some values like `fit_func` `time1List` `tempList` `[ -40(Low), +150(High), 63(Period), 10(?), 0(?) ]` manually however I want the function to extract these values from my dataset. Thanks for your consideration in details. – Mario Mar 26 '19 at 18:16
  • @Mario The thermal profile and especially the fit function I use are non-linear functions that I am going to fit to same generic data with noise. In principle `curve_fit` can just find the parameters, but can get stuck in local minima or even not converge. One can avoid this by giving reasonable start values for the parameters. That is what I have done. The `10` is a parameter describing the flatness of the plateau and the slope of the ramp. So it is not so important. The `0` is just the starting guess for the time offset, chosen wrong on purpose. – mikuszefski Mar 27 '19 at 07:51
  • Hi, after a long time still your fitting solution is so vague for me with excellent output. Man, may I kindly ask you to let me know how can increase the measurement points to 3-4 in high and low thermal regimes in the thermal profile with the orange color which are highlighted with 'x' inside of 'o' perfectly? Which parameter corresponds to that? – Mario Aug 29 '19 at 16:50
  • @Mario I'd like to help, but have to say that I don't understand your question. Can you be more precise? (also....I have to read everything from the beginning again to see what this was all about) Cheers. – mikuszefski Sep 03 '19 at 09:24
2

If you are only interested in the two temperature levels, this might be useful:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

inData = np.loadtxt('SOF.csv', skiprows=1, delimiter=',' )

def gauss( x, s ):
    return 1. / np.sqrt( 2. * np.pi * s**2 ) * np.exp( -x**2 / ( 2. * s**2 ) )

def two_peak( x , a1, mu1, s1, a2, mu2, s2 ):
    return a1 * gauss( x - mu1, s1 ) + a2 * gauss( x - mu2, s2 )

fList = inData[ :, 2 ]

nBins = 2 * int( max( fList ) - min( fList ) )
fig = plt.figure()

ax = fig.add_subplot( 2, 1 , 1 )
ax.plot( fList , marker='x' )
bx = fig.add_subplot( 2, 1 , 2 )
histogram, binEdges, _ = bx.hist( fList, bins=nBins )

binCentre = np.fromiter( (  ( a + b ) / 2. for a,b in zip( binEdges[ 1: ], binEdges[ :-1 ] ) ) , np.float )
sol, err = curve_fit( two_peak, binCentre, histogram, [ 120, min( fList ), 1 ] + [ 500, max( fList ), 1 ] )
print sol[1], sol[4]
print sol[2], sol[5]
bx.plot( binCentre, two_peak( binCentre, *sol ) )
bx.set_yscale( 'log' )
bx.set_ylim( [ 1e-0, 5e3] )
plt.show()

providing:

>> -46.01513424923528 150.06381412858244
>> 1.8737971845243133 0.6964990809008554

and

Histogram fit

Interestingly your non-plateau data is all around zero, so that's probably not due to the ramp, but a different effect.

mikuszefski
  • 3,943
  • 1
  • 25
  • 38