
I have several sets of data to which I'm trying to fit different profiles. In the centre of one of the minima there is contamination that prevents a good fit, as you can see in this image: [image: fits of the profiles]

How can I clip out those spikes at the bottom of my data, taking into account that the spike is not always in the same position? Or how would you deal with data like this? I'm using lmfit to fit the profiles, in this case a Lorentzian and a Gaussian. Here is a minimal working example in which I have adjusted the initial values to fit the data more closely:

import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
from lmfit.models import GaussianModel, ConstantModel, LorentzianModel

x = np.array([4085.18084467,  4085.38084374,  4085.5808428 , 4085.78084186, 4085.98084092,  4086.18083999,  4086.38083905,  4086.58083811, 4086.78083717,  4086.98083623,  4087.1808353 ,  4087.38083436, 4087.58083342,  4087.78083248,  4087.98083155,  4088.18083061, 4088.38082967,  4088.58082873,  4088.78082779,  4088.98082686, 4089.18082592,  4089.38082498,  4089.58082404,  4089.78082311, 4089.98082217,  4090.18082123,  4090.38082029,  4090.58081935, 4090.78081842,  4090.98081748,  4091.18081654,  4091.3808156 , 4091.58081466,  4091.78081373,  4091.98081279,  4092.18081185, 4092.38081091,  4092.58080998,  4092.78080904,  4092.9808081 , 4093.18080716,  4093.38080622,  4093.58080529,  4093.78080435, 4093.98080341,  4094.18080247,  4094.38080154,  4094.5808006 , 4094.78079966,  4094.98079872,  4095.18079778,  4095.38079685, 4095.58079591,  4095.78079497,  4095.98079403,  4096.1807931 , 4096.38079216,  4096.58079122,  4096.78079028,  4096.98078934, 4097.18078841,  4097.38078747,  4097.58078653,  4097.78078559,4097.98078466,  4098.18078372,  4098.38078278,  4098.58078184, 4098.7807809 ,  4098.98077997,  4099.18077903,  4099.38077809, 4099.58077715,  4099.78077622,  4099.98077528,  4100.18077434, 4100.3807734 ,  4100.58077246,  4100.78077153,  4100.98077059, 4101.18076965,  4101.38076871,  4101.58076778,  4101.78076684, 4101.9807659 ,  4102.18076496,  4102.38076402,  4102.58076309, 4102.78076215,  4102.98076121,  4103.18076027,  4103.38075934, 4103.5807584 ,  4103.78075746,  4103.98075652,  4104.18075558, 4104.38075465,  4104.58075371,  4104.78075277,  4104.98075183, 4105.1807509 ,  4105.38074996,  4105.58074902,  4105.78074808, 4105.98074714,  4106.18074621,  4106.38074527,  4106.58074433, 4106.78074339,  4106.98074246,  4107.18074152,  4107.38074058, 4107.58073964,  4107.7807387 ,  4107.98073777,  4108.18073683, 4108.38073589,  4108.58073495,  4108.78073401,  4108.98073308, 4109.18073214,  4109.3807312 ,  4109.58073026,  4109.78072933, 4109.98072839,  4110.18072745,  
4110.38072651,  4110.58072557, 4110.78072464,  4110.9807237 ,  4111.18072276,  4111.38072182, 4111.58072089,  4111.78071995,  4111.98071901,  4112.18071807, 4112.38071713,  4112.5807162 ,  4112.78071526,  4112.98071432, 4113.18071338,  4113.38071245,  4113.58071151,  4113.78071057, 4113.98070963,  4114.18070869,  4114.38070776,  4114.58070682, 4114.78070588,  4114.98070494,  4115.18070401,  4115.38070307, 4115.58070213,  4115.78070119,  4115.98070025,  4116.18069932, 4116.38069838,  4116.58069744,  4116.7806965 ,  4116.98069557, 4117.18069463,  4117.38069369,  4117.58069275,  4117.78069181, 4117.98069088,  4118.18068994,  4118.380689  ,  4118.58068806, 4118.78068713,  4118.98068619,  4119.18068525,  4119.38068431, 4119.58068337,  4119.78068244,  4119.9806815 ,  4120.18068056, 4120.38067962,  4120.58067869,  4120.78067775,  4120.98067681, 4121.18067587,  4121.38067493,  4121.580674  ,  4121.78067306, 4121.98067212,  4122.18067118,  4122.38067025,  4122.58066931, 4122.78066837,  4122.98066743,  4123.18066649,  4123.38066556, 4123.58066462,  4123.78066368,  4123.98066274,  4124.1806618 , 4124.38066087,  4124.58065993,  4124.78065899,  4124.98065805, 4125.18065712,  4125.38065618,  4125.58065524,  4125.7806543 , 4125.98065336,  4126.18065243,  4126.38065149,  4126.58065055, 4126.78064961,  4126.98064868,  4127.18064774,  4127.3806468 , 4127.58064586,  4127.78064492,  4127.98064399,  4128.18064305, 4128.38064211,  4128.58064117,  4128.78064024,  4128.9806393 , 4129.18063836,  4129.38063742,  4129.58063648,  4129.78063555, 4129.98063461,  4130.18063367,  4130.38063273,  4130.5806318 , 4130.78063086,  4130.98062992,  4131.18062898,  4131.38062804, 4131.58062711,  4131.78062617,  4131.98062523,  4132.18062429, 4132.38062336,  4132.58062242,  4132.78062148,  4132.98062054, 4133.1806196 ,  4133.38061867,  4133.58061773,  4133.78061679, 4133.98061585,  4134.18061492,  4134.38061398,  4134.58061304, 4134.7806121 ,  4134.98061116])
y = np.array([0.90312759,  1.00923175,  0.94618369,  0.98284045,  0.91510612,        0.96737804,  0.97690214,  0.94363369,  1.00887784,  1.00110387,        0.91647096,  0.97943202,  1.00672907,  1.01552094,  1.01089407,        0.96914584,  0.9908419 ,  1.0176613 ,  0.97032148,  0.96003562,        0.9702355 ,  0.93684173,  0.94652734,  0.94895018,  1.01214356,        0.85777678,  0.89308203,  0.9789272 ,  0.93901884,  0.9684622 ,        0.96969321,  0.86326307,  0.89607392,  0.92459571,  1.00454429,        1.06019733,  0.97291196,  0.95646497,  0.95899707,  1.02830351,        0.94938178,  0.91481128,  0.92606219,  0.97085631,  0.93597434,        0.91316857,  0.90644542,  0.91726926,  0.91686184,  0.96445563,        0.92166362,  0.95831572,  0.93859066,  0.85285273,  0.89944073,        0.91812428,  0.94265677,  0.88281406,  0.9470601 ,  0.94921529,        0.97289222,  0.94632251,  0.96633195,  0.94096512,  0.95324803,        0.90920845,  0.92100257,  0.91181745,  0.95715298,  0.91715382,        0.90219214,  0.87585035,  0.86592191,  0.89335902,  0.85536392,        0.89619274,  0.9450366 ,  0.82780137,  0.81214176,  0.83461329,        0.82858317,  0.80851704,  0.79253546,  0.85440086,  0.81679169,        0.80579976,  0.72312218,  0.75583125,  0.75204599,  0.84519188,        0.68686821,  0.71472154,  0.71706318,  0.72640234,  0.70526356,        0.68295282,  0.66795774,  0.65004383,  0.68096834,  0.72697547,        0.72436393,  0.77128385,  0.79666758,  0.67349101,  0.61479406,        0.57046337,  0.51614312,  0.52945366,  0.53112169,  0.53757761,        0.56680358,  0.63839684,  0.60704329,  0.62377533,  0.67862515,        0.64587581,  0.71316115,  0.76309798,  0.72217569,  0.7477785 ,        0.79731849,  0.76934137,  0.77063868,  0.77871584,  0.77688526,        0.84342722,  0.85382332,  0.88700466,  0.85837992,  0.79589266,        0.83798993,  0.79835529,  0.84612746,  0.83214907,  0.86373676,        0.90729115,  0.82111605,  0.86165685,  0.84090099,  0.90389133,      
  0.89554032,  0.90792356,  0.92798016,  0.95588479,  0.95019718,        0.95447497,  0.89845759,  0.91638311,  0.99263342,  0.97477606,        0.95482538,  0.94489498,  0.94344967,  0.90526465,  0.92538486,        0.96279787,  0.94005143,  0.96842454,  0.92296494,  0.89954172,        0.8684367 ,  0.95039002,  0.95229769,  0.93752274,  0.94741173,        0.96704449,  1.01130839,  0.95499414,  0.99596569,  0.95130622,        1.00014723,  1.00252218,  0.95130331,  1.0022896 ,  0.99851989,        0.94405282,  0.95814021,  0.94851972,  1.01302067,  1.01400272,        0.97960083,  0.97070283,  1.01312797,  0.9842154 ,  1.01147273,       0.97331853,  0.91403182,  0.96813051,  0.92319169,  0.9294103 ,        0.96960715,  0.94811518,  0.97115083,  0.84687543,  0.90725159,        0.88061293,  0.87319615,  0.85331661,  0.89775082,  0.90956716,        0.83174505,  0.89753388,  0.89554364,  0.95329739,  0.87687031,        0.93883127,  0.97433899,  0.99515225,  0.97519981,  0.91956466,        0.97977674,  0.93582089,  1.00662722,  0.90157277,  1.02887754,        0.9777419 ,  0.94257094,  1.02359615,  0.98968414,  1.00075502,        1.03230265,  1.05904074,  1.00488442,  1.05507886,  1.05085518,        1.02561781,  1.05896008,  0.98024381,  1.08005691,  0.94528977,        1.03853637,  1.02064405,  1.0467137 ,  1.05375156,  1.12907949,        0.99295611,  1.06601022,  1.02846374,  0.98006807,  0.96446772,        0.97702428,  0.97788589,  0.93889781,  0.96366778,  0.96645265,        0.95857242,  1.05796304,  0.99441763,  1.00573183,  1.05001927])
e = np.array([0.0647344 ,  0.04583914,  0.05665552,  0.04447208,  0.05644753,        0.03968611,  0.05985188,  0.04252311,  0.03366922,  0.04237672,        0.03765898,  0.03290132,  0.04626836,  0.05106203,  0.03619188,        0.03944098,  0.08115469,  0.05859644,  0.06091101,  0.05170821,        0.0427244 ,  0.06804469,  0.06708318,  0.03369381,  0.04160575,        0.08007032,  0.09292148,  0.04378329,  0.08216214,  0.06087074,        0.05375458,  0.06185891,  0.06385766,  0.08084546,  0.04864063,        0.06400878,  0.04988693,  0.06689165,  0.05989534,  0.08010138,        0.0681177 ,  0.04478208,  0.03876582,  0.05977015,  0.06610619,        0.05020086,  0.07244604,  0.0445143 ,  0.06970626,  0.04423994,        0.0414573 ,  0.06892836,  0.05715395,  0.04014724,  0.07908425,        0.06082051,  0.08380691,  0.08576757,  0.06571406,  0.04842625,        0.05298355,  0.05271857,  0.06340425,  0.10849621,  0.0811072 ,        0.03642638,  0.10614094,  0.09865099,  0.06711037,  0.10244762,        0.11843505,  0.1092357 ,  0.09748241,  0.09657009,  0.09970179,        0.10203563,  0.18494082,  0.14097796,  0.1151294 ,  0.16172895,        0.17611204,  0.16226913,  0.2295418 ,  0.17795924,  0.1253298 ,        0.1771586 ,  0.15139061,  0.14739618,  0.1620105 ,  0.19158538,        0.21431605,  0.19292715,  0.23308884,  0.30519423,  0.31401994,        0.30569885,  0.31216375,  0.35147676,  0.25016472,  0.16232236,        0.09058787,  0.0604483 ,  0.05168302,  0.21432774,  0.38149791,        0.5061975 ,  0.44281541,  0.50646427,  0.43761581,  0.44989111,        0.47778238,  0.39944325,  0.32462726,  0.34560857,  0.3175776 ,        0.30253441,  0.23059451,  0.24516185,  0.20708065,  0.26429751,        0.1830661 ,  0.15155041,  0.16497299,  0.15794139,  0.13626666,        0.17839823,  0.13502886,  0.14148522,  0.10869864,  0.11723602,        0.09074029,  0.06922157,  0.07719777,  0.13181317,  0.11441895,        0.10655855,  0.12073767,  0.0846133 ,  0.07974657,  0.06538693,      
  0.0573741 ,  0.07864047,  0.08351471,  0.08130351,  0.0768824 ,        0.07951992,  0.04478989,  0.0765122 ,  0.04842814,  0.04355571,        0.05138656,  0.07215294,  0.04681987,  0.05790133,  0.06163808,        0.082449  ,  0.06127927,  0.04971221,  0.05107901,  0.04493687,        0.06072161,  0.06094332,  0.03630467,  0.04162285,  0.04058228,        0.04526251,  0.06191432,  0.04901982,  0.0454908 ,  0.06186274,        0.0407017 ,  0.03865571,  0.04353665,  0.03898987,  0.04666321,        0.05856035,  0.04225933,  0.04797901,  0.03523971,  0.04728414,        0.05494382,  0.04773011,  0.03210954,  0.05651663,  0.03625933,        0.03596701,  0.03800191,  0.06267668,  0.06431192,  0.0602614 ,        0.05139896,  0.04571979,  0.04375182,  0.0576867 ,  0.07491418,        0.05339972,  0.07619115,  0.11569378,  0.07087871,  0.09076518,        0.13554717,  0.07811761,  0.07180695,  0.05831886,  0.06042863,        0.08759576,  0.06650081,  0.08420164,  0.08185432,  0.04338836,        0.04970979,  0.04008252,  0.03605485,  0.03456321,  0.05594584,        0.03856822,  0.03576337,  0.03118799,  0.0441686 ,  0.0469118 ,        0.03591666,  0.03562582,  0.04934832,  0.03280972,  0.03201576,        0.04338048,  0.07443531,  0.04121059,  0.03774147,  0.03717577,        0.03354207,  0.03806978,  0.0319364 ,  0.03715712,  0.0379478 ,        0.04867626,  0.0304592 ,  0.03393844,  0.034518  ,  0.04293514,        0.05177898,  0.05332907,  0.0352937 ,  0.03359781,  0.04625272,        0.03733088,  0.03501259,  0.03346308,  0.04333749,  0.05741173])

cont = ConstantModel(prefix='cte_')
pars = cont.guess(y, x=x)

gauss = GaussianModel(prefix='g_')
pars.update( gauss.make_params())    
pars['cte_c'].set(1)
pars['g_center'].set(4125, min=4120, max=4130)
pars['g_sigma'].set(1, min=0.5)
pars['g_amplitude'].set(-0.2, min=-0.5)

loren = LorentzianModel(prefix='l_')
pars.update( loren.make_params())    
pars['l_center'].set(4106, min=4095, max=4115)
pars['l_sigma'].set(4, max=6)
pars['l_amplitude'].set(-6., max=-4.)

model = gauss + loren + cont

init = model.eval(pars, x=x)
result = model.fit(y, pars, x=x, weights=1/e)

#print(result.fit_report(min_correl=0.5))

fig, ax = plt.subplots(figsize=(8,6))

ax.plot(x, y, 'k-', lw=2) # data in black
ax.plot(x, init, 'g--', lw=2) # initial guess 
ax.plot(x, result.best_fit, 'r-', lw=2) # best fit
ax.set(xlim=(4085,4135), ylim=(0.4,1.14))
plt.show()
EternalGenin

1 Answer


If the bad point is always at the same x value, you could remove that point from the data, perhaps with something like:

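import numpy as np

def index_nearest(array, value):
    """Return the index of the element of array nearest to value."""
    return np.abs(array - value).argmin()

# 4150 is a placeholder: use the x position of the bad point
ybad = index_nearest(x, 4150)
y[ybad] = np.nan
good = np.isfinite(y)
# filter x, y and the uncertainties e together so they stay aligned
x, y, e = x[good], y[good], e[good]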

and then fit your model to those data with the bad point removed.
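
If the contamination covers several points, or sits at a different position in each data set, the same idea extends from a single point to a window. The following is only a minimal sketch: the helper `mask_window`, the centre `x0=4105.4` and the half-width `0.5` are illustrative assumptions chosen by hand, and the arrays are synthetic stand-ins for the `x`, `y` and `e` of the question.

```python
import numpy as np

def mask_window(x, x0, half_width):
    """Boolean mask that is False inside [x0 - half_width, x0 + half_width]."""
    return np.abs(x - x0) > half_width

# Synthetic stand-ins for the arrays in the question
x = np.linspace(4085.0, 4135.0, 251)
y = np.ones_like(x)
e = np.full_like(x, 0.05)

# x0 and half_width are hypothetical, picked per spectrum by inspection
keep = mask_window(x, x0=4105.4, half_width=0.5)
x_fit, y_fit, e_fit = x[keep], y[keep], e[keep]

# The masked arrays would then go to the fit, e.g.
# result = model.fit(y_fit, pars, x=x_fit, weights=1/e_fit)
```

Because only a narrow window is dropped, the wings of the profile still constrain the fit.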

But also: if there is no obviously errant point and the data are "just" noisy, there is probably no advantage in removing what look like bad points. Your data look noisy to me, but it's hard to see that there is a systematically bad point. If you are going to remove a point, remember that you are asserting that this measurement was not merely affected by normal noise, but was wrong.
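
One way to make that judgement less subjective is to compare each residual with its quoted uncertainty. This is a sketch on synthetic data, not a prescribed procedure: the helper `flag_outliers` is hypothetical, the 3-sigma threshold is an assumption, and with a real fit `best_fit` would be `result.best_fit`.

```python
import numpy as np

def flag_outliers(y, best_fit, e, nsigma=3.0):
    """Return a boolean array, True where |y - best_fit| exceeds nsigma * e."""
    return np.abs(y - best_fit) > nsigma * e

rng = np.random.default_rng(0)
x = np.linspace(4085.0, 4135.0, 251)
# Stand-in for result.best_fit: a Lorentzian-shaped dip below a continuum of 1
best_fit = 1.0 - 0.4 / (1.0 + ((x - 4106.0) / 4.0) ** 2)
e = np.full_like(x, 0.03)
y = best_fit + rng.normal(0.0, 0.03, x.size)
y[100:103] += 0.3  # inject an artificial spike, 10 sigma high

bad = flag_outliers(y, best_fit, e)
print(x[bad])  # x positions of the flagged points
```

Points flagged this way are candidates for removal, not proof that a measurement was wrong.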

Finally: another approach to treating noisy data might be to try to smooth the data, say with a Savitzky-Golay filter. There is always some danger of smoothing out features with such an approach, but a modest S-G filter is often good for cleaning up noisy data enough to detect features. Of course, if fits to filtered data give significantly different results from fits to unfiltered data, you will probably need to understand why that is.
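
As a rough illustration of such a filter, the sketch below applies `scipy.signal.savgol_filter` to synthetic noisy data. The window length of 11 points and the polynomial order of 3 are assumptions to be tuned for real data, not recommended values.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(1)
x = np.linspace(4085.0, 4135.0, 251)
# Noise-free Lorentzian-shaped dip, used only to generate test data
clean = 1.0 - 0.4 / (1.0 + ((x - 4106.0) / 4.0) ** 2)
y = clean + rng.normal(0.0, 0.03, x.size)

# An odd window_length larger than polyorder is the safe choice
y_smooth = savgol_filter(y, window_length=11, polyorder=3)
```

A wider window smooths more strongly but is more likely to flatten narrow features, so it is worth fitting both `y` and `y_smooth` and comparing the results.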

M Newville
  • Thanks for your answer. Yes, the data are noisy. But the main problem is the contamination in the core of the larger line (the one I'm fitting with the Lorentzian profile): it is present in all cases, it is not just noise, and its position on the x-axis varies, so if I remove a large portion of the data to clip out the feature, I lose a lot of information about the shape of the profile. Also, I cannot smooth the data in this case; as I said, the noise is not the problem, the contamination in the centre of the main minimum is. – EternalGenin Sep 15 '17 at 11:18
  • You *can* clip out points, which is what you actually asked how to do (and which I showed, I think). If some x-value is always bad, you can remove it. And if it *is* bad, you would not be removing data you care about. But I'm not sure I see an obviously bad point in your data -- it looks too noisy to be sure that the variations in the largest peak are *not* random. Also, though it is a little hard to tell from the plots, the feature may not always be at the same x position. I would encourage you to investigate carefully whether there really is a bad point, if you haven't already. – M Newville Sep 16 '17 at 00:52