First of all I would like to say that I do have conceptual difficulties with this topic, so please correct me if my intention does not make sense.
I'm trying to validate the model that I use to extract the signal yield from a distribution. For simplicity, assume there is only a signal distribution and no background. The model is a standard Gauss and I created an extended PDF.
In my thinking I would create sample data from that PDF and only change the number of events, i.e. only the scaling. Than I would fit this toy sample and compare the generated number of events to the fitted signal yield by computing
pull = (N_generated - N_fit) / sigma_Nfit
What I don't understand is how I set and get the generated number. In the generation, I want to set the number of events randomly (I guess Poisson distributed?) and keep all other model parameters fixed.
In the end I have signal+background distributions and want to:
- vary number of signal + keep number of background fixed
- fix signal + vary background
- vary fraction in n_total = n_signal + fraction * n_background
The following code is taken from the zfit-tutorials and updated to have an extended model (also found here: https://github.com/holyschmoly/zfit_toymc):
mu = zfit.Parameter('mu', 0, -1, 1)
sigma = zfit.Parameter('sigma', 1, 0.5, 1.5)
model = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma)
# do i need this range here?
nsig_toy = 1e4
nsig_range = 0.1
n_sig = zfit.Parameter('n_sig', nsig_toy, nsig_toy*(1-nsig_range), nsig_toy*(1+nsig_range))
model = model.create_extended(n_sig)
# vary only n_sig
sampler = model.create_sampler(fixed_params=[mu, sigma])
nll = zfit.loss.ExtendedUnbinnedNLL(model, sampler)
minimizer = zfit.minimize.Minuit(
strategy=zfit.minimize.DefaultToyStrategy(),
verbosity=0,
tolerance=1e-3,
use_minuit_grad=True)
params = nll.get_params()
fit_results = []
ntoys = 10
while len(fit_results) < ntoys:
# What does really happen here?
sampler.resample()
# I think this is probably not what I want but I don't understand why
# this is needed in the first place.
# I want something like: vary n_sig randomly, keep mu and sigma fixed
for param in params:
param.randomize()
result = minimizer.minimize(nll)
if result.converged:
result.hesse()
fit_results.append(result)
# Now I want something like this:
# pull = (N_generated - N_fit ) / sigma_fit
# for each fit_result.