
I am trying to simulate the performance of a real-life process. The variables that have been measured historically fall within a fixed interval, so being lower or greater than those values is physically impossible.

To simulate the process output, the historical data of each input variable was represented by its best-fit probability distribution (using this approach: Fitting empirical distribution to theoretical ones with Scipy (Python)?).

However, when the resulting theoretical distribution is simulated n times, it does not respect the expected real-life minimum and maximum values. I am thinking of applying a try-except test on each simulation to check whether every simulated value lies within the expected interval, but I am not sure this is the best way to handle it, because the experimental mean and variance are then not achieved.
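
For illustration, a minimal sketch of the issue (the interval [0, 10] and the normal fit are placeholders, not my actual process data):

import numpy as np
from scipy import stats

# Historical measurements known to lie in a physical interval, e.g. [0, 10]
historical = np.clip(np.random.normal(5, 2, size=500), 0, 10)

# Best-fit theoretical distribution (a normal fit, for illustration only)
mu, sigma = stats.norm.fit(historical)

# Sampling the fitted distribution n times can yield values outside [0, 10]
simulated = stats.norm.rvs(mu, sigma, size=10000)
print(simulated.min(), simulated.max())  # may fall below 0 or above 10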

jmparejaz
  • Please show actual code for what you're currently doing, I'm finding your verbal explanation to be unclear. – pjs Feb 24 '19 at 17:28

1 Answer


You can use a boolean mask in NumPy to regenerate the values that fall outside the required boundaries. For example:

import numpy as np

def random_with_bounds(func, size, bounds):
    # Draw an initial sample of the requested size
    x = func(size=size)
    # Boolean mask of values outside [bounds[0], bounds[1]]
    r = (x < bounds[0]) | (x > bounds[1])
    while r.any():
        # Redraw only the out-of-bounds entries
        x[r] = func(size=r.sum())
        # Update the mask at the positions that were redrawn
        r[r] = (x[r] < bounds[0]) | (x[r] > bounds[1])
    return x

Then you can use it like:

random_with_bounds(np.random.normal, 1000, (-1, 1))
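
If the sampler comes from a fitted scipy distribution rather than np.random, it can be passed in the same way as long as it accepts a size argument (a sketch; the loc/scale values below are placeholders for parameters obtained from the fit):

from scipy import stats

# Frozen distribution with parameters fitted elsewhere, e.g. via stats.norm.fit
dist = stats.norm(loc=5.0, scale=2.0)

# dist.rvs accepts size=..., so it works directly as `func`
samples = random_with_bounds(dist.rvs, 1000, (0, 10))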

Another version using index arrays via np.argwhere gives slightly increased performance:

def random_with_bounds_2(func, size, bounds):
    # Draw an initial sample of the requested size
    x = func(size=size)
    # Indices of values outside [bounds[0], bounds[1]]
    r = np.argwhere((x < bounds[0]) | (x > bounds[1])).ravel()
    while r.size > 0:
        # Redraw only the out-of-bounds entries
        x[r] = func(size=r.size)
        # Keep only the indices that are still out of bounds
        r = r[(x[r] < bounds[0]) | (x[r] > bounds[1])]
    return x
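
Usage is identical; a quick sanity check (assuming NumPy is imported as np):

x = random_with_bounds_2(np.random.normal, 1000, (-1, 1))
print(x.min(), x.max())  # both lie within [-1, 1]
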
a_guest