I'm trying to generate a random data series (or a time series) for anomaly detection, with events spanning a few consecutive data points. They could be values above/below a certain threshold, or anomaly types with different known probabilities.
For example, in a case where 1 is normal and the event types are drawn from [2, 3, 4]:
11112221113333111111112211111
I looked through the np.random and random methods, but couldn't find any that generate these events. My current solution is picking random points, adding random durations to them to generate event start and end positions, labeling each event with a random event type, and joining back to the dataset, something like:
import numpy as np
num_events = np.random.randint(1, 10)  # between 1 and 9 events
number_series = [1] * 60  # start with an all-normal series
first_pos = 0
# Random start positions, each extended by a random duration of 0 to 7 points
event_starts = sorted(first_pos + np.random.randint(50, size=num_events))
event_ends = [start + duration for start, duration in zip(event_starts, np.random.randint(8, size=num_events))]
for start, end in zip(event_starts, event_ends):
    # Draw the event type with its known probability (int() keeps plain Python ints in the list)
    rand_event_type = int(np.random.choice(a=[2, 3, 4], p=[0.5, 0.3, 0.2]))
    number_series[start:end] = [rand_event_type] * len(number_series[start:end])
print(number_series)
[1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 3, 3, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
But I'm wondering if there is a simpler way to just generate a series of numbers with events, based on a set of probabilities.
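One possible direction, sketched here under stated assumptions rather than as a definitive answer: instead of picking positions, build the series from alternating runs (a normal gap, then an event, then a gap, and so on) and expand the runs with np.repeat. The function name generate_series and the parameters mean_gap and max_event_len are hypothetical, and the geometric distribution for gap lengths is an assumption, not something the problem requires.

```python
import numpy as np

def generate_series(length, rng, event_types=(2, 3, 4), event_probs=(0.5, 0.3, 0.2),
                    normal=1, mean_gap=8, max_event_len=7):
    """Build a series as alternating runs: normal gap, event, normal gap, ..."""
    labels, durations = [], []
    total = 0
    while total < length:
        gap = rng.geometric(1 / mean_gap)               # normal run length, always >= 1
        event_len = rng.integers(1, max_event_len + 1)  # event run length, 1..max_event_len
        event = rng.choice(event_types, p=event_probs)  # event type with the given probabilities
        labels += [normal, event]
        durations += [gap, event_len]
        total += gap + event_len
    # np.repeat expands each label by its run length; trim to the requested length
    return np.repeat(labels, durations)[:length]

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
series = generate_series(60, rng)
print("".join(map(str, series)))
```

Because each event is its own run, events in this sketch cannot overlap and always last at least one point, which the position-based version does not guarantee.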