Assuming your data is in a DataFrame with two columns, 'time' and 'size', and that there are around 500 observations in total (so a window size of 10 is sensible):
Calculate the median over a moving window.
If a value's 'size' is >= (the median centered at it * multiplier_thresh), consider that value an outlier and remove it:
wind_size = 10
multiplier_thresh = 1.5
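# Calculate the rolling median centered on each value.
# min_periods=1 lets the shorter windows at both ends still produce a median (no NaN to fill).
rolling_median = df['size'].rolling(window=wind_size, center=True, min_periods=1).median()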
# Drop outliers
to_stay = df['size'] < rolling_median * multiplier_thresh
df_no_outliers = df[to_stay]
Mean of the values without the outliers:
df_no_outliers['size'].mean()
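
To try the rolling-median filter end to end, here is a minimal sketch on made-up data. The helper name `drop_rolling_median_outliers`, the synthetic values and the spike positions are assumptions for illustration only; they are not part of your data.

```python
import numpy as np
import pandas as pd

def drop_rolling_median_outliers(df, column="size", wind_size=10, multiplier_thresh=1.5):
    """Keep rows whose value is below multiplier_thresh times the centered rolling median."""
    rolling_median = (
        df[column].rolling(window=wind_size, center=True, min_periods=1).median()
    )
    return df[df[column] < rolling_median * multiplier_thresh]

# Synthetic stand-in data (assumption): 500 observations near 100, plus three injected spikes.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"time": np.arange(500), "size": rng.normal(100.0, 5.0, 500)})
spike_rows = [50, 200, 350]
demo.loc[spike_rows, "size"] = 1000.0

cleaned = drop_rolling_median_outliers(demo)
print(len(demo) - len(cleaned), "rows dropped")
print("mean without outliers:", cleaned["size"].mean())
```

Because the median barely moves when a single spike enters the window, the spikes are compared against a threshold based on their normal neighbours, which is why this works better than a rolling mean would.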
A simpler approach:
Just remove the outliers from all your 'size' values.
You can use a variety of methods to detect and remove the outliers.
Here is a simple one:
q1 = df["size"].quantile(0.25)
q3 = df["size"].quantile(0.75)
iqr = q3 - q1 # Interquartile range
df_no_outliers = df[df["size"] < q3 + 1.5 * iqr]
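
The filter above only removes unusually large values. If unusually small values matter too, the same IQR fences can be applied on both sides. A minimal sketch, where the helper name `drop_iqr_outliers`, the `two_sided` flag and the sample numbers are all hypothetical:

```python
import pandas as pd

def drop_iqr_outliers(df, column="size", k=1.5, two_sided=False):
    """Drop rows above q3 + k*IQR, and optionally rows below q1 - k*IQR."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1  # Interquartile range
    keep = df[column] < q3 + k * iqr
    if two_sided:
        keep = keep & (df[column] > q1 - k * iqr)
    return df[keep]

# Made-up data (assumption): mostly 10-12, with one high and one low outlier.
demo = pd.DataFrame({"time": range(8), "size": [10, 11, 12, 11, 10, 12, 100, -50]})

upper_only = drop_iqr_outliers(demo)                  # same rule as the snippet above
both_sides = drop_iqr_outliers(demo, two_sided=True)  # also drops the low outlier
print(len(upper_only), len(both_sides))
```

Note that this uses one global threshold for the whole column, so it fits data without a strong trend over 'time'; with a trend, the moving-window version is the safer choice.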