Imagine a dataframe that looks like this:

1
2
3
4
5
6
7
50
16
17

Normally we would apply an algorithm from "Detect and exclude outliers in a pandas DataFrame" to remove the 50 entirely. However, my particular dataset instead requires me to distribute the value of the 50 over the previous 7 days:

8
9
10
11
12
13
14
15
16
17

How can I make this work in Pandas? I can detect the outliers pretty easily, but I'm not sure how to spread their values out over the previous days. Note that a simple moving average doesn't work well for this type of data, as there would still be a jump in the average value when the 50 shows up. What I need to do is smooth the 50 out into the previous days so that no jump is visible.

Mark Rotteveel
JonathanReez
  • Why does your input dataframe have a length of 11 and your output a length of 10? Where is 1? Can you update your dataframe with a more complete input and output example, please? – Corralien Dec 19 '21 at 08:12
  • @Corralien you're right, updated – JonathanReez Dec 19 '21 at 08:17
  • How do you choose 15, 14, 13, 12, 11, 10, 9 and 8 (8 values, not 7)? – Corralien Dec 19 '21 at 08:28
  • @Corralien once I see 50, I want to add 50/7 to all previous days and set the current value to what it should've been if we assume that the values keep increasing/decreasing at the same rate. – JonathanReez Dec 19 '21 at 08:54
  • @Corralien basically imagine we have two data streams: one is measuring things on a daily basis and one dumps new datapoints at random periods of time as a big chunk. We want to acknowledge this new data but also avoid a jump in the graph. – JonathanReez Dec 19 '21 at 08:56
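
The rule described in the comments above (add an equal share of the outlier to each of the previous 7 days, then replace the outlier with the value implied by the existing trend) could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the function name `spread_outliers` and the `window` parameter are hypothetical, the boolean mask `is_outlier` is assumed to come from whatever detection method is already in use, and "what it should've been" is interpreted as a straight-line extrapolation from the two preceding (already adjusted) values.

```python
import pandas as pd


def spread_outliers(s, is_outlier, window=7):
    """Spread each flagged value evenly over up to `window` preceding rows.

    Hypothetical helper: `is_outlier` is a boolean Series aligned with `s`,
    produced by whatever outlier detection is already in place.
    """
    values = s.to_numpy(dtype=float, copy=True)
    flags = is_outlier.to_numpy(dtype=bool)

    for pos in flags.nonzero()[0]:
        start = max(pos - window, 0)
        n_prev = pos - start
        if n_prev == 0:
            continue  # no earlier rows to spread the value into

        # Give every preceding row in the window an equal share of the outlier.
        values[start:pos] += values[pos] / n_prev

        # Assumption: replace the outlier by linear extrapolation from the
        # two preceding rows, or repeat the last value if only one exists.
        if n_prev >= 2:
            values[pos] = 2 * values[pos - 1] - values[pos - 2]
        else:
            values[pos] = values[pos - 1]

    return pd.Series(values, index=s.index, name=s.name)


s = pd.Series([1, 2, 3, 4, 5, 6, 7, 50, 16, 17])
mask = s == 50  # stand-in for a real outlier detector
result = spread_outliers(s, mask)
print(result.round(2).tolist())
# [8.14, 9.14, 10.14, 11.14, 12.14, 13.14, 14.14, 15.14, 16.0, 17.0]
```

Because 50/7 is not a whole number, the result only approximates the 8 to 17 sequence shown in the question; an outlier of 49 would reproduce it exactly. A small step also remains between the extrapolated value and the next day's value, so whether this counts as "no jump" depends on the data.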

0 Answers