I have four categories that I am plotting here using ggplot. I would like to add a moving average using geom_ma, but I have too few of the green dots to get a good moving average (I would prefer a period of at least 20). How can I keep the scatterplot as is and add a moving average only for the purple and blue dots, which have enough points for a 20-period moving average?

Example: ggplot(data, aes(x, y, color=Str)) + geom_point(stat="identity") + geom_ma(ma_fun = SMA, n = 20, linetype=1, size=1, na.rm=TRUE)

I get the error: "Warning message: Computation failed in stat_sma(): n = 20 is outside valid range: [1, 10]"

Natty
It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Explicitly list any non-base R packages that you are using (where does `geom_ma` come from?). – MrFlick Dec 01 '20 at 19:15

1 Answer

This is a great example of why it helps to provide a minimal reproducible example. You have provided the code that produced the error, but there is nothing wrong with the code on its own: it will only cause this error with certain inputs. Given suitable data, your code is fine.

Let's make a dummy data frame with the same name and column names as your data frame. We will make data for the first 330 days of 2020, and we will have 4 groups in Str, so a total of 1320 rows:

library(tidyquant)
library(ggplot2)

set.seed(1)

data <- data.frame(x = rep(seq(as.Date("2020-01-01"), 
                           by = "day", length.out = 330), 4),
                   y = as.vector(replicate(4, 1000 * cumsum(rnorm(330)))),
                   Str = rep(c("A", "B", "C", "D"), each = 330))

Now if we use your exact plotting code, we can see that the plot is fine:

ggplot(data, aes(x, y, color = Str)) + 
  geom_point(stat="identity") + 
  geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE)

But if one or more of our Str groups has fewer than 20 measurements, then we get your error. Let's remove most of the Str == "A" and Str == "B" cases, leaving only 10 rows in each, and repeat the plot:

data <- data[c(1:20 * 33, 661:1320),]

ggplot(data, aes(x, y, color = Str)) + 
  geom_point(stat="identity") + 
  geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE)
#> Warning: Computation failed in `stat_sma()`:
#> n = 20 is outside valid range: [1, 10]

We get your exact warning, and the MA lines disappear from all the groups. Clearly we cannot get a 20-measurement moving average if we only have 10 data points, so geom_ma just gives up.
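To see which groups are too small before plotting, one option is to count the rows per group. This is a minimal base R sketch, assuming the reduced example `data` from above; `group_sizes` and `too_small` are illustrative names.

```r
# Rebuild the reduced example data from above (base R only)
set.seed(1)
data <- data.frame(
  x = rep(seq(as.Date("2020-01-01"), by = "day", length.out = 330), 4),
  y = as.vector(replicate(4, 1000 * cumsum(rnorm(330)))),
  Str = rep(c("A", "B", "C", "D"), each = 330)
)
data <- data[c(1:20 * 33, 661:1320), ]

# Count observations per Str group
group_sizes <- table(data$Str)
print(group_sizes)

# Groups with too few points for a 20-period moving average
too_small <- names(group_sizes)[group_sizes < 20]
print(too_small)
#> [1] "A" "B"
```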

The fix here is to use the data = argument in geom_ma to pass only the groups that have at least 20 data points:

ggplot(data, aes(x, y, color = Str)) + 
  geom_point(stat="identity") + 
  geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE,
          data = data[data$Str %in% names(table(data$Str)[table(data$Str) >= 20]),])
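The same filter can also be written with the window length held in a single variable, so that the subset and the `n` passed to `geom_ma` cannot drift apart. This is a sketch under the assumption that the reduced example `data` from above is used; `n_period`, `group_n` and `ma_data` are illustrative names, not part of tidyquant.

```r
library(tidyquant)
library(ggplot2)

# Rebuild the reduced example data: 10 rows each for A and B, 330 each for C and D
set.seed(1)
data <- data.frame(
  x = rep(seq(as.Date("2020-01-01"), by = "day", length.out = 330), 4),
  y = as.vector(replicate(4, 1000 * cumsum(rnorm(330)))),
  Str = rep(c("A", "B", "C", "D"), each = 330)
)
data <- data[c(1:20 * 33, 661:1320), ]

n_period <- 20  # illustrative name: moving-average window

# ave() returns, for every row, the size of that row's Str group
group_n <- ave(seq_len(nrow(data)), data$Str, FUN = length)

# Keep only rows whose group is large enough for the chosen window
ma_data <- data[group_n >= n_period, ]

# All points are plotted; the moving average is drawn only for the kept groups
p <- ggplot(data, aes(x, y, color = Str)) +
  geom_point() +
  geom_ma(data = ma_data, ma_fun = SMA, n = n_period,
          linetype = 1, size = 1, na.rm = TRUE)
print(p)
```

To draw the moving average only for specific categories regardless of their size, the subset could instead be taken by name, for example `data[data$Str %in% c("C", "D"), ]`.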

Allan Cameron