
I have the following code to make a plot using ggplot2 (here is the data file):

```r
library(ggplot2)
library(dplyr)

sig1 <- ggplot(var_dat_df %>%
                 filter(!(variable %in% c("LogDiffSq", "cusum_ker", "de_ker", "hr_ker"))),
               aes(x = i, y = -log10(value), group = variable, color = variable)) +
  geom_line() +
  scale_color_manual(values = c("#1b9e77", "#d95f02", "#7570b3"),
                     labels = c("CUSUM", "DE", "HR"),
                     name = "Statistic") +
  geom_hline(yintercept = -log10(0.05), color = "red", linetype = "dashed") +
  scale_y_continuous(breaks = c(-log10(0.05), 5, 10, 15, 17),
                     labels = expression(alpha, 5, 10, 15, 17)) +
  xlab("Index") + ylab(expression(-log[10](p))) +
  labs(title = "Statistical Significance of Detected Change",
       subtitle = "Without Using Kernel Estimation for Long-Run Variance") +
  theme_bw() +
  theme(plot.title = element_text(size = rel(2)),
        legend.position = "bottom")
```

The following warning appears:

```
Warning message:
In eval(expr, envir, enclos) : NaNs produced
```

Here is the resulting figure:

[Image: resulting plot]

What are the green bars at the top? Why do they appear, and how can I get rid of them?

cgmil
  • Can't say without the data, which is probably the reason for it (and sorry, but Dropbox is not accessible to me). By the way, some friendly advice: you might personally think that dplyr is God's gift to R programming, but some others don't. You increase the potential pool of answerers if you use base R for a simple subsetting operation. The subsetting is not relevant to your question anyway and should not be part of a *minimal* [reproducible example](http://stackoverflow.com/a/5963610/1412059). – Roland Dec 02 '16 at 07:23
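Following the suggestion to use base R for the subsetting, here is a minimal sketch of the base-R equivalent of the `filter()` call. It runs on a hypothetical toy data frame (the real data file is not included here); the columns `i`, `variable` and `value` follow the question's code, while the series names `cusum`, `de` and `hr` and all values are made up for illustration.

```r
# Hypothetical stand-in for var_dat_df: 5 series x 3 indices, made-up values
var_dat_df <- data.frame(
  i        = rep(1:3, times = 5),
  variable = rep(c("cusum", "de", "hr", "LogDiffSq", "cusum_ker"), each = 3),
  value    = c(0.5, 0.01, 1e-6, 0.4, 0.02, 1e-8, 0.3, 0.03, 0,
               1, 2, 3, 0.2, 0.1, 0.05)
)

drop_vars <- c("LogDiffSq", "cusum_ker", "de_ker", "hr_ker")

# Base R equivalent of var_dat_df %>% filter(!(variable %in% drop_vars))
plot_df <- var_dat_df[!(var_dat_df$variable %in% drop_vars), ]

nrow(plot_df)  # 9 rows remain: the cusum, de and hr series
```

`plot_df` can then be passed directly as the `data` argument of `ggplot()`, which keeps the example free of the pipe.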


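This is most likely because some of your input values to `log10` are zero (or so small that they are effectively zero). `log10(0)` is `-Inf`, so `-log10(value)` becomes `Inf`, and ggplot2 draws infinite values at the edge of the panel, which shows up as lines shooting to the top of the plot. Slightly negative values, for example from floating-point round-off, give `NaN`, which is what the "NaNs produced" warning refers to. You can try this: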
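To see where the `Inf` and `NaN` values come from, here is a minimal base-R sketch on a hypothetical vector `p` (not the OP's data) containing an ordinary p-value, a tiny one, an exact zero and a slightly negative round-off value:

```r
# Hypothetical p-values: ordinary, tiny, exactly zero, slightly negative
p <- c(0.05, 1e-20, 0, -1e-17)

y <- -log10(p)  # warns "NaNs produced" because of the negative entry

is.infinite(y)  # TRUE only for the exact zero (-log10(0) is Inf)
is.nan(y)       # TRUE only for the negative entry
```

Note that the tiny but valid p-value `1e-20` maps to a large finite number (20), so only exact zeros and negative values are the problem here.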

```r
library(ggplot2)
library(dplyr)

# Value to plot instead of -log10(value) when value is (numerically) zero.
# NA omits those points; a finite value (e.g. 16) keeps them visible at the top.
value_for_log0 <- NA

ggplot(var_dat_df %>%
         filter(!(variable %in% c("LogDiffSq", "cusum_ker", "de_ker", "hr_ker"))),
       aes(x = i, y = ifelse(round(value, 15) == 0, value_for_log0, -log10(value)),
           group = variable, color = variable)) +
  geom_line() +
  scale_color_manual(values = c("#1b9e77", "#d95f02", "#7570b3"),
                     labels = c("CUSUM", "DE", "HR"),
                     name = "Statistic") +
  geom_hline(yintercept = -log10(0.05), color = "red", linetype = "dashed") +
  scale_y_continuous(breaks = c(-log10(0.05), 5, 10, 15, 17),
                     labels = expression(alpha, 5, 10, 15, 17)) +
  xlab("Index") + ylab(expression(-log[10](p))) +
  labs(title = "Statistical Significance of Detected Change",
       subtitle = "Without Using Kernel Estimation for Long-Run Variance") +
  theme_bw() +
  theme(plot.title = element_text(size = rel(2)),
        legend.position = "bottom")
```
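The core of this fix is the `ifelse()` call in the `y` aesthetic. A minimal sketch on a hypothetical vector shows what it does to individual values. Because `ifelse()` evaluates `-log10(value)` for every element before choosing, a negative entry can still trigger the "NaNs produced" warning even though its result is replaced.

```r
value_for_log0 <- NA

# Hypothetical values: ordinary, small, tiny, zero, slightly negative
value <- c(0.05, 1e-10, 1e-20, 0, -1e-17)

y <- ifelse(round(value, 15) == 0, value_for_log0, -log10(value))
y  # first two become about 1.30 and 10; the last three become NA
```

The third entry, `1e-20`, is a valid but very small p-value, yet it is rounded to zero and replaced as well; this is the concern raised in the comments.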

[Image: resulting plot]

Sandipan Dey
  • This pretends that the most significant results were never derived. That seems a problem to me. OP should calculate the p-values on the log-scale or create a graph that can depict these as `p < ...`. – Roland Dec 02 '16 at 09:05
  • @Roland, yes, in that case we may want to show the most significant (p ≈ 0) values as well: define *value_for_log0* to be slightly higher than the largest finite -log10(p) value. Since the y-axis here shows values from 0 to about 15, we could set value_for_log0 to 16 or so, so that those points appear as the most significant. That's why I kept the constant *value_for_log0* configurable. – Sandipan Dey Dec 02 '16 at 11:13
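Both suggestions from this exchange can be sketched in base R: computing -log10(p) on the log scale, and capping the plotted value while marking which points were capped. The sketch assumes, purely for illustration, that each statistic `z` is referred to a standard normal distribution with an upper-tail p-value; the OP's CUSUM/DE/HR tests may use different null distributions, and `z`, `cap`, `y_plot` and `is_capped` are hypothetical names.

```r
# Hypothetical test statistics, assumed standard normal under the null
z <- c(1.7, 10, 40)

# Computing p first underflows: the upper tail at z = 40 is below the
# smallest representable double, so p is exactly 0 and -log10(p) is Inf
p_direct <- pnorm(z, lower.tail = FALSE)

# (a) Log scale: log.p = TRUE returns log(p); dividing the natural log
# by log(10) converts it to log10(p), which stays finite
neglog10_p <- -pnorm(z, lower.tail = FALSE, log.p = TRUE) / log(10)

# (b) Cap the plotted value (cf. value_for_log0 = 16) and flag capped points,
# so they can be shown as "p < 1e-16" instead of being dropped
cap <- 16
y_plot <- pmin(neglog10_p, cap)
is_capped <- neglog10_p > cap

# Possible breaks/labels for scale_y_continuous(); the top label marks the cap
y_breaks <- c(5, 10, 15, cap)
y_labels <- c("5", "10", "15", paste0(">= ", cap))
```

Approach (a) keeps the true ordering of very significant results; approach (b) only works when the raw p-values are all you have, and it makes the cap explicit on the axis rather than pretending those points do not exist.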