
In the following reproducible example I'm attempting to build a ggplot2 function call dynamically, in order to accommodate an unknown number of mixture distribution components. The code produces this error message: Error in parse(text = g) : <text>:8:0: unexpected end of input. What is the problem with the code? (I'm aware of the method of pre-calculating plot data, storing it in a data frame, melting it and supplying it to ggplot2. I would like to explore the option below as well.) Thank you!

library(ggplot2)
library(scales)
library(RColorBrewer)
library(mixtools)

NUM_COMPONENTS <- 2

set.seed(12345) # for reproducibility

data(diamonds, package='ggplot2')  # use built-in data
myData <- diamonds$price

calc.component <- function(x, lambda, mu, sigma) {
  lambda * dnorm(x, mean = mu, sd = sigma)
}


overlayHistDensity <- function(data, func) {

  # extract 'k' components from mixed distribution 'data'
  mix <- normalmixEM(data, k = NUM_COMPONENTS,
                     maxit = 100, epsilon = 0.01)
  summary(mix)

  DISTRIB_COLORS <- 
    suppressWarnings(brewer.pal(NUM_COMPONENTS, "Set1"))

  # plot histogram, empirical and fitted densities
  g <- "ggplot(data) +\n"

  for (i in seq(length(mix$lambda))) {
    args <- paste0("args.", i)
    assign(args, list(lambda = mix$lambda[i], mu = mix$mu[i],
                 sigma = mix$sigma[i]))
    g <- paste0(g,
                "stat_function(fun = func, args = ",
                args,
                ", aes(color = ",
                DISTRIB_COLORS[i], ")) +\n")
  }

  tailStr <- 
    "geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
     geom_histogram(aes(y = ..density..), alpha = 0.4) +
     scale_colour_manual(name = '', values = c('red', 'blue')) +
     theme(legend.position = 'top', legend.direction = 'horizontal')"

  g <- paste0(g, tailStr)
  gr <- eval(parse(text = g))
  return (gr)
}

overlayHistDensity(log10(myData), 'calc.component')
Aleksandr Blekh

2 Answers


As long as you realize you are going about this a hard way...

If you look at the value of g before it is parsed, it is

ggplot(data) +
stat_function(fun = func, args = args.1, aes(color = #E41A1C)) +
stat_function(fun = func, args = args.2, aes(color = #377EB8)) +
geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
     geom_histogram(aes(y = ..density..), alpha = 0.4) +
     scale_colour_manual(name = '', values = c('red', 'blue')) +
     theme(legend.position = 'top', legend.direction = 'horizontal')

Usually the unexpected end of input message comes from unbalanced quotes or parentheses, but that is not (obviously) your problem here. The problem is in the color specification: literal hex colors must be specified as strings.

ggplot(data) +
stat_function(fun = func, args = args.1, aes(color = "#E41A1C")) +
stat_function(fun = func, args = args.2, aes(color = "#377EB8")) +
geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
     geom_histogram(aes(y = ..density..), alpha = 0.4) +
     scale_colour_manual(name = '', values = c('red', 'blue')) +
     theme(legend.position = 'top', legend.direction = 'horizontal')

Without the quotes, the hash starts a comment, so the rest of each of those lines (the closing parentheses in particular) is ignored by the parser. The parser then reaches the end of the text with parentheses still open and reports the error you got. (Note the syntax highlighting that SO gives on the first code snippet.)
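The effect can be reproduced in isolation with base R only. This is a minimal sketch; the strings `bad` and `good` are made-up stand-ins for one generated layer, not the full plot call from the question. Nothing is evaluated here, so ggplot2 does not need to be loaded.

```r
# Unquoted hex colour: everything from '#' to the end of the line is a comment,
# so the closing parenthesis is never seen by the parser.
bad  <- "aes(color = #E41A1C)"

# Quoted hex colour: the '#' is part of a string literal.
good <- "aes(color = '#E41A1C')"

# Capture the parse error message instead of stopping.
bad_result <- tryCatch(
  parse(text = bad),
  error = function(e) conditionMessage(e)
)
print(bad_result)

# The quoted version parses into a single call.
good_expr <- parse(text = good)
print(good_expr[[1]])
```

Printing the generated string with cat(g) before parsing it makes this kind of problem visible quickly.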

That said, I think you can get what you want without the eval(parse()) approach. In particular, look at aes_string, which lets the value of a string variable name the variable used for an aesthetic, and at adding a list of stats or geoms to the plot (the list can be of any length and can be created using lapply, for example). Also, you seem to be specifying literal colors and then mapping them to just red and blue; possibly you want scale_colour_identity? All of this (the last paragraph) is more code review than an answer to what you actually asked.
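As a rough sketch of the "list of stats built with lapply" idea (not a drop-in replacement for the question's function): the parameter table `params`, the colour vector `cols` and the simulated data `df` below are hypothetical stand-ins for the normalmixEM() output and the real data, and the colours are set as constants outside aes.

```r
library(ggplot2)

# Component density, as defined in the question.
calc.component <- function(x, lambda, mu, sigma) {
  lambda * dnorm(x, mean = mu, sd = sigma)
}

# Hypothetical mixture parameters standing in for the fitted values.
params <- data.frame(lambda = c(0.4, 0.6),
                     mu     = c(3.0, 3.8),
                     sigma  = c(0.2, 0.3))
cols <- c("#E41A1C", "#377EB8")

# Simulated data, only so that the example is self-contained.
set.seed(1)
df <- data.frame(x = c(rnorm(400, 3.0, 0.2), rnorm(600, 3.8, 0.3)))

# One stat_function layer per component; the list can have any length.
layers <- lapply(seq_len(nrow(params)), function(i) {
  stat_function(fun = calc.component,
                args = list(lambda = params$lambda[i],
                            mu     = params$mu[i],
                            sigma  = params$sigma[i]),
                colour = cols[i])
})

# A list of layers can be added to a plot with '+', just like a single layer.
p <- ggplot(df, aes(x = x)) +
  geom_density() +
  layers
```

Because no code is pasted together as text, there is nothing to quote or escape, and the number of components only changes the number of rows in `params`.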

Brian Diggs

You've got several problems:

  • ggplot's data argument must be a data.frame, not a vector
  • hex color names starting with # must be quoted, or they'll be interpreted as comments
  • you must provide an aes(x = ) mapping
  • color definitions that are constant do not go in aes

This should work:

overlayHistDensity <- function(data, func) {
    # extract 'k' components from mixed distribution 'data'
    mix <- normalmixEM(data, k = NUM_COMPONENTS,
                       maxit = 100, epsilon = 0.01)
    summary(mix)

    DISTRIB_COLORS <- 
        suppressWarnings(brewer.pal(NUM_COMPONENTS, "Set1"))

    # plot histogram, empirical and fitted densities
    g <- "ggplot(as.data.frame(data), aes(x = data)) +\n"

    for (i in seq(length(mix$lambda))) {
        args <- paste0("args.", i)
        assign(args, list(lambda = mix$lambda[i], mu = mix$mu[i],
                          sigma = mix$sigma[i]))
        g <- paste0(g,
                    "stat_function(fun = func, args = ",
                    args,
                    ", color = '",
                    DISTRIB_COLORS[i], "') +\n")
    }

    tailStr <- 
        "geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
     geom_histogram(aes(y = ..density..), alpha = 0.4) +
     scale_colour_manual(name = '', values = c('red', 'blue')) +
     theme(legend.position = 'top', legend.direction = 'horizontal')"

    g <- paste0(g, tailStr)
    gr <- eval(parse(text = g))
    return (gr)
}

Like Brian, I'll finish with 2 comments:

  1. This is standard debugging and you shouldn't need an SO post for it. It's essentially several syntax errors and a couple little mistakes. I took your code outside of a function and ran it up through the final g <- paste0 line, and put the g output in a code window and looked for problems. Try to write code that works outside of a function first, then put it in a function.

  2. Seconding Brian's comment, a more natural approach is to not use eval(parse()) and all this pasting. Instead, use aes_string and melt your data so that you can use one stat_function call based on a grouping variable.
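One way to read the second point, sketched under assumptions: instead of stat_function, the component densities are pre-calculated on a grid, stored in long format with a grouping column, and drawn with a single geom_line. The values in `params` and the grid range are hypothetical; in the question they would come from normalmixEM() and the range of the data. The long data frame is built with base R here rather than with melt.

```r
library(ggplot2)

# Hypothetical fitted parameters, one row per mixture component.
params <- data.frame(component = c("1", "2"),
                     lambda    = c(0.4, 0.6),
                     mu        = c(3.0, 3.8),
                     sigma     = c(0.2, 0.3))

# Grid of x values on which every component density is evaluated.
grid <- seq(2, 5, length.out = 200)

# Long format: one row per (grid point, component) pair.
dens <- do.call(rbind, lapply(seq_len(nrow(params)), function(i) {
  data.frame(x         = grid,
             density   = params$lambda[i] *
                         dnorm(grid, mean = params$mu[i], sd = params$sigma[i]),
             component = params$component[i])
}))

# A single geom_line; the grouping variable drives the colour and the legend.
p <- ggplot(dens, aes(x = x, y = density, colour = component)) +
  geom_line() +
  scale_colour_manual(name = "",
                      values = c("1" = "#E41A1C", "2" = "#377EB8"))
```

Since colour is mapped to a variable here, it belongs inside aes, and the legend comes for free.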

Gregor Thomas
  • Thank you very much for your answer! The data argument is an accidental leftover due to transitioning from `qplot()` to `ggplot()`. In regard to debugging, I've done quite a lot of it with function and without, before posting this question. I didn't want to offend anyone. – Aleksandr Blekh Sep 17 '14 at 19:30
  • All very good points; I only noticed the one most directly related to the question. I approached finding the problem similarly, but put a `return(g)` after the last `g <- paste0(...` because sometimes, due to scoping (modified data sets overwriting previous ones), the results are not identical. – Brian Diggs Sep 17 '14 at 20:02
  • @AleksandrBlekh thanks, and I hope I didn't offend (definitely didn't mean to!), just a comment. – Gregor Thomas Sep 17 '14 at 20:14
  • Everything is fine, no worries! Thank you and @BrianDiggs, again! By the way, the reason I've tried this approach rather than the "standard" melting approach is the problems with log scale: http://stackoverflow.com/a/25641112/2872891. I'm wondering, if, in addition to my mistakes, it has something to do with this `stat_function()` issue: http://stackoverflow.com/a/9424028/2872891. Has this issue been fixed? Do you have any advice on handling such situations (log-normal data visualization by `ggplot2`)? – Aleksandr Blekh Sep 17 '14 at 21:13
  • I was unaware of that `stat_function` issue. I tend to avoid `stat_function` entirely and do that sort of processing before plotting, using a second data.frame if necessary. I'm not sure how well that would work in your case. – Gregor Thomas Sep 17 '14 at 21:22
  • Reworked my solution, getting rid of that ugly `eval(parse())` stuff. Still not using the melting approach (need some time to adjust), but my current one (using `lapply()`) looks pretty compact and logical. I just couldn't figure out the legend panel. If you and/or @BrianDiggs are curious enough to take a look, here it is in all its glory (it's the answer to a different question of mine): http://stackoverflow.com/a/25641112/2872891. P.S. Basically, it seems that using log scales in `ggplot2` can be tricky, so I just log transform my data upfront for now - not elegant, but it works. – Aleksandr Blekh Sep 18 '14 at 03:01