
I apologize for the apparently basic question, but I have tried hard and searched online, and I am still stuck. These are the data:

temp <- data.frame(mean=seq(1, 200, by=2), 
                 sd=seq(1, 200, by=2))


normv <- function( n , mean , sd ){
  out <- rnorm( n*length(mean) , mean = mean , sd = sd )
  return( matrix( out , , ncol = n , byrow = FALSE ) )
}


set.seed(1)
normv( 5 , temp$mean , temp$sd ) # 5 variables, from  V1 to V5
mydata <- as.data.frame(normv( 5 , temp$mean , temp$sd ))

And this is the loop that builds three exploratory graphs from "mydata":

require(ggplot2)
require(car)
pdf(paste("Explore",1,".pdf",sep=""))
layout(matrix(c(1,2,3,3), 2, 2, byrow = FALSE))
lst1<- lapply(names(mydata),function(i) 
{
  print (
    ggplot(mydata, aes(i)) +
      geom_histogram(aes(y = ..density..),
                     fill = 'yellow',
                     alpha = 0.7,
                     col = 'black') +
      geom_density(colour="blue", lwd = 1, fill="lightyellow", alpha=0.5) +
      stat_function(fun = dnorm, 
                    args = list(mean = mean(mydata[,i], na.rm=T), sd = sd(mydata[,i], na.rm=T)), 
                    lwd = 1, 
                    col = 'red') +
      geom_vline(xintercept = mean(mydata[,i], na.rm=TRUE),col="lightblue", lty=1, lwd = 1) +
      geom_vline(xintercept = median(mydata[,i], na.rm=TRUE),col="purple", lty=2, lwd = 1) +
      theme_bw() +
      labs(title="Blue Line: Mean, Purple Line: Median") +
      theme(axis.text.x=element_text(size=14), axis.title.x=element_text(size=16),
            axis.text.y=element_text(size=14), axis.title.y=element_text(size=16))
        )


  qqnorm(mydata[,i], axes=FALSE)
  Boxplot(mydata[,i], 
          labels=rownames(mydata), id.n=Inf,
          col="royalblue",
          axes=TRUE,
          ylab=i,
          horizontal=FALSE)
})

A mock-up of the final image I would like to get was attached here (image not shown). Note that if I run geom_histogram outside the loop it works fine, and when I drop ggplot2 and use base R for the histogram, that works fine too. However, when I run the loop, I still get this error:

Error: StatBin requires a continuous x variable the x variable is discrete. Perhaps you want stat="count"? 

Note also that the difference between a histogram and a bar plot is perfectly clear to me. In this case the data are obviously continuous, NOT discrete, so I do need a histogram. For academic purposes I also tried switching to geom_bar: that gets rid of the error, but the resulting plot (as expected) does not make sense.

Any help is greatly appreciated

Diego
  • It is looking for a column named `i` in your dataframe. If you are passing the column names as a string, use `aes_string` – Jack Brookes Mar 22 '18 at 22:45
  • Please add a mock up image of what you expect the output to be – Jack Brookes Mar 23 '18 at 11:20
  • Edited now and added the example as requested. Please note also that switching to ggplot(mydata, aes_string(i)) does not change the outcome (same error message about a discrete variable) and adds a few warnings about non-numeric arguments... sorry, and thanks for your patience – Diego Mar 23 '18 at 12:48
  • See my edit below – Jack Brookes Mar 23 '18 at 13:26
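
The point raised in the comments can be reproduced in isolation. The sketch below uses a small stand-in data frame (`demo`, not the original `mydata`) and assumes ggplot2 3.0 or later, where the `.data` pronoun is available as an alternative to `aes_string`. Inside the loop, `i` holds a column name as a string, so `aes(i)` maps x to that constant string rather than to the column, and a string is a discrete variable.

```r
library(ggplot2)

set.seed(1)
# Small stand-in for `mydata`: two numeric columns named like the originals
demo <- data.frame(V1 = rnorm(100), V2 = rnorm(100, mean = 5))

i <- "V1"  # what the lapply() loop passes in: a column name, not a column

# aes(i) maps x to the character value "V1", so stat_bin sees a discrete
# variable and fails when the plot is built (that is, when it is printed).
p_bad <- ggplot(demo, aes(i)) + geom_histogram(bins = 30)
bad_result <- tryCatch(ggplot_build(p_bad), error = function(e) e)
inherits(bad_result, "error")

# .data[[i]] looks the column up by name, so x is the numeric column V1.
p_ok <- ggplot(demo, aes(.data[[i]])) + geom_histogram(bins = 30)
binned <- layer_data(p_ok, 1)  # computed histogram bins
sum(binned$count)              # total number of binned observations
```

This also explains why the same `geom_histogram` call works outside the loop: there the column is usually written as a bare name such as `aes(V1)`, which is looked up in the data frame.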

1 Answer


There is a lot going on here, so I'm going to assume this is the type of graph you want, based on your code. Here, I gather the variables so that we can facet along the horribly named "variable" column. If something is hard and you find yourself reaching for loops, there is usually a better way to do it.

library(tidyverse)
library(car)
temp <- data.frame(mean=seq(1, 200, by=2), 
                   sd=seq(1, 200, by=2))


normv <- function( n , mean , sd ){
  out <- rnorm( n*length(mean) , mean = mean , sd = sd )
  return( matrix( out , , ncol = n , byrow = FALSE ) )
}

set.seed(1)
normv( 5 , temp$mean , temp$sd ) # 5 variables, from  V1 to V5

mydata <- as.data.frame(normv( 5 , temp$mean , temp$sd )) %>% 
  gather("variable", "value", V1:V5)


ggplot(mydata, aes(value)) +
  geom_histogram(aes(y = ..density..),
                 fill = 'yellow',
                 alpha = 0.7,
                 col = 'black') +
  geom_density(colour="blue", lwd = 1, fill="lightyellow", alpha=0.5) +
  facet_grid(~variable) +
  geom_vline(aes(xintercept = summarised_value, color = stat), 
             size = 1,
             data = mydata %>% 
               group_by(variable) %>% 
               summarise(mean = mean(value), median = median(value)) %>% 
               gather("stat", "summarised_value", mean:median)) +
  theme_bw() +
  theme(axis.text.x=element_text(size=14), axis.title.x=element_text(size=16),
        axis.text.y=element_text(size=14), axis.title.y=element_text(size=16))
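
As a small illustration of the gathering step used above, the following sketch reshapes a two-column toy data frame (`wide` is a made-up example, not the original data) from wide to long format, which is the shape that `facet_grid(~variable)` relies on.

```r
library(tidyr)

# Toy wide data: one column per variable, as in the original `mydata`
wide <- data.frame(V1 = c(1.5, 2.5), V2 = c(10, 20))

# Stack V1 and V2 into a key column ("variable") and a value column ("value")
long <- gather(wide, "variable", "value", V1:V2)
long
```

In tidyr 1.0 and later, `pivot_longer(wide, V1:V2, names_to = "variable", values_to = "value")` is the newer interface for the same reshaping, although it orders the rows differently.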



Edit

For the whole problem, this should be enough to get you started:

I still gather the variables, as this makes plotting easier. Inside the function, I simply take the subset of the data frame for the variable I care about before plotting.

library(tidyverse)
library(car)
library(cowplot)
temp <- data.frame(mean=seq(1, 200, by=2), 
                   sd=seq(1, 200, by=2))


normv <- function( n , mean , sd ){
  out <- rnorm( n*length(mean) , mean = mean , sd = sd )
  return( matrix( out , , ncol = n , byrow = FALSE ) )
}

set.seed(1)
normv( 5 , temp$mean , temp$sd ) # 5 variables, from  V1 to V5

mydata <- as.data.frame(normv( 5 , temp$mean , temp$sd )) %>% 
  gather("variable", "value", V1:V5)


make_plot <- function(variable_name){

  data_subset <- mydata %>% 
    filter(variable == variable_name)

  hist_g <- data_subset %>% 
    ggplot(., aes(value)) +
    geom_histogram(aes(y = ..density..),
                   binwidth = 50,
                   fill = 'yellow',
                   alpha = 0.7,
                   col = 'black') +
    geom_density(colour="#00000040", lwd = 1, fill="lightyellow", alpha=0.5) +
    geom_vline(aes(xintercept = summarised_value, color = stat), 
               size = 1,
               data = . %>% 
                 summarise(mean = mean(value), median = median(value)) %>% 
                 gather("stat", "summarised_value", mean:median)) +
    scale_color_manual(values = c("blue", "red")) +
    theme_bw() +
    theme(axis.text.x=element_text(size=14), axis.title.x=element_text(size=16),
          axis.text.y=element_text(size=14), axis.title.y=element_text(size=16),
          legend.position = c(.95, .95),
          legend.justification = c(1,1),
          legend.background = element_rect(color = "black"))

  qq <- ggplot(data_subset, aes(sample = value)) +
    stat_qq()

  bp <- ggplot(data_subset, aes(x = variable_name, y = value)) +
    geom_boxplot()

  # arrange three in a grid
  plot_grid(
    plot_grid(hist_g, qq, nrow = 2),
    bp,
    ncol = 2
    )
}

figures_list <- map(unique(mydata$variable), make_plot)
all_figures <- plot_grid(plotlist = figures_list, nrow = 1, ncol = 5)
save_plot("out.png", all_figures, ncol = 5, base_aspect_ratio = 0.9, base_height = 7)


I arranged these in a row with `all_figures <- plot_grid(plotlist = figures_list, ...`, but you could also save them individually, for example by mapping the list to `ggsave`.
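
A minimal sketch of saving each figure individually, assuming a named list of plots. The `figures_list` below is a hypothetical stand-in built from `mtcars`, and the `Explore_` file prefix and temporary output directory are illustrative choices only. The list produced by `map()` above is unnamed, so it would need names first, for example via `purrr::set_names()`.

```r
library(ggplot2)
library(purrr)

# Hypothetical stand-in for the list returned by map(..., make_plot)
figures_list <- list(
  V1 = ggplot(mtcars, aes(mpg)) + geom_histogram(bins = 10),
  V2 = ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)
)

# One output file per list element, written to a temporary directory
out_dir <- tempdir()
paths <- file.path(out_dir, paste0("Explore_", names(figures_list), ".png"))

# walk2() iterates over paths and plots in parallel, purely for the side effect
walk2(paths, figures_list, ~ ggsave(.x, plot = .y, width = 4, height = 4))
```

`ggsave` also accepts the objects returned by `plot_grid`, so the same pattern should carry over to the combined figures.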

Jack Brookes
  • First of all, thank you very much for the code and the solution. I have a couple of questions, though. (1) Do you know exactly why ggplot2 refuses to draw histograms with my code? (2) Since I need three graphs put together (as you can see in my initial question), I would prefer not to use faceting here, and to add a boxplot (through the car package) and a QQ plot as well. Is there an issue with combining plots built through different packages? Again, thanks a lot! – Diego Mar 23 '18 at 08:31
  • See my comment. You're using the name of the column as a string, so you need `ggplot(mydata, aes_string(i))`. You can use something like `cowplot` to arrange your plots in a grid. – Jack Brookes Mar 23 '18 at 09:25
  • Jack, thank you so very much, terrific code : ) I still need to find a way to label outliers in the boxplot (as car::Boxplot does) but it is definitely extremely close to the final result I would like to get. Answer granted and upvoted for your time, availability and patience, really appreciated. – Diego Mar 23 '18 at 14:21
  • No problem - for outliers, lots of examples https://stackoverflow.com/questions/33524669/labeling-outliers-of-boxplots-in-r – Jack Brookes Mar 23 '18 at 14:29
  • Yup, Will check it out ! – Diego Mar 23 '18 at 15:12
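
For the outlier labels discussed in the comments above, one possible approach (a sketch, not what `car::Boxplot` does internally) is to flag the points beyond 1.5 times the interquartile range, which is the default whisker rule of `geom_boxplot`, and label them with `geom_text`. The `demo` data frame and its `id` column are made up for illustration.

```r
library(ggplot2)

set.seed(1)
# Toy data: 48 ordinary values plus two planted extremes (ids 49 and 50)
demo <- data.frame(id = 1:50, value = c(rnorm(48), 6, -7))

# Tukey's rule: beyond 1.5 * IQR from the quartiles counts as an outlier
q <- unname(quantile(demo$value, c(0.25, 0.75)))
iqr <- q[2] - q[1]
demo$is_outlier <- demo$value < q[1] - 1.5 * iqr |
  demo$value > q[2] + 1.5 * iqr

# Boxplot with the row id printed next to each flagged point
bp <- ggplot(demo, aes(x = "V1", y = value)) +
  geom_boxplot() +
  geom_text(data = subset(demo, is_outlier), aes(label = id), hjust = -0.5)
```

With the gathered data from the answer, the flag would have to be computed per variable (for example with `group_by(variable)`), and the label could be the original row number.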