I am trying to loop a ggplot2 plot with a linear regression line over it. It works when I type the y column name manually, but the loop method I am trying does not work. It is definitely not a dataset issue.

I've tried many solutions from various websites on how to loop a ggplot and the one I've attempted is the simplest I could find that almost does the job.

The code that works is the following:

plots <- ggplot(Everything.any, mapping = aes(x = stock_VWRETD, y = stock_10065)) +
    geom_point() +
    labs(x = 'Market Returns', y = 'Stock Returns', title ='Stock vs Market Returns') +
    geom_smooth(method='lm',formula=y~x)

But I do not want to do this another 40 times (and then 5 times more for other reasons). The code that I found online and tried to adapt to my needs is the following:

plotRegression <- function(z,na.rm=TRUE,...){
  nm <- colnames(z)
  for (i in seq_along(nm)){
    plots <- ggplot(z, mapping = aes(x = stock_VWRETD, y = nm[i])) +
    geom_point() +
    labs(x = 'Market Returns', y = 'Stock Returns', title ='Stock vs Market Returns') +
    geom_smooth(method='lm',formula=y~x)

    ggsave(plots,filename=paste("regression1",nm[i],".png",sep=" "))
  }
}

plotRegression(Everything.any)

I expected a Stock Returns vs Market Returns graph like the one the first snippet produces. Instead, the y-axis shows a single value, the name of the respective column, and the market returns are plotted normally along the x-axis but all at that one y value, as if on a number line. Please let me know what I am doing wrong.

Desired Plot: [image]

Actual Plot: [image]

Sample Data is available on Google Drive here: https://drive.google.com/open?id=1Xa1RQQaDm0pGSf3Y-h5ZR0uTWE-NqHtt

divibisan
  • Having neither your data nor your output, all anyone can do is guess as to what's going on. [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R example folks can easily help with. – camille Jun 05 '19 at 23:32
  • Thank you for responding. I have updated the post to include a picture of the results. I did not include the actual data because I believe it is absolutely not the problem. It is complete and has been used for many other processes, and has been used in a plot already. – Jonathan Fung Jun 05 '19 at 23:40
  • Including some sample data is still important, otherwise people have to spend time making up a dataset to test your code with. I think your issue is with `aes()`. Replacing that with `aes_string()` might help, but because you haven't included a reproducible example that I can easily copy/paste/run, I can't test that. – Stewart Macdonald Jun 05 '19 at 23:50
  • Hi Stewart. Thank you for responding. I apologize for not including a sample for you to test with. I updated the sample to the google drive folder. I was able to reproduce the incorrect graph with the sample, so you should hopefully be able to as well. The replacement for aes to aes_string unfortunately did not help. – Jonathan Fung Jun 06 '19 at 00:10

2 Answers

The problem is that when you assign variables to aesthetics in aes, you mix bare names and strings. In this example, both X and Y are supposed to be variables in z:

aes(x = stock_VWRETD, y = nm[i])

You refer to stock_VWRETD using a bare name (as required with aes); for y, however, you provide the name as a character string taken from the vector returned by colnames. See what happens when we replicate this with the iris dataset:

ggplot(iris, aes(Petal.Length, 'Sepal.Length')) + geom_point()

[Image: incorrect plot with y-variable assigned to a single string]

Since aes expects variable names to be given as bare names, it doesn't interpret 'Sepal.Length' as a variable in iris but as a separate vector (consisting of a single character value) which holds the y-values for each point.
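One way to see this without looking at the image is to inspect the data ggplot2 actually draws. The snippet below is only a sketch: it uses layer_data() from ggplot2 on the iris example above and compares it with a plot whose y aesthetic is a bare column name. The object names p_string and p_column are made up for illustration.

```r
library(ggplot2)

# y given as a quoted string: treated as one constant value
p_string <- ggplot(iris, aes(Petal.Length, 'Sepal.Length')) + geom_point()

# y given as a bare name: looked up as a column of iris
p_column <- ggplot(iris, aes(Petal.Length, Sepal.Length)) + geom_point()

# layer_data() returns the computed data for the first layer of a plot
n_y_string <- length(unique(layer_data(p_string)$y))
n_y_column <- length(unique(layer_data(p_column)$y))

# The string version has a single distinct y position, the column version has many
print(c(string = n_y_string, column = n_y_column))
```

Every point in the first plot shares one y position, which matches the "number line" appearance described in the question.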


What can you do? Here are 2 options that both give the proper plot:

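1) Use aes_string and change both variable names to character strings (note that aes_string is deprecated in recent ggplot2 releases in favor of the approach in option 2):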

ggplot(iris, aes_string('Petal.Length', 'Sepal.Length')) + geom_point()

2) Use the .data pronoun with double square bracket subsetting to look up the appropriate variable by its name:

ggplot(iris, aes(Petal.Length, .data[['Sepal.Length']])) + geom_point()

[Image: correct X-Y plot]
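Applied to the loop in the question, option 2 might look like the sketch below. It assumes ggplot2 3.0.0 or later (so that the .data pronoun is available inside aes) and uses iris so that it runs without the asker's data. The function name plot_regressions and its x_col and out_dir arguments are hypothetical, chosen only for this illustration.

```r
library(ggplot2)

# Hypothetical helper: saves one scatter plot with an lm line per y column.
plot_regressions <- function(df, x_col, out_dir = tempdir()) {
  # Only numeric columns other than x_col are used as the y variable
  is_num <- vapply(df, is.numeric, logical(1))
  y_cols <- setdiff(names(df)[is_num], x_col)

  files <- character(0)
  for (y_col in y_cols) {
    # .data[[name]] looks the column up by its name, given as a string
    p <- ggplot(df, aes(x = .data[[x_col]], y = .data[[y_col]])) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x) +
      labs(x = x_col, y = y_col, title = paste(y_col, "vs", x_col))

    file <- file.path(out_dir, paste0("regression_", y_col, ".png"))
    ggsave(filename = file, plot = p, width = 6, height = 4)
    files <- c(files, file)
  }
  # Return the saved file paths invisibly so the caller can check them
  invisible(files)
}

saved <- plot_regressions(iris, "Sepal.Length")
```

With the data from the question, the call would presumably be plot_regressions(Everything.any, "stock_VWRETD"), assuming every other numeric column is a stock return series. Files are written to tempdir() by default here; pass out_dir = "." to save into the working directory instead.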

divibisan

You need to use aes_string instead of aes and put double quotes around your x variable; then you can pass the loop variable i directly as y. You can also simplify the for loop by iterating over the column names themselves. Here is an example using iris:

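library(ggplot2)

plotRegression <- function(z, na.rm = TRUE, ...) {
  # Column names are character strings, which is what aes_string() expects
  nm <- colnames(z)

  # Iterate over the names themselves rather than over their indices
  for (i in nm) {
    plots <- ggplot(z, mapping = aes_string(x = "Sepal.Length", y = i)) +
      geom_point() +
      geom_smooth(method = 'lm', formula = y ~ x)
    # Save one PNG per column, e.g. regression1_Sepal.Width.png
    ggsave(plots, filename = paste("regression1_", i, ".png", sep = ""))
  }
}

myiris <- iris
plotRegression(myiris)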
M.Viking