This is my first foray into ggplot2 and I am experiencing difficulties. I'm trying to plot two series of random numbers against an incremented x-axis while showing linear regression for both. So far, I've succeeded in plotting the scatterplots, but the regression line keeps throwing errors. I know it's possible, but I'm missing something for executing the idea. I'm running RStudio Desktop version 1.3.1056, Water Lily with tidyverse loaded.

I know this works to display the scatterplot (I'm open to more elegant variants if suggested):

ggplot(a, aes(x = Datapoint, y = value, color = variable)) +  # Setup
  geom_point(aes(y = Series1, col = 'Series1')) +             # Series 1 plot
  geom_point(aes(y = Series2, col = 'Series2')) +             # Series 2 plot
  labs(title = 'example', xlab = 'Datapoint', ylab = 'Datapoint Value')   # Title and axes labels

I also know this works to display a linear regression line if only using one y-series:

ggplot(a) +
  aes(x = Datapoint, y = value, color = variable) +
  geom_point()

When I try adding geom_smooth() or geom_smooth(method = lm) to the main block, I get an "Error in FUN(X[[i]], ...) : object 'value' not found" message. For example, this:

ggplot(a, aes(x = Datapoint, y = value, color = variable)) +  # Setup
  geom_point(aes(y = Series1, col = 'Series1')) +             # Series 1 plot
  geom_point(aes(y = Series2, col = 'Series2')) +             # Series 2 plot
  labs(title = 'example', xlab = 'Datapoint', ylab = 'Datapoint Value') +   # Title and axes labels
  geom_smooth(method = lm)

results in this:

>  ggplot(a, aes(x = Datapoint, y = value, color = variable)) +  # Setup
+   geom_point(aes(y = Series1, col = 'Series1')) +             # Series 1 plot
+   geom_point(aes(y = Series2, col = 'Series2')) +             # Series 2 plot
+   labs(title = 'example', xlab = 'Datapoint', ylab = 'Datapoint Value') +   # Title and axes labels
+   geom_smooth(method = lm)
Error in FUN(X[[i]], ...) : object 'value' not found

Some places I've looked for inspiration include the following:

I'm fairly certain this should be a simple issue, but one which I don't yet understand. What am I missing?

The data file I'm using is hosted here: https://github.com/davidmvermillion/Chart_Comparisons/blob/master/Seeded_Values_for_Comparison_Project.csv

This is my current R file: https://github.com/davidmvermillion/Chart_Comparisons/blob/master/ggplot2Demo.R

Thank you!

1 Answer

Try this example; I believe it is close to what you want. Next time, please include data that reproduces your issue in a proper format using dput(). The error means ggplot2 cannot find a column named value in your data: geom_smooth() inherits y = value from the aes() in ggplot(), whereas your geom_point() layers override y with Series1 and Series2 and so never look for it. This example is a good starting point (I have also included some solutions using your real data from GitHub):

library(tidyverse)
# Data
data("iris")
# Plot 1: points plus one linear fit per species, without confidence bands
iris %>%
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, group = Species, color = Species)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)

Or, if you want facets (one panel per group), try the following code:

# Plot 2: same as plot 1, with one panel per species
iris %>%
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, group = Species, color = Species)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  facet_wrap(. ~ Species)

And after exploring your data on GitHub, maybe you are looking for this (hint: reshape the data to long format, keeping Datapoint):

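# Plot 3: reshape to long format, then one linear fit per series
# (df is defined under "Some data used" below)
df %>% pivot_longer(-Datapoint) %>%
  ggplot(aes(x = Datapoint, y = value, color = name, group = name)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)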

Or a nicer solution with facets:

# Plot 4: as plot 3, with confidence bands and one panel per series
df %>% pivot_longer(-Datapoint) %>%
  ggplot(aes(x = Datapoint, y = value, color = name, group = name)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  facet_wrap(. ~ name)
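
If you would rather keep the data in its original wide layout, here is a minimal sketch of that alternative. It is not the reshaping approach used in plots 3 and 4: each layer gets its own y, so nothing ever looks for a value column. The name df_wide is just an illustration, and it holds only the first ten rows of the data listed at the end of this answer so that the snippet runs on its own.

```r
library(ggplot2)

# First ten rows of the df from the end of this answer (illustrative subset)
df_wide <- data.frame(
  Datapoint = 1:10,
  Series1 = c(37, 7, 26, 27, 91, 77, 58, 87, 58, 13),
  Series2 = c(25, 88, 65, 5, 28, 51, 29, 83, 10, 98)
)

# Only x is set globally; every layer supplies its own y and colour,
# so no layer inherits a mapping to a column that does not exist
p_wide <- ggplot(df_wide, aes(x = Datapoint)) +
  geom_point(aes(y = Series1, colour = "Series1")) +
  geom_point(aes(y = Series2, colour = "Series2")) +
  geom_smooth(aes(y = Series1, colour = "Series1"),
              method = "lm", formula = y ~ x, se = FALSE) +
  geom_smooth(aes(y = Series2, colour = "Series2"),
              method = "lm", formula = y ~ x, se = FALSE) +
  # Inside labs() the axis labels are named x and y, not xlab and ylab
  labs(title = "example", x = "Datapoint", y = "Datapoint Value",
       colour = "Series")

p_wide
```

This needs one pair of layers per series, which is why the long format tends to scale better once there are more than a couple of series.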

Some data used:

#Data
df <- structure(list(Datapoint = 1:50, Series1 = c(37L, 7L, 26L, 27L, 
91L, 77L, 58L, 87L, 58L, 13L, 62L, 91L, 18L, 18L, 23L, 61L, 90L, 
26L, 2L, 54L, 27L, 30L, 52L, 39L, 3L, 37L, 32L, 43L, 28L, 6L, 
50L, 50L, 71L, 45L, 37L, 19L, 84L, 61L, 46L, 51L, 39L, 95L, 16L, 
27L, 28L, 89L, 54L, 98L, 98L, 61L), Series2 = c(25L, 88L, 65L, 
5L, 28L, 51L, 29L, 83L, 10L, 98L, 52L, 26L, 68L, 64L, 3L, 6L, 
39L, 53L, 96L, 15L, 40L, 24L, 65L, 27L, 84L, 13L, 83L, 43L, 14L, 
65L, 76L, 95L, 15L, 100L, 5L, 62L, 92L, 58L, 10L, 32L, 9L, 83L, 
41L, 99L, 46L, 32L, 19L, 1L, 13L, 39L)), class = "data.frame", row.names = c(NA, 
-50L))
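
As a quick check of where the name and value columns in plots 3 and 4 come from, this sketch runs pivot_longer() on the first three rows of the data above. The names df_head and long are only for illustration.

```r
library(tidyr)

# First three rows of df, repeated so the snippet runs on its own
df_head <- data.frame(
  Datapoint = 1:3,
  Series1 = c(37L, 7L, 26L),
  Series2 = c(25L, 88L, 65L)
)

# Every column except Datapoint is stacked into name/value pairs,
# giving two rows (Series1, Series2) per Datapoint
long <- pivot_longer(df_head, -Datapoint)

names(long)  # "Datapoint" "name" "value"
nrow(long)   # 6
```

If you want the columns to match the aes() in your question, pivot_longer(-Datapoint, names_to = "variable") should give you variable in place of name.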
Duck
    Yay! Thank you @Duck! That did it using dplyr. Plot 3 (as I'm seeing in the comment) wound up achieving 95% of my objective (I think I can figure out the other part shortly). – David M Vermillion Oct 05 '20 at 18:36