
I have a data set with two values per row (a training error and a testing error) that I'd like to plot against the same x value (the iteration count).

For example:

RHC,1,0.370,0.287,0.003,0.063
SA,1,0.352,0.258,0.003,0.057
GA,1,0.121,0.091,0.430,0.008

I want to plot one line per value column, grouped by the first column. For example, for the RHC row I'm plotting the points {x, y1} = {1, 0.370} and {x, y2} = {1, 0.287}.

The following ggplot/geom_smooth code accomplishes this:

ggplot(data=d) + 
  geom_smooth(aes(x=iterations, y=training.error, col=algorithm)) + 
  geom_smooth(aes(x=iterations, y=testing.error, col=algorithm))

However, for each algorithm both lines end up with the same color and share a single legend entry, which makes them impossible to tell apart.

How can I give the line produced by each geom_smooth call its own color and legend entry?

To reproduce:

library(ggplot2)
d <- read.csv("https://gist.githubusercontent.com/jameskyle/8d233dcbd0ad0b66bfdd/raw/9c975ac9d9bbcb633e44cfd70b66f7ab89dc1517/results.csv")

p1 <- ggplot(data=d) +
    geom_smooth(aes(x=iterations, y=training.error, col=algorithm)) +
    geom_smooth(aes(x=iterations, y=testing.error, col=algorithm))

pdf("graph.pdf")
print(p1)
dev.off()

The above code will produce:

[Image: ggplot graph]

Asked by James Kyle, edited by Jaap.

Comments:
  • Please read the info about how to give a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). – Jaap Oct 14 '15 at 05:40
  • Example code and data provided. – James Kyle Oct 14 '15 at 05:54

1 Answer

Because several of the lines lie close to each other in one plot, facets will probably give a clearer picture. To use facets, and to map training vs. testing to an aesthetic, the data first has to be reshaped into long format.

With the data.table package you can melt several groups of columns into long format simultaneously:

library(data.table)

# melting operation for the error & time columns simultaneously
# and setting the appropriate labels for the variable column
# (assumes the training.* columns come before the testing.* columns,
# because pair 1 is labelled 'train' and pair 2 'test')
d1 <- melt(setDT(d),
           measure.vars = patterns('\\.error$','\\.time$'),
           value.name = c('error','time'))[, variable := c('train','test')[variable]]
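To see what the melt produces without downloading the gist, here is a minimal sketch on the three sample rows from the question. The header line is an assumption: the question's code only confirms algorithm, iterations, training.error and testing.error, while the two .time column names and their order are inferred from the patterns() call above. sample_d and sample_long are hypothetical names.

```r
library(data.table)

# The three sample rows from the question, with an ASSUMED header line
# (the two .time column names are inferred, not confirmed by the source).
sample_d <- read.csv(text = "algorithm,iterations,training.error,testing.error,training.time,testing.time
RHC,1,0.370,0.287,0.003,0.063
SA,1,0.352,0.258,0.003,0.057
GA,1,0.121,0.091,0.430,0.008")

# Melt both column groups at once: the first pattern collects the error
# columns and the second the time columns, each in column order, so pair 1
# is training.* and pair 2 is testing.*. All other columns become id columns.
sample_long <- melt(setDT(sample_d),
                    measure.vars = patterns("\\.error$", "\\.time$"),
                    value.name = c("error", "time"))

# `variable` holds the pair number as a factor ("1", "2"); indexing a
# character vector by a factor uses its integer codes.
sample_long[, variable := c("train", "test")[variable]]

print(sample_long)
```

Each input row becomes two rows (train and test), each carrying one error and one time value, so the 3 sample rows give 6 rows with the columns algorithm, iterations, variable, error and time.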

Now you can make the faceted plot (I've also mapped fill, so the shaded confidence bands of the train and test lines can be told apart):

ggplot(data=d1) +
  # size = 1 sets the line width; in ggplot2 >= 3.4.0 use linewidth = 1 instead
  geom_smooth(aes(x=iterations, y=error, col=variable, fill=variable), size=1) +
  facet_grid(. ~ algorithm) +
  theme_bw()

This results in:

[Image: faceted plot, one panel per algorithm]

If you really want everything in one plot, you can also map variable to linetype in the aes, so the lines are easier to tell apart:

ggplot(data=d1) +
  # size = 1 sets the line width; in ggplot2 >= 3.4.0 use linewidth = 1 instead
  geom_smooth(aes(x=iterations, y=error, col=algorithm, linetype=variable), size=1) +
  theme_bw()

The result:

[Image: single plot, color by algorithm, linetype by train/test]
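If you would rather keep the original wide data and the two geom_smooth calls from the question, a common ggplot2 idiom (not part of the original answer) is to map linetype to a constant label inside each layer's aes(): each distinct constant becomes its own entry in the linetype legend, while color still identifies the algorithm. This is a minimal sketch assuming the column names used in the question; plot_train_test is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: draws training and testing error per algorithm from
# the wide data frame `d` (columns iterations, training.error,
# testing.error, algorithm) without reshaping it.
plot_train_test <- function(d) {
  ggplot(d, aes(x = iterations, colour = algorithm)) +
    # A constant string mapped inside aes() creates a legend entry
    # for this layer in the linetype scale.
    geom_smooth(aes(y = training.error, linetype = "training")) +
    geom_smooth(aes(y = testing.error, linetype = "testing")) +
    labs(y = "error", linetype = "error type") +
    theme_bw()
}
```

With `d` read as in the question, `print(plot_train_test(d))` draws both lines of an algorithm in the same color but with different line types, and adds a second legend for the line types.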

Answered by Jaap